liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

How to get indicators similar to 10X reports? #296

Open Liripo opened 2 months ago

Liripo commented 2 months ago

[image] How can I get indicators similar to those in 10X reports? For example: Reads Mapped to TRB.

mourisl commented 2 months ago

We don't have a comprehensive QC report like this for now. It's on our TODO list.

lishuangshuang0616 commented 2 months ago

Is "Reads Mapped to Any V(D)J Gene" approximately equal to the ratio of reads obtained by fastq-extractor to the total reads?

Is "Reads Mapped to TRA" approximately equal to the sum of the average_coverage values for that chain type in annot.fa, divided by the total sum of average_coverage values, and then multiplied by "Reads Mapped to Any V(D)J Gene"? @mourisl

mourisl commented 2 months ago

fastq-extractor is a bit aggressive, so it will overestimate the number of reads mapped to VDJ gene regions. Its count also includes reads mapped to the C gene. The number of reads used in the assembly might be a more accurate estimate of the reads mapped to VDJ genes.

The average_coverage is sum(read_length) / 500, so you need the read length information to convert it into a read count.
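A minimal sketch of that conversion, assuming the divisor of 500 stated above and a mean read length you have measured yourself. The function names, the `(chain, average_coverage)` input tuples, and the example numbers are hypothetical and not part of TRUST4; extracting these values from annot.fa is left out.

```python
from collections import defaultdict


def estimate_read_count(average_coverage, mean_read_length, divisor=500):
    """Invert average_coverage = sum(read_length) / divisor.

    sum(read_length) equals read_count * mean_read_length, so this also
    holds for reads of varying length as long as the mean is representative.
    """
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return average_coverage * divisor / mean_read_length


def estimate_reads_per_chain(contigs, mean_read_length):
    """Sum estimated read counts per chain.

    contigs: iterable of (chain, average_coverage) tuples (assumed input).
    """
    totals = defaultdict(float)
    for chain, average_coverage in contigs:
        totals[chain] += estimate_read_count(average_coverage, mean_read_length)
    return dict(totals)


# Illustrative values only, not real TRUST4 output.
example = [("TRA", 30.0), ("TRB", 60.0), ("TRA", 15.0)]
per_chain = estimate_reads_per_chain(example, mean_read_length=150)
```

Using one global mean read length is itself an approximation: the reads behind a given contig may be shorter or longer than the sample-wide mean.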

yuyuleung commented 1 week ago

I am wondering how to accurately estimate the number of reads used to assemble each contig, especially when the input reads have varying lengths.

Thanks.

mourisl commented 1 week ago

There is no direct way to accurately estimate the number. One option is the consensus weight matrix for each contig in the raw.out or final.out file: the deepest coverage point may indicate the number of reads. The other is the "--outputReadAssignment" option, but that assignment does not necessarily correspond to the assignment made in the assembly step.
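A rough sketch of the first idea, assuming the weight matrix for one contig has already been parsed into one row of per-base weights per position. The row layout and the function name are assumptions for illustration; check them against your own raw.out or final.out file before relying on this.

```python
def deepest_coverage(weight_rows):
    """Return the largest per-position total weight for one contig.

    weight_rows: iterable of per-position weights, e.g. [A, C, G, T]
    (assumed layout). If each read adds one unit of weight per position
    it covers, this is roughly a lower bound on the reads in the contig.
    """
    return max((sum(row) for row in weight_rows), default=0)


# Illustrative matrix only, not real TRUST4 output.
example_rows = [
    [1, 0, 0, 0],
    [0, 3, 1, 0],
    [2, 0, 0, 0],
]
depth = deepest_coverage(example_rows)
```

Reads that cover different parts of the contig never stack at the same position, so the true read count can be higher than this value.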