milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

iRepertoire fastq files #570

Closed guillemsanchezsanchez1996 closed 4 years ago

guillemsanchezsanchez1996 commented 4 years ago

Good morning everyone. We have received data from a collaborator who needs help analyzing the TCR repertoire of some samples. The data comes from iRepertoire, and the reads are in ".fq" files. I've checked them and they seem okay. When I process them, the alignment seems to work well, but the assemble step looks like a total failure because of the high value of "Average number of reads per clonotype": about 150-200 reads. Previous data (same cells) that was not generated by iRepertoire had a similar number of total reads, but the value of "Average number of reads per clonotype" was about 5-10 reads. Does anybody know what could be the problem in this case? Has anyone ever used mixcr to process samples derived from iRepertoire?

Kind Regards,

Guillem

dbolotin commented 4 years ago

Please specify in more detail where exactly the problem is.

guillemsanchezsanchez1996 commented 4 years ago

Dear Dmitry, This is a report of assemble:

java -Xmx1G -jar mixcr.jar assemble -ObadQualityThreshold=10 C:\Users\guill\Desktop\MixCrfolder\alignments1.vdjca clones3.clns
Initialization: progress unknown
Assembling initial clonotypes: 96,7%
Writing clones: 0%
============= Report ==============
Analysis time: 3,00s
Final clonotype count: 408
Average number of reads per clonotype: 120,25
Reads used in clonotypes, percent of total: 49060 (85,65%)
Reads used in clonotypes before clustering, percent of total: 54472 (95,1%)
Number of reads used as a core, percent of used: 54472 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 5412 (9,94%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Reads dropped due to the lack of a clone sequence, percent of total: 2362 (4,12%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 0 (0%)
Reads dropped with low quality clones, percent of total: 0 (0%)
Clonotypes eliminated by PCR error correction: 1647
Clonotypes dropped as low quality: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
TRA chains: 407 (99,75%)
TRD chains: 1 (0,25%)

Compared to other analyses I've run on data (not from iRepertoire) generated from the same type of cells, the "Average number of reads per clonotype" is really high. Maybe it's a stupid question, but I don't know why this is happening. If you want, I can attach the ".fq" file so you can check whether the fastq data derived from iRepertoire is OK, but after a quick look it seems to be normal.

Thanks for your help

dbolotin commented 4 years ago

Average number of reads per clonotype is distantly similar to coverage in genome sequencing. So it is a characteristic of the library sequencing procedure rather than of the library itself or of the original DNA/RNA material. The more reads you sequence, the greater the value will be.

All other report values also seem normal.
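To make the coverage analogy concrete, here is a minimal Python sketch. It assumes the reported average is simply "Reads used in clonotypes" divided by "Final clonotype count", which is consistent with the numbers in the assemble report above but is not taken from the MiXCR source. The function name is illustrative only.

```python
# Values copied from the assemble report posted earlier in this thread.
reads_used_in_clonotypes = 49060
final_clonotype_count = 408


def average_reads_per_clonotype(reads_used: int, clonotypes: int) -> float:
    """Mean number of reads per clonotype (assumed definition: reads / clonotypes)."""
    if clonotypes <= 0:
        raise ValueError("clonotype count must be positive")
    return reads_used / clonotypes


avg = average_reads_per_clonotype(reads_used_in_clonotypes, final_clonotype_count)
print(f"{avg:.2f}")  # matches the 120,25 shown in the report

# With the clonotype count held fixed, sequencing twice as deep doubles the value.
deeper = average_reads_per_clonotype(2 * reads_used_in_clonotypes, final_clonotype_count)
print(f"{deeper:.2f}")
```

Under this reading, the value says how deeply each clonotype was sampled, not whether assembly went well.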

guillemsanchezsanchez1996 commented 4 years ago

Dear Dmitry, It seems that the number of cells our collaborator sorted was (very) low, but I didn't know it. So I think this solves the problem. Thank you very much for your fast answer. Regards,

Guillem

dbolotin commented 4 years ago

If the number of cells was low, this is indeed the reason for this. Cheers.
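The effect of a low cell count can be illustrated with a toy simulation. This is a rough sketch under strong simplifying assumptions (every sorted cell carries a distinct clonotype, reads are drawn uniformly, and there are no PCR or sequencing errors); it does not model how MiXCR or iRepertoire actually work, and the cell and read counts are made up for illustration.

```python
import random


def simulate_average_reads(n_cells: int, n_reads: int, seed: int = 0) -> float:
    """Draw n_reads uniformly from n_cells distinct clonotypes and
    return reads per observed clonotype."""
    rng = random.Random(seed)
    # Each read is assigned to the clonotype of one randomly chosen cell.
    observed = [rng.randrange(n_cells) for _ in range(n_reads)]
    return n_reads / len(set(observed))


# Same sequencing depth, different numbers of sorted cells (hypothetical values).
few_cells = simulate_average_reads(n_cells=500, n_reads=50_000)
many_cells = simulate_average_reads(n_cells=10_000, n_reads=50_000)

print(f"few cells:  {few_cells:.1f} reads per clonotype")
print(f"many cells: {many_cells:.1f} reads per clonotype")
```

At the same total read count, the sample with fewer cells gives a much higher number of reads per clonotype, which is the pattern described in this thread.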