milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com
Other
326 stars 79 forks source link

NPE in align #79

Closed bbimber closed 8 years ago

bbimber commented 8 years ago

Hello,

I am running mixcr 1.7.1 using the command below. I downloaded the human library locally using importFromIMGT.sh. I get this NPE for multiple species downloaded locally:

mixcr align -f -r Clone-46.log.txt -l TRA -p rna-seq --library local -s hs ./Clone-46-FQ-R1.all.fastq ./Clone-46-FQ-R2.all.fastq Clone-46.hs.TRA.vdjca Alignment: 100% Exception in thread "main" java.lang.NullPointerException at com.milaboratory.mixcr.vdjaligners.VDJCAlignerSJFirst.align(VDJCAlignerSJFirst.java:124) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerSJFirst.process(VDJCAlignerSJFirst.java:60) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:78) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:42) at cc.redberry.pipe.CUtils$9.process(CUtils.java:350) at cc.redberry.pipe.CUtils$9.process(CUtils.java:345) at cc.redberry.pipe.blocks.ParallelProcessor$Worker.run(ParallelProcessor.java:304) at java.lang.Thread.run(Thread.java:745)

I looked at the code a little and didnt see an obvious cause. Let me know if there's any information that would help.

dbolotin commented 8 years ago

This problem is caused by absence of VTranscript feature in IMGT data (namely there is no 5'UTR and Leader sequence). We've just added human readable message for this case (this will be released in 1.8).

Try to add the following option:

-OvParameters.geneFeatureToAlign=VRegion

to override VTranscript feature used by rna-seq set of parameters, and let me know if it fixed the problem.

Thanks for reporting this case! I will add some additional code to automatically fall back to VRegion in such cases.

Connected to #76

bbimber commented 8 years ago

Thanks for the quick reply. Just one question: why did I only see this w/ the local IMGT DB? What is different about selecting -s hs, but not supplying 'local'?

dbolotin commented 8 years ago

The gene library imported from IMGT is activated only if you supply --library local option.

-s hs is the default value for this option, so actually it does'n change anything.

Our internal gene library was built from genomic data, directly from NCBI assembly of corresponding loci (see here), so along with VRegion part of the sequence it also contains all genomic sequences like UTRs and introns. This information helps to align these regions in particular types of data.

For example all 5'RACE protocols contain 5'UTR sequences that can be found in the second Illumina read, and by aligning this sequences with UTRs of V genes from internal DB MiXCR achieves greater specificity of V gene assignment.

Does it answers your question?

bbimber commented 8 years ago

It does; however, I see a similar stack:

mixcr align -f -r Clone-27.log.txt -l TRA -p rna-seq --library local -OvParameters.geneFeatureToAlign=VRegion -s macaque ./Clone-27-FQ-R1.all.fastq ./Clone-27-FQ-R2.all.fastq Clone-27.macaque.TRA.vdjca Alignment: 100% Exception in thread "main" java.lang.NullPointerException at com.milaboratory.mixcr.vdjaligners.VDJCAlignerSJFirst.align(VDJCAlignerSJFirst.java:124) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerSJFirst.process(VDJCAlignerSJFirst.java:60) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:78) at com.milaboratory.mixcr.vdjaligners.VDJCAlignerWithMerge.process(VDJCAlignerWithMerge.java:42) at cc.redberry.pipe.CUtils$9.process(CUtils.java:350) at cc.redberry.pipe.CUtils$9.process(CUtils.java:345) at cc.redberry.pipe.blocks.ParallelProcessor$Worker.run(ParallelProcessor.java:304) at java.lang.Thread.run(Thread.java:745)

The IMGT macaque DB lacks data for TRA (this is running as part of a script looping through the species/loci). Does that account for this NPE?

bbimber commented 8 years ago

Note: the IMGT download tool gives a zero-line FASTA file for rhesus macaque TRA, just to be clear.

dbolotin commented 8 years ago

Yes it does. Unfortunately there are no TRAV genes for rhesus macaque in IMGT: http://www.imgt.org/vquest/refseqh.html

bbimber commented 8 years ago

Is there a way to make the tool behave more gracefully in this situation?

dbolotin commented 8 years ago

Yes, current development version already outputs human readable error message for this case.