Unexpected output of mixcr program (v1.3). Fixed in v1.4?

BIGGIGREP commented 9 years ago

I am testing the MiXCR program (v1.3) and have found unusual behavior when running 'exportAlignments': the order in which sequences appear in a FASTA or FASTQ file affects the number of sequences that are successfully aligned.

In the examples described below, I made a FASTA file containing 7 sequences, of which only 4 are unique NGS reads: I repeated one sequence 3 times and a second sequence 2 times. The remaining two sequences should not return strong hits, so I expect 5/7 sequences to align.

If I run MiXCR on the 7 test sequences (test1.fasta), the MiXCR log file reports that 2/7 sequences (rather than 5/7) returned results. That alone is a problem, since not all 5 expected sequences are found, but it is even more problematic that simply changing the order of the sequences in the file (test2.fasta) makes the log report 4/7 (again, rather than 5/7).
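A test like this is easiest to trust when the reordered file is generated mechanically, so that read order is the only difference between the two inputs. Below is a minimal Python sketch, assuming plain FASTA input (not FASTQ); the helper names are hypothetical and the placeholder reads in any checks are not the reporter's actual data.

```python
import random
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequence lines may be wrapped; concatenate them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records):
    """Serialize (header, sequence) tuples back to unwrapped FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def reorder(records, seed):
    """Return a shuffled copy: the same reads in a different order."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def reorder_file(src, dst, seed=0):
    """Write a reordered copy of the FASTA file `src` to `dst`."""
    records = read_fasta(Path(src).read_text())
    Path(dst).write_text(write_fasta(reorder(records, seed)))
    return records
```

For example, `reorder_file("test1.fasta", "test2.fasta", seed=1)` writes a shuffled copy; aligning both files with identical MiXCR settings and comparing the reported counts then isolates the effect of read order.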

The fact that I do not get 5/7 successfully aligned sequences looks like a bug. I would also not expect the output of 'exportAlignments' to depend on the order of the sequences in the file. Is that correct? If so, is this a known problem?

If it is not a bug, which settings should I use so that all 5 sequences are returned by 'exportAlignments'?

dbolotin commented 9 years ago

Dear Constantine,

Sorry, I thought that Dmitry forwarded my answer to you.

This random effect comes from the aligner, which randomly drops k-mer seeds onto the target sequence. Our recent benchmarks also showed that the aligner parameters for the J gene are not optimal for highly hypermutated IG sequences (which is the case in your data). We plan to optimize all of this in the 1.6 or 1.7 release. For now, you can download the latest MiXCR version (v1.4) and use denser seeds for the J aligner to make the results more stable. In fact, the fixes in this release were inspired by your question. Here is an example command line that fixes the issue with MiXCR v1.4:

mixcr align -OjParameters.parameters.mapperMaxSeedsDistance=5 input.fasta output.vdjca

I'll leave this issue open until we optimise the J aligner.

Thanks for reporting!
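To illustrate how random seeding can make results depend on input order, and why denser seeds help, here is a toy Python model. It is an assumption-laden sketch, not MiXCR's actual aligner: it places exact k-mer seeds on each read at random gaps of at most `max_dist` positions, draws every gap from one random generator shared by the whole file, and counts a read as aligned if any seed lands on a k-mer that also occurs in the target. The parameter `max_dist` only loosely mirrors the idea behind `mapperMaxSeedsDistance`; the function names, k-mer length, and sequences are hypothetical.

```python
import random

K = 5  # toy k-mer (seed) length; an assumption, not MiXCR's setting


def matching_positions(query, target, k=K):
    """Offsets in `query` whose k-mer occurs somewhere in `target`."""
    target_kmers = {target[i:i + k] for i in range(len(target) - k + 1)}
    return {i for i in range(len(query) - k + 1) if query[i:i + k] in target_kmers}


def drop_seeds(length, rng, max_dist, min_dist=1, k=K):
    """Place seeds along a query with random gaps in [min_dist, max_dist]."""
    seeds = []
    pos = rng.randint(0, max_dist - 1)  # random first offset
    while pos <= length - k:
        seeds.append(pos)
        pos += rng.randint(min_dist, max_dist)
    return seeds


def count_hits(reads, target, max_dist, min_dist=1, seed=0):
    """Count reads that get at least one seed on a k-mer shared with `target`.

    One RNG is shared by all reads (toy assumption), so the seeds a read
    receives depend on how many draws earlier reads consumed, i.e. on order.
    """
    rng = random.Random(seed)
    hits = 0
    for read in reads:
        shared = matching_positions(read, target)
        seeds = drop_seeds(len(read), rng, max_dist, min_dist)
        if any(s in shared for s in seeds):
            hits += 1
    return hits


if __name__ == "__main__":
    target = "ACGTTGCAAGGCTTACCGATGCATGCAAGT"
    # 3 + 2 copies of reads sharing one or two k-mers with the target, plus 2 non-matching reads.
    reads = (["T" * 20 + "AGGCT" + "T" * 20] * 3
             + ["G" * 20 + "GATGC" + "G" * 20] * 2
             + ["C" * 45, "A" * 45])
    for max_dist in (1, 20):
        counts = [count_hits(random.Random(i).sample(reads, len(reads)), target, max_dist)
                  for i in range(5)]
        print(max_dist, counts)
```

With `max_dist=1` every offset is seeded, so every read that shares a k-mer with the target is found regardless of order; with sparse seeds (`max_dist=20`) the count for the same reads can change from one ordering to the next, which is the kind of instability that denser seeds are meant to reduce.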

BIGGIGREP commented 9 years ago

Great, I will test this out. Thank you very much.

In reply to dbolotin's comment of Tue, Aug 25, 2015, 9:52 AM (https://github.com/milaboratory/mixcr/issues/3#issuecomment-134611600).

dbolotin commented 8 years ago

This issue is related to #28.

dbolotin commented 7 years ago

Fixed around the 1.8 release.
