Overview
While working on issue #11, MiXCR began throwing an error when we tried to retain read IDs through the align and assemble steps. We need the read IDs in our pipeline so that we can map back to the original reads and work out what happens to them in the align/assemble steps that causes us to lose so many. Below is a copy of the issue that I created on the MiXCR GitHub page.
System
java version 1.7.0_101
mixcr version 1.7.1
Commands
/usr/bin/java -Xmx15g -jar /path/to/mixcr-1.7.1/mixcr.jar align -f --loci TRB --species mmu --save-description --save-reads --report /path/to/report.txt /path/to/fastq /path/to/align.vdjca
/usr/bin/java -Xmx15g -jar /path/to/mixcr-1.7.1/mixcr.jar assemble -f --index /path/to/index.txt --report /path/to/report.txt --threads 4 /path/to/align.vdjca /path/to/assemble.clns
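When the same two commands have to be run for all 170 samples, it can help to build them programmatically. The following is a minimal sketch, not part of the original pipeline: the helper names `align_cmd` and `assemble_cmd` are hypothetical, the Java location is assumed, and the jar path is the same placeholder used in the commands above. The flags themselves are copied from those commands.

```python
# Assumed location of the Java binary; adjust to the local installation.
JAVA = "/usr/bin/java"
# Placeholder path, as in the commands above.
MIXCR_JAR = "/path/to/mixcr-1.7.1/mixcr.jar"


def align_cmd(fastq, vdjca, report):
    """Build the `mixcr align` argument list with the flags used above."""
    return [
        JAVA, "-Xmx15g", "-jar", MIXCR_JAR, "align", "-f",
        "--loci", "TRB", "--species", "mmu",
        "--save-description", "--save-reads",
        "--report", str(report), str(fastq), str(vdjca),
    ]


def assemble_cmd(vdjca, clns, index, report, threads=4):
    """Build the `mixcr assemble` argument list, including the --index option."""
    return [
        JAVA, "-Xmx15g", "-jar", MIXCR_JAR, "assemble", "-f",
        "--index", str(index), "--report", str(report),
        "--threads", str(threads), str(vdjca), str(clns),
    ]
```

Each list can be passed directly to `subprocess.run(cmd)`. Passing a list rather than a shell string avoids quoting problems with sample paths.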
Description
I'm running the mixcr align-assemble-export pipeline on 170 samples of mouse T-cell receptor beta (TRB) data. Samples were created using a multiplex PCR reaction to amplify all 20 V and 13 J genes within this locus. Without the index option the pipeline works for all samples; the error appears only when we add the --index [index file] option to assemble.
The error occurs for 8 of the 170 files, but which files fail varies between runs.
Examples
Trial 1
1. Ran mixcr align (no errors) and mixcr assemble
   A) Received two errors from assemble (Samples 146 and 167)
2. Using the same vdjca files, re-ran mixcr assemble for samples 146 and 167
   A) Did this 10 times for each sample
      i) Received the same error message 10 of 10 times for sample 146
      ii) Received the same error message 10 of 10 times for sample 167
Trial 2
1. Ran mixcr align (no errors) and mixcr assemble
   A) Received six errors from assemble (Samples 1, 29, 86, 95, 145, 167)
2. Using the same vdjca files, re-ran mixcr assemble for samples 1, 29, 86, 95, 145, 167
   A) Did this 10 times for each sample; received the same error message 10 of 10 times for all 6
Trial 3
1. Ran mixcr align (no errors) and mixcr assemble
   A) Received three errors from assemble (Samples 95, 146, 167)
2. Re-ran mixcr align 10 times for each error sample
3. Re-ran mixcr assemble on the outputs of step 2
   A) For the 10 replicates of sample 95, 5 produced the error
   B) For the 10 replicates of sample 146, 3 produced the error
   C) For the 10 replicates of sample 167, 7 produced the error
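The repeated runs in these trials can be scripted so that the failure counts are tallied automatically. This is a rough sketch under stated assumptions: the names `run_once`, `tally_failures`, and `is_deterministic` are hypothetical, and it assumes the uncaught Java exception is printed to stderr, where it can be detected by searching for `java.lang.NullPointerException`.

```python
import subprocess

# Text expected in stderr when the assemble step fails (assumption, see above).
ERROR_MARKER = "java.lang.NullPointerException"


def run_once(cmd):
    """Run one command and return True if its stderr contains the error marker."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return ERROR_MARKER in result.stderr


def tally_failures(cmd, repeats=10, runner=run_once):
    """Run the same command `repeats` times and count the runs that hit the error."""
    return sum(1 for _ in range(repeats) if runner(cmd))


def is_deterministic(failures, repeats):
    """True if every run failed or every run succeeded."""
    return failures in (0, repeats)
```

With this, a result of 10 failures out of 10 for a fixed vdjca file (as in Trials 1 and 2) reports as deterministic, while 5, 3, or 7 out of 10 across re-aligned replicates (as in Trial 3) does not. The `runner` parameter exists only so the tally logic can be exercised without invoking mixcr.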
Stack Trace of Error
Exception in thread "main" java.lang.NullPointerException
    at com.milaboratory.mixcr.assembler.IO.write0(IO.java:122)
    at com.milaboratory.mixcr.assembler.IO.access$000(IO.java:45)
    at com.milaboratory.mixcr.assembler.IO$ReadToCloneMappingBtreeSerializer.serialize(IO.java:81)
    at org.mapdb.BTreeMap$NodeSerializer.serialize(BTreeMap.java:385)
    at org.mapdb.BTreeMap$NodeSerializer.serialize(BTreeMap.java:288)
    at org.mapdb.Store.serialize(Store.java:154)
    at org.mapdb.StoreDirect.put(StoreDirect.java:365)
    at org.mapdb.Caches$HashTable.put(Caches.java:216)
    at org.mapdb.Pump.buildTreeMap(Pump.java:470)
    at org.mapdb.DB.createTreeSet(DB.java:1072)
    at org.mapdb.DB$BTreeSetMaker.make(DB.java:749)
    at com.milaboratory.mixcr.cli.ActionAssemble.go(ActionAssemble.java:125)
    at com.milaboratory.mitools.cli.JCommanderBasedMain.main(JCommanderBasedMain.java:145)
    at com.milaboratory.mixcr.cli.Main.main(Main.java:64)
Interpretation
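To me it looks like there is an issue with saving reads in the align step, but one that does not trigger any error there. It only shows up when the assemble step runs and tries to map the read IDs into the index file. This fits the trials: re-running assemble on the same vdjca file reproduced the error every time, whereas re-running align on the same sample produced vdjca files that failed in only some replicates.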
Next Steps
Take one of the problem files (sample 167), split its fastq file into 10 segments of roughly equal size, and run the same trials as above on each segment.
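One way to do the split is sketched below. It is an illustration only: the function names and the output naming scheme (`<stem>.partNN.fastq`) are hypothetical, and it assumes an uncompressed FASTQ file with standard four-line records and no blank lines.

```python
from itertools import islice
from pathlib import Path


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped 4-line records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_fastq(fastq_path, out_dir, n_segments=10):
    """Split a FASTQ file into n_segments files of near-equal record count."""
    fastq_path = Path(fastq_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # First pass: count records so the segments can be sized evenly.
    with fastq_path.open() as handle:
        total = sum(1 for _ in read_records(handle))

    # Every segment gets `base` records; the first `extra` segments get one more.
    base, extra = divmod(total, n_segments)
    sizes = [base + (1 if i < extra else 0) for i in range(n_segments)]

    # Second pass: write consecutive records to each segment, preserving order.
    outputs = []
    with fastq_path.open() as handle:
        records = read_records(handle)
        for i, size in enumerate(sizes, start=1):
            out_path = out_dir / f"{fastq_path.stem}.part{i:02d}.fastq"
            with out_path.open("w") as out:
                for record in islice(records, size):
                    out.writelines(record)
            outputs.append(out_path)
    return outputs
```

Records are kept in their original order and whole, so concatenating the segments reproduces the input file. If the error follows one particular segment across repeated runs, that narrows down which reads are involved.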