Closed robmaz closed 5 years ago
By the way, apparently we can now have pools where singly and doubly barcoded samples are mixed. This would mean an empty barcode_sequence_2 column for some samples in the barcode file, and readtools would have to apply one or the other matching strategy depending on the sample. I am not sure this can currently be handled?
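To make the question concrete, here is a minimal sketch of what per-sample strategy selection could look like. It is not ReadTools code: the class `MixedBarcodeSketch`, its methods, and the assumed three-column tab-separated layout (sample name, first barcode, optional second barcode) are all hypothetical and only illustrate the idea that an empty second barcode column switches a sample to single-barcode matching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the ReadTools implementation.
final class MixedBarcodeSketch {

    /** Hypothetical holder for one data line of the barcode file. */
    static final class Sample {
        final String name;
        final List<String> barcodes;

        Sample(String name, List<String> barcodes) {
            this.name = name;
            this.barcodes = barcodes;
        }

        boolean isDoublyBarcoded() {
            return barcodes.size() == 2;
        }
    }

    /** Parses "name<TAB>barcode1<TAB>barcode2", where barcode2 may be empty (assumed layout). */
    static Sample parseLine(String line) {
        // A limit of -1 keeps trailing empty fields, so an empty second barcode is visible.
        String[] fields = line.split("\t", -1);
        if (fields.length < 2 || fields[0].isEmpty() || fields[1].isEmpty()) {
            throw new IllegalArgumentException("Expected a sample name and at least one barcode: " + line);
        }
        List<String> barcodes = new ArrayList<>();
        barcodes.add(fields[1]);
        if (fields.length > 2 && !fields[2].isEmpty()) {
            barcodes.add(fields[2]);
        }
        return new Sample(fields[0], barcodes);
    }

    /** Counts mismatching positions; a length difference is treated as a non-match. */
    static int mismatches(String expected, String observed) {
        if (expected.length() != observed.length()) {
            return Integer.MAX_VALUE;
        }
        int count = 0;
        for (int i = 0; i < expected.length(); i++) {
            if (expected.charAt(i) != observed.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /** The second read barcode is only compared for doubly barcoded samples. */
    static boolean matches(Sample sample, String readBarcode1, String readBarcode2, int maxMismatches) {
        if (mismatches(sample.barcodes.get(0), readBarcode1) > maxMismatches) {
            return false;
        }
        return !sample.isDoublyBarcoded()
                || mismatches(sample.barcodes.get(1), readBarcode2) <= maxMismatches;
    }

    public static void main(String[] args) {
        Sample single = parseLine("s1\tACGT\t");
        Sample both = parseLine("s2\tACGT\tTTGG");
        System.out.println(single.name + " doubly barcoded: " + single.isDoublyBarcoded());
        System.out.println(both.name + " doubly barcoded: " + both.isDoublyBarcoded());
    }
}
```

Whether a read matching the first barcode of both a singly and a doubly barcoded sample should be treated as ambiguous is a design decision this sketch deliberately leaves open.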
Thanks for reporting, @robmaz - there are two different issues here:
--verbosity DEBUG
and attach/report the log? I will also try to figure it out with the data that you provided ASAP.
More information about the issue (with DEBUG level):
java.lang.NullPointerException
at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeMatch.getBestBarcodeMatch(BarcodeMatch.java:150)
at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.lambda$getBestBarcodeString$4(BarcodeDecoder.java:237)
at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.getBestBarcodeString(BarcodeDecoder.java:241)
at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.assignReadGroupByBarcode(BarcodeDecoder.java:199)
at org.magicdgs.readtools.tools.barcodes.AssignReadGroupByBarcode.apply(AssignReadGroupByBarcode.java:175)
at org.magicdgs.readtools.engine.ReadToolsWalker.lambda$new$1(ReadToolsWalker.java:144)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.magicdgs.readtools.engine.ReadToolsWalker.traverse(ReadToolsWalker.java:128)
at org.magicdgs.readtools.engine.ReadToolsWalker.doWork(ReadToolsWalker.java:193)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:183)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:202)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.magicdgs.readtools.Main.main(Main.java:87)
@robmaz - I figured out that there is an error in your input data and this is not a bug in the program itself: the barcode file has an extra tab between the sample_name and barcode_sequence headers; thus, it is a badly formatted barcode file that produces the error (after removing the extra tab, I was able to run the program normally). It is true that the error message could be better, and I will open an issue to add a check that throws a more informative exception.
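As an illustration of the kind of check mentioned above, the sketch below rejects a header line that contains an empty column name, which is what an extra tab between two headers produces. The class `BarcodeHeaderCheck` and its methods are hypothetical; how ReadTools actually parses the barcode file, and what the eventual check looks like, may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the ReadTools implementation.
final class BarcodeHeaderCheck {

    /** Returns the 0-based indices of empty column names in a tab-separated header line. */
    static List<Integer> emptyColumns(String headerLine) {
        // A limit of -1 keeps trailing empty fields, so a trailing tab is also reported.
        String[] names = headerLine.split("\t", -1);
        List<Integer> empty = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (names[i].trim().isEmpty()) {
                empty.add(i);
            }
        }
        return empty;
    }

    /** Fails with a descriptive message instead of letting a null surface later. */
    static void validate(String headerLine) {
        List<Integer> empty = emptyColumns(headerLine);
        if (!empty.isEmpty()) {
            throw new IllegalArgumentException(
                    "Malformed barcode file header: empty column name at index " + empty
                            + " (possibly an extra tab between headers)");
        }
    }

    public static void main(String[] args) {
        validate("sample_name\tbarcode_sequence"); // well-formed header, no exception
        try {
            validate("sample_name\t\tbarcode_sequence"); // extra tab between the headers
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at the header keeps the problem close to its cause, rather than surfacing as a NullPointerException during barcode matching.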
I am changing this from bug to maintenance, as it relates to an unhelpful error message when a file is malformed.
Closing in favor of #512 and #513
Trying to demultiplex recently sequenced pools with two barcodes, I get an
UNEXPECTED ERROR: null
Command line:
readtools AssignReadGroupByBarcode --splitSample --barcodeFile Info/barcodes_452.txt --maximumMismatches 1 --output Demul-452 --input Pool_452a_4k_1.fq --input2 Pool_452a_4k_2.fq > demultiplex-452.out 2>&1
What do you make of that? I am attaching small fastqs that should reproduce the error. I only use the first barcode because there seems to be another problem with using two barcodes, which I will report separately. Using only one barcode seems to work fine with two other pools from the same batch; the difference here may be that, according to the metainfo I received, not all samples are in fact doubly barcoded.
viola-452.txt barcodes_452.txt Pool_452a_4k_1.fq.gz Pool_452a_4k_2.fq.gz demultiplex-452.out.txt