magicDGS / ReadTools

A Universal Toolkit for Handling Sequence Data from Different Sequencing Platforms
https://magicdgs.github.io/ReadTools/
MIT License

AssignReadGroupByBarcode: problem with doubly-barcoded reads #509

Closed robmaz closed 5 years ago

robmaz commented 5 years ago

Trying to demultiplex recently sequenced pools with two barcodes, I get an


UNEXPECTED ERROR: null


Command line:

readtools AssignReadGroupByBarcode --splitSample --barcodeFile Info/barcodes_452.txt --maximumMismatches 1 --output Demul-452 --input Pool_452a_4k_1.fq --input2 Pool_452a_4k_2.fq > demultiplex-452.out 2>&1

What do you make of that? I am attaching small FASTQ files that should reproduce the error. I only use the first barcode because there seems to be another problem with using two barcodes, which I will report separately. Using only one barcode seems to work fine with two other pools from the same batch. The difference here may be that, according to the metadata I received, not all samples are in fact doubly barcoded.

viola-452.txt barcodes_452.txt Pool_452a_4k_1.fq.gz Pool_452a_4k_2.fq.gz demultiplex-452.out.txt

robmaz commented 5 years ago

By the way, apparently we can now have pools in which singly- and doubly-barcoded samples are mixed. This would mean an empty barcode_sequence_2 column for some samples in the barcode file, and readtools would have to apply one matching strategy or the other depending on the sample. I am not sure this can currently be handled?
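To make the request concrete, here is a minimal sketch of what per-sample matching could look like. It is not ReadTools code: the class `MixedBarcodeMatch`, the `Sample` type and every method name are hypothetical, and it assumes that an empty barcode_sequence_2 column is represented as `null` and that mismatches are counted position by position.

```java
import java.util.List;

public class MixedBarcodeMatch {

    /** One row of the barcode file; 'second' is null when barcode_sequence_2 is empty (assumption). */
    public static final class Sample {
        final String name;
        final String first;
        final String second;

        public Sample(String name, String first, String second) {
            this.name = name;
            this.first = first;
            this.second = second;
        }
    }

    /** Counts mismatching positions; any difference in length is counted as mismatches too. */
    public static int mismatches(String expected, String observed) {
        int shared = Math.min(expected.length(), observed.length());
        int count = Math.abs(expected.length() - observed.length());
        for (int i = 0; i < shared; i++) {
            if (expected.charAt(i) != observed.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /** Singly-barcoded samples are matched on the first barcode only; doubly-barcoded ones on both. */
    public static boolean matches(Sample s, String observed1, String observed2, int maxMismatches) {
        if (mismatches(s.first, observed1) > maxMismatches) {
            return false;
        }
        return s.second == null || mismatches(s.second, observed2) <= maxMismatches;
    }

    /** Returns the name of the only matching sample, or null if none or more than one matches. */
    public static String assign(List<Sample> samples, String observed1, String observed2,
                                int maxMismatches) {
        String hit = null;
        for (Sample s : samples) {
            if (matches(s, observed1, observed2, maxMismatches)) {
                if (hit != null) {
                    return null; // ambiguous: two samples are compatible with this read pair
                }
                hit = s.name;
            }
        }
        return hit;
    }
}
```

One open design question this sketch glosses over: if a singly-barcoded and a doubly-barcoded sample share the same first barcode, a read pair can match both, and here it would simply be left unassigned.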

magicDGS commented 5 years ago

Thanks for reporting @robmaz - there are two different issues here:

magicDGS commented 5 years ago

More information about the issue (with DEBUG level):

java.lang.NullPointerException
        at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeMatch.getBestBarcodeMatch(BarcodeMatch.java:150)
        at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.lambda$getBestBarcodeString$4(BarcodeDecoder.java:237)
        at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
        at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
        at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
        at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.getBestBarcodeString(BarcodeDecoder.java:241)
        at org.magicdgs.readtools.tools.barcodes.dictionary.decoder.BarcodeDecoder.assignReadGroupByBarcode(BarcodeDecoder.java:199)
        at org.magicdgs.readtools.tools.barcodes.AssignReadGroupByBarcode.apply(AssignReadGroupByBarcode.java:175)
        at org.magicdgs.readtools.engine.ReadToolsWalker.lambda$new$1(ReadToolsWalker.java:144)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
        at org.magicdgs.readtools.engine.ReadToolsWalker.traverse(ReadToolsWalker.java:128)
        at org.magicdgs.readtools.engine.ReadToolsWalker.doWork(ReadToolsWalker.java:193)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:183)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:202)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.magicdgs.readtools.Main.main(Main.java:87)

magicDGS commented 5 years ago

@robmaz - I figured out that there is an error in your input data and that this is not a bug in the program itself: the barcode file has an extra tab between the sample_name and barcode_sequence headers. The error is thus caused by a badly formatted barcode file (after removing the extra tab, I was able to run the program normally). It is true that the error message could be better, and I will open an issue to add a check that throws a more informative exception.
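As an illustration of the kind of check described above, the following is a rough sketch and not the actual ReadTools implementation. The class `BarcodeHeaderCheck` and the method `parseHeader` are hypothetical names, and the sketch assumes the header is a single tab-delimited line in which an empty field can only come from a stray tab.

```java
import java.util.ArrayList;
import java.util.List;

public class BarcodeHeaderCheck {

    /**
     * Splits a tab-delimited header line and rejects empty column names,
     * which is what an extra tab between two headers produces.
     */
    public static List<String> parseHeader(String headerLine) {
        // A limit of -1 keeps trailing empty fields, so a trailing tab is caught as well.
        String[] fields = headerLine.split("\t", -1);
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String name = fields[i].trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException(
                        "Malformed barcode file header: column " + (i + 1)
                        + " is empty (extra tab?)");
            }
            columns.add(name);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Well-formed header: both column names are returned.
        System.out.println(parseHeader("sample_name\tbarcode_sequence"));
        // Header with an extra tab, as in the attached barcode file: rejected with a message.
        try {
            parseHeader("sample_name\t\tbarcode_sequence");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at the header with the offending column position would point the user at the file itself, instead of surfacing later as a NullPointerException during barcode matching.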

magicDGS commented 5 years ago

I changed this from bug to maintenance, as it relates to an unhelpful error message when a file is malformed.

magicDGS commented 5 years ago

Closing in favor of #512 and #513