Closed by sri-jumpcode 2 years ago.
Hi Sridhar,
Thanks for the kind words. So if I understand correctly, your input FASTA has 10+ million 'contigs'? I hadn't considered this before, but we only allot 0xFFFFF of space for contig names, which is 1,048,575 in base ten. I'll do some more testing, but I believe the error occurs because this space wraps back to zero, and you're hitting an off-target guide in the (wrapped) zero slot, which we've intentionally kept empty. I'll add some guards and warnings to the code for this.
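The wraparound described above can be reproduced with a small sketch. It assumes a hypothetical layout (contig ID in the top 20 bits of a 64-bit Long, position in the low 44 bits) and a hypothetical `PackedPositionSketch` object. FlashFry's real `BitPosition` encoding is not shown in this thread and may differ. Contig ID 1,048,576 loses its high bit when masked to 20 bits, decodes as 0, and the map lookup then fails the same way as `bitcoding.BitPosition.decode` does in the reported stack trace.

```scala
import scala.collection.mutable

// Hypothetical layout for illustration only, not FlashFry's actual format:
// top 20 bits = contig ID, low 44 bits = position within the contig.
object PackedPositionSketch {
  val ContigBits: Int = 20
  val MaxContigId: Long = (1L << ContigBits) - 1 // 0xFFFFF = 1,048,575
  val PositionBits: Int = 64 - ContigBits        // 44 bits left for the position
  val PositionMask: Long = (1L << PositionBits) - 1

  // ID -> contig name. IDs start at 1, so 0 is deliberately never a key.
  val idToName: mutable.HashMap[Long, String] = mutable.HashMap.empty

  def addContig(id: Long, name: String): Unit = idToName(id) = name

  // Packs the ID into the top 20 bits. Without the guard, IDs above 0xFFFFF
  // silently lose their high bits (1,048,576 becomes 0).
  def encode(contigId: Long, position: Long, guard: Boolean = false): Long = {
    if (guard && contigId > MaxContigId)
      throw new IllegalStateException(
        s"contig ID $contigId exceeds $MaxContigId, the largest value that fits in $ContigBits bits")
    ((contigId & MaxContigId) << PositionBits) | (position & PositionMask)
  }

  // Unsigned shift (>>>) so a set sign bit does not leak into the ID.
  // A wrapped ID of 0 makes idToName(0) throw java.util.NoSuchElementException.
  def decode(packed: Long): (String, Long) =
    (idToName(packed >>> PositionBits), packed & PositionMask)
}
```

Calling `encode(1048576L, 500L, guard = true)` fails at encoding time instead of at output time, which is the kind of guard described in this reply.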
Yes, in this instance there are more than 10 million contigs, although I have used more before. I dug a little deeper and discovered that only one query sequence is problematic:
>NC_000021.9_8218207_8218226
AAAAGGTCAGAAGGATCGTGAGG
I had 400+ sequences in the query FASTA (all guide sequences that I would like to count in my sequencing data), and only this one proved troublesome.
To test this, I made a 1.13 release of FlashFry. If this is the issue, it should now error out when you've exceeded the maximum contig count for positional information. Supporting more than 1M contigs will take a bit of development, as we currently bit-pack our results into a 64-bit Long and have used up all of that space.
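The bit budget behind this limit can be checked with a little arithmetic. With ID 0 reserved, contigs use IDs 1..N, so the ID field needs as many bits as the binary length of N: 20 bits cover 1,048,575 contigs, while 10,000,000 contigs need 24. The sketch below assumes the 20-bit figure from the 0xFFFFF limit above; `ContigBitBudget`, `bitsForContigs`, and `checkContigCount` are hypothetical names, not FlashFry functions.

```scala
// Hypothetical helper for illustration; not part of FlashFry.
object ContigBitBudget {
  // Bits reserved for the contig ID, per the 0xFFFFF limit described in this thread.
  val CurrentContigBits: Int = 20

  // Bits needed to store IDs 1..contigCount when ID 0 is reserved as "empty":
  // the binary length of the largest ID, contigCount itself.
  def bitsForContigs(contigCount: Long): Int = {
    require(contigCount >= 1, "need at least one contig")
    64 - java.lang.Long.numberOfLeadingZeros(contigCount)
  }

  // Pre-flight check: fail before the search instead of writing positions
  // that cannot be decoded later.
  def checkContigCount(contigCount: Long, availableBits: Int = CurrentContigBits): Unit = {
    val needed = bitsForContigs(contigCount)
    if (needed > availableBits)
      throw new IllegalArgumentException(
        s"$contigCount contigs need $needed bits for contig IDs, but only $availableBits are available")
  }
}
```

For the 10+ million reads in this thread, `bitsForContigs(10000000L)` is 24, four more bits than the current field. That is consistent with the note that a fix needs more than a constant change when the 64-bit Long is already fully used.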
Hello @aaronmck, I have been using FlashFry a lot, and it's a very useful tool. I am looking to count sgRNAs in sequencing reads (10+ million) that I use as reference sequences for indexing. When running it in discover mode to count guides (a pre-defined FASTA with 20 bp targets + NGG PAMs), I keep getting the error below for some files.
If I do not use the --positionOutput argument, the error does not happen. However, I need the position information for further analysis. Any chance you can help?
Thanks, Sridhar Ranganathan
12:16:10.094 [main] INFO r.t.OrderedBinTraversalFactory - With 465 guides, and allowing 0 mismatch(es), we're going to scan 443 target bins out of a total of 16384
12:16:10.094 [main] INFO modules.OffTargetDiscovery - scanning against the known targets from the genome with 465 guides
12:16:10.094 [main] INFO modules.OffTargetDiscovery - Starting seek traversal
12:16:10.259 [main] INFO reference.traverser.SeekTraverser$ - Comparing the 0th bin (AAAACTA) with 286 guides, of a total bin count 443. 0.082861372 seconds/10K bins, executed 7,618,846 comparisons
12:16:12.012 [main] INFO modules.OffTargetDiscovery - Performed a total of 92,748 guide to target comparisons
12:16:12.014 [main] INFO modules.OffTargetDiscovery - Writing final output for 465 guides
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (modules.OffTargetDiscovery@3b938003): java.util.NoSuchElementException: key not found: 0
	at picocli.CommandLine.execute(CommandLine.java:1056)
	at picocli.CommandLine.access$900(CommandLine.java:142)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:1255)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:1223)
	at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1131)
	at picocli.CommandLine.parseWithHandlers(CommandLine.java:1414)
	at picocli.CommandLine.parseWithHandler(CommandLine.java:1353)
	at main.scala.Main$.main(Main.scala:57)
	at main.scala.Main.main(Main.scala)
Caused by: java.util.NoSuchElementException: key not found: 0
	at scala.collection.MapLike.default(MapLike.scala:232)
	at scala.collection.MapLike.default$(MapLike.scala:231)
	at scala.collection.AbstractMap.default(Map.scala:59)
	at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
	at bitcoding.BitPosition.decode(BitPosition.scala:67)
	at crispr.CRISPRHit.$anonfun$toOutput$1(CRISPRHit.scala:59)
	at crispr.CRISPRHit.$anonfun$toOutput$1$adapted(CRISPRHit.scala:58)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
	at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:253)
	at scala.collection.TraversableLike.map(TraversableLike.scala:234)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
	at scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:253)
	at crispr.CRISPRHit.toOutput(CRISPRHit.scala:58)
	at targetio.TabDelimitedOutput.$anonfun$write$7(TabDelimitedHandler.scala:151)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike.map(TraversableLike.scala:234)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
	at targetio.TabDelimitedOutput.write(TabDelimitedHandler.scala:151)
	at modules.OffTargetDiscovery.$anonfun$run$4(OffTargetDiscovery.scala:135)
	at modules.OffTargetDiscovery.$anonfun$run$4$adapted(OffTargetDiscovery.scala:134)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:193)
	at modules.OffTargetDiscovery.run(OffTargetDiscovery.scala:134)
	at picocli.CommandLine.execute(CommandLine.java:1048)
	... 8 more