mckennalab / FlashFry

FlashFry: The rapid CRISPR target site characterization tool

Discover module: Key not found: 0 error #26

Closed sri-jumpcode closed 2 years ago

sri-jumpcode commented 2 years ago

Hello @aaronmck, I have been using FlashFry a lot and it's a very useful tool. I am looking to count sgRNAs on sequencing reads (10+ million), which I use as the reference sequences for indexing. When running it in discover mode to count guides (a pre-defined FASTA with 20 bp targets + NGG PAMs), I keep getting the error below for some files.

If I do not use the --positionOutput argument, the error does not happen. However, I need to be able to use the position information for further analysis.

Any chance you can help?

Thanks, Sridhar Ranganathan

12:16:10.094 [main] INFO r.t.OrderedBinTraversalFactory - With 465 guides, and allowing 0 mismatch(es), we're going to scan 443 target bins out of a total of 16384
12:16:10.094 [main] INFO modules.OffTargetDiscovery - scanning against the known targets from the genome with 465 guides
12:16:10.094 [main] INFO modules.OffTargetDiscovery - Starting seek traversal
12:16:10.259 [main] INFO reference.traverser.SeekTraverser$ - Comparing the 0th bin (AAAACTA) with 286 guides, of a total bin count 443. 0.082861372 seconds/10K bins, executed 7,618,846 comparisons
12:16:12.012 [main] INFO modules.OffTargetDiscovery - Performed a total of 92,748 guide to target comparisons
12:16:12.014 [main] INFO modules.OffTargetDiscovery - Writing final output for 465 guides
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (modules.OffTargetDiscovery@3b938003): java.util.NoSuchElementException: key not found: 0
    at picocli.CommandLine.execute(CommandLine.java:1056)
    at picocli.CommandLine.access$900(CommandLine.java:142)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1255)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1223)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1131)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:1414)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:1353)
    at main.scala.Main$.main(Main.scala:57)
    at main.scala.Main.main(Main.scala)
Caused by: java.util.NoSuchElementException: key not found: 0
    at scala.collection.MapLike.default(MapLike.scala:232)
    at scala.collection.MapLike.default$(MapLike.scala:231)
    at scala.collection.AbstractMap.default(Map.scala:59)
    at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
    at bitcoding.BitPosition.decode(BitPosition.scala:67)
    at crispr.CRISPRHit.$anonfun$toOutput$1(CRISPRHit.scala:59)
    at crispr.CRISPRHit.$anonfun$toOutput$1$adapted(CRISPRHit.scala:58)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
    at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:253)
    at scala.collection.TraversableLike.map(TraversableLike.scala:234)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
    at scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:253)
    at crispr.CRISPRHit.toOutput(CRISPRHit.scala:58)
    at targetio.TabDelimitedOutput.$anonfun$write$7(TabDelimitedHandler.scala:151)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike.map(TraversableLike.scala:234)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:227)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at targetio.TabDelimitedOutput.write(TabDelimitedHandler.scala:151)
    at modules.OffTargetDiscovery.$anonfun$run$4(OffTargetDiscovery.scala:135)
    at modules.OffTargetDiscovery.$anonfun$run$4$adapted(OffTargetDiscovery.scala:134)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:32)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:29)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:193)
    at modules.OffTargetDiscovery.run(OffTargetDiscovery.scala:134)
    at picocli.CommandLine.execute(CommandLine.java:1048)
    ... 8 more

aaronmck commented 2 years ago

Hi Sridhar,

Thanks for the kind words. So if I understand correctly, your input FASTA has 10+ million 'contigs'? I hadn't thought about this before, but we only allot 0xFFFFF of space for contig names, which is 1,048,575 in base ten. I'll do some more testing, but I believe the error is generated when we wrap this space back to zero and you hit an off-target guide in the (wrapped) zero slot, which we've intentionally kept empty. I'll add some guards and warnings to the code about this.
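To make the suspected wrap-around concrete, here is a minimal sketch in Python. It is not FlashFry's Scala code: the helper names, the tiny name table, and the assumption that the contig index is simply masked to its low 20 bits are all hypothetical, chosen only to mirror the 0xFFFFF limit and the empty zero slot described above.

```python
# Illustrative only: the real FlashFry encoding may differ from this sketch.
CONTIG_BITS = 20                       # assumed width: 0xFFFFF has 20 set bits
CONTIG_MASK = (1 << CONTIG_BITS) - 1   # 0xFFFFF == 1,048,575


def stored_contig_id(contig_number: int) -> int:
    """Keep only the low 20 bits, as a fixed-width field would."""
    return contig_number & CONTIG_MASK


# Hypothetical lookup table from stored ID to contig name.
# Slot 0 is deliberately left empty, as described in the thread.
contig_names = {i: f"contig_{i}" for i in range(1, 6)}


def decode(contig_number: int) -> str:
    """Look up the name for a contig; raises KeyError for the empty slot 0."""
    return contig_names[stored_contig_id(contig_number)]


if __name__ == "__main__":
    print(stored_contig_id(1_048_575))  # largest ID that still fits
    print(stored_contig_id(1_048_576))  # one past the limit wraps to slot 0
    print(decode(1_048_577))            # wraps to slot 1 and returns its name
    try:
        decode(1_048_576)
    except KeyError as err:
        print("lookup failed, key not found:", err)
```

In this sketch the failed dictionary lookup plays the role of the `HashMap.apply` call under `BitPosition.decode` in the stack trace. Note that, under the same assumptions, a wrapped index that lands on a non-empty slot would not fail at all and would instead resolve to the wrong contig name.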

sri-jumpcode commented 2 years ago

Yes, in this instance there are more than 10 million contigs, although I have used more before. I dug a little deeper and found that only one query sequence is problematic.

>NC_000021.9_8218207_8218226
AAAAGGTCAGAAGGATCGTGAGG

I had 400+ sequences in the query FASTA (these are all guide sequences that I would like to count in my sequencing data), and only this one proved troublesome.

aaronmck commented 2 years ago

To test this out I made a 1.13 release of FlashFry. If this is the issue, it should now error out when you've exceeded the maximum contig count for positional information. Supporting more than 1M contigs will take a bit of development, as we currently bit-pack our results into a 64-bit Long and have used up all that space.
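For readers unfamiliar with bit-packing, the following Python sketch shows the general idea of a guarded encoder that refuses out-of-range contig IDs instead of silently wrapping. The field layout here is an assumption made for illustration: only the 20-bit contig width comes from the 0xFFFFF limit mentioned earlier in the thread, while the 32-bit position field and the function names are invented and do not describe FlashFry's actual format.

```python
# Illustrative only: hypothetical field layout, not FlashFry's real encoding.
CONTIG_BITS = 20      # from the 0xFFFFF limit mentioned in the thread
POSITION_BITS = 32    # assumed width for the position field
MAX_CONTIG = (1 << CONTIG_BITS) - 1
MAX_POSITION = (1 << POSITION_BITS) - 1


def pack(contig_id: int, position: int) -> int:
    """Pack a contig ID and a position into one integer, rejecting overflow."""
    if not 1 <= contig_id <= MAX_CONTIG:   # slot 0 stays reserved as empty
        raise ValueError(
            f"contig ID {contig_id} does not fit in {CONTIG_BITS} bits"
        )
    if not 0 <= position <= MAX_POSITION:
        raise ValueError(
            f"position {position} does not fit in {POSITION_BITS} bits"
        )
    return (contig_id << POSITION_BITS) | position


def unpack(packed: int) -> tuple[int, int]:
    """Recover (contig_id, position) from a packed value."""
    return packed >> POSITION_BITS, packed & MAX_POSITION


if __name__ == "__main__":
    packed = pack(21, 8_218_207)
    print(unpack(packed))
    try:
        pack(1_048_576, 0)                 # one contig too many
    except ValueError as err:
        print("rejected:", err)
```

Because every field has a fixed width, raising the contig limit in a layout like this means taking bits from another field or widening the packed value, which is why the change is more than a one-line fix.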