mckennalab / FlashFry

FlashFry: The rapid CRISPR target site characterization tool

issue parsing T2T-CHM13v2.0 during indexing #37

Closed tdfair closed 12 months ago

tdfair commented 12 months ago

Dear Aaron,

I'm trying to index T2T-CHM13v2.0 (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz) with FlashFry:

java -Xmx10g -jar FlashFry-assembly-1.15.jar index \
  --tmpLocation ./tmp \
  --enzyme spcas9ngg19 \
  --reference /2TBevo_new/T2T-CHM13v2.0.fa.gz \
  --database T2TCHM13v2.0_spcas9ngg19

This throws the following error:

14:22:19.135 [main] INFO modules.BuildOffTargetDatabase - Discovering target sites in the input genome file...
14:22:19.147 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr1 CP068277.2 Homo sapiens isolate CHM13 chromosome 1
14:22:21.145 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr2 CP068276.2 Homo sapiens isolate CHM13 chromosome 2
14:23:35.648 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr3 CP068275.2 Homo sapiens isolate CHM13 chromosome 3
14:24:39.961 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr4 CP068274.2 Homo sapiens isolate CHM13 chromosome 4
14:25:32.330 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr5 CP068273.2 Homo sapiens isolate CHM13 chromosome 5
14:26:20.209 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr6 CP068272.2 Homo sapiens isolate CHM13 chromosome 6
14:27:07.387 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr7 CP068271.2 Homo sapiens isolate CHM13 chromosome 7
14:27:52.435 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr8 CP068270.2 Homo sapiens isolate CHM13 chromosome 8
14:28:35.155 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr9 CP068269.2 Homo sapiens isolate CHM13 chromosome 9
14:29:13.785 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr10 CP068268.2 Homo sapiens isolate CHM13 chromosome 10
14:29:56.145 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr11 CP068267.2 Homo sapiens isolate CHM13 chromosome 11
14:30:32.931 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr12 CP068266.2 Homo sapiens isolate CHM13 chromosome 12
14:31:10.173 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr13 CP068265.2 Homo sapiens isolate CHM13 chromosome 13
14:31:45.976 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr14 CP068264.2 Homo sapiens isolate CHM13 chromosome 14
14:32:14.238 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr15 CP068263.2 Homo sapiens isolate CHM13 chromosome 15
14:32:41.599 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr16 CP068262.2 Homo sapiens isolate CHM13 chromosome 16
14:33:09.938 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr17 CP068261.2 Homo sapiens isolate CHM13 chromosome 17
14:33:37.751 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr18 CP068260.2 Homo sapiens isolate CHM13 chromosome 18
14:34:03.194 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr19 CP068259.2 Homo sapiens isolate CHM13 chromosome 19
14:34:23.557 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr20 CP068258.2 Homo sapiens isolate CHM13 chromosome 20
14:34:43.140 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr21 CP068257.2 Homo sapiens isolate CHM13 chromosome 21
14:35:01.995 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chr22 CP068256.2 Homo sapiens isolate CHM13 chromosome 22
14:35:14.299 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chrX CP068255.2 Homo sapiens isolate CHM13 chromosome X
14:35:30.716 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chrY CP086569.2 Homo sapiens isolate NA24385 chromosome Y
14:36:09.753 [main] INFO reference.ReferenceEncoder$ - Switching to chromosome >chrM CP068254.1 Homo sapiens isolate CHM13 mitochondrion, complete genome
14:36:23.651 [main] INFO reference.ReferenceEncoder$ - Done looking for targets...
14:36:23.651 [main] INFO modules.BuildOffTargetDatabase - Closing the temporary binary output files...
14:36:24.501 [main] INFO modules.BuildOffTargetDatabase - Creating the final binary database file...
14:38:40.409 [main] INFO reference.binary.DatabaseWriter$ - Writing bin AATTGCT our 999 bin
14:39:53.967 [main] INFO reference.binary.DatabaseWriter$ - Writing bin ACTTATT our 1999 bin
14:41:59.520 [main] INFO reference.binary.DatabaseWriter$ - Writing bin AGTGTCT our 2999 bin
14:43:40.693 [main] INFO reference.binary.DatabaseWriter$ - Writing bin ATTGCTT our 3999 bin
14:45:55.422 [main] INFO reference.binary.DatabaseWriter$ - Writing bin CATGACT our 4999 bin
14:48:31.185 [main] INFO reference.binary.DatabaseWriter$ - Writing bin CCTCGTT our 5999 bin
Failed on processing file: /home/tyler/Documents/FlashFry/./tmp/binCCTTGG9469158892438854627.txt
Exception in thread "main" picocli.CommandLine$ExecutionException: Error while running command (modules.BuildOffTargetDatabase@451001e5): java.lang.IllegalStateException: Unable to parse line: chr6_CP068272.2_Homo_sapiens_isolate_CHM13_chromosome_6 6<877372 68877394 CCTTGGCCTCCCAAAGTGCTGG R
    at picocli.CommandLine.execute(CommandLine.java:1056)
    at picocli.CommandLine.access$900(CommandLine.java:142)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1255)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1223)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1131)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:1414)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:1353)
    at main.scala.Main$.main(Main.scala:57)
    at main.scala.Main.main(Main.scala)
Caused by: java.lang.IllegalStateException: Unable to parse line: chr6_CP068272.2_Homo_sapiens_isolate_CHM13_chromosome_6 6<877372 68877394 CCTTGGCCTCCCAAAGTGCTGG R
    at crispr.CRISPRSite$.fromLine(CRISPRSite.scala:76)
    at reference.binary.BlockReader.$anonfun$loadBlock$1(BlockReader.scala:94)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
    at reference.binary.BlockReader.loadBlock(BlockReader.scala:93)
    at reference.binary.BlockReader.fetchBin(BlockReader.scala:62)
    at reference.binary.DatabaseWriter$.$anonfun$writeToBinnedFileSet$1(DatabaseWriter.scala:82)
    at reference.binary.DatabaseWriter$.$anonfun$writeToBinnedFileSet$1$adapted(DatabaseWriter.scala:78)
    at scala.collection.Iterator.foreach(Iterator.scala:929)
    at scala.collection.Iterator.foreach$(Iterator.scala:929)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1406)
    at reference.binary.DatabaseWriter$.writeToBinnedFileSet(DatabaseWriter.scala:78)
    at modules.BuildOffTargetDatabase.run(BuildOffTargetDatabase.scala:82)
    at picocli.CommandLine.execute(CommandLine.java:1048)
    ... 8 more

I'm not sure what's causing the parsing error. The fai index looks like this:

chr1	248387328	57	80	81
chr2	242696752	251492284	80	81
chr3	201105948	497222803	80	81
chr4	193574945	700842633	80	81
chr5	182045439	896837322	80	81
chr6	172126628	1081158386	80	81
chr7	160567428	1255436654	80	81
chr8	146259331	1418011232	80	81
chr9	150617247	1566098862	80	81
chr10	134758134	1718598884	80	81
chr11	135127769	1855041554	80	81
chr12	133324548	1991858480	80	81
chr13	113566686	2126849644	80	81
chr14	101161492	2241835973	80	81
chr15	99753195	2344262043	80	81
chr16	96330374	2445262212	80	81
chr17	84276897	2542796775	80	81
chr18	80542538	2628127193	80	81
chr19	61707364	2709676572	80	81
chr20	66210255	2772155338	80	81
chr21	45090682	2839193281	80	81
chr22	51324926	2884847656	80	81
chrX	154259566	2936814201	80	81
chrY	62460029	3093002071	80	81
chrM	16569	3156242926	80	81

Many thanks for your help,

Tyler

tdfair commented 12 months ago

Resolved by shortening the chromosome names.
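
For anyone hitting the same error, here is a minimal sketch of one way to shorten the names before indexing. It is not necessarily what was done here. The log above shows FlashFry reading the whole FASTA header line as the chromosome name (for example ">chr6 CP068272.2 Homo sapiens isolate CHM13 chromosome 6"), while the fai index only lists the first token ("chr6"). The sketch assumes that keeping just that first whitespace-delimited token is an acceptable short name. The function names shorten_header and shorten_fasta are hypothetical helpers, not part of FlashFry.

```python
import gzip
from typing import IO


def shorten_header(line: str) -> str:
    """Keep only the first whitespace-delimited token of a FASTA header line."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    parts = line[1:].split()
    name = parts[0] if parts else ""
    return ">" + name + "\n"


def _open_text(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed file in text mode, based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def shorten_fasta(src: str, dst: str) -> int:
    """Copy src to dst with shortened headers; return the number of header lines seen."""
    count = 0
    with _open_text(src, "r") as fin, _open_text(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                count += 1
            fout.write(shorten_header(line))
    return count
```

Calling shorten_fasta("/2TBevo_new/T2T-CHM13v2.0.fa.gz", "T2T-CHM13v2.0.short.fa.gz") (the output filename is only an example) should write a copy whose headers read ">chr1", ">chr2", and so on, which can then be passed to --reference. It is worth checking afterwards that the shortened names are still unique.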