svm-zhang / AGOUTI

Annotated Genome Optimization Using Transcriptome Information
MIT License

Error while Denoising joining pairs #16

Closed Arkhaan closed 5 years ago

Arkhaan commented 5 years ago

Hi, AGOUTI crashed during the step of denoising joining pairs, but I'm not sure why. The BAM file had previously been sorted by read name. Could it be caused by the sequence name?

The log file:

2018-07-28 05:17:09,247 - INFO - PARSE_ARGS PROGRESS - Assembly: contigs_shredded.ctg.fasta
2018-07-28 05:17:09,248 - INFO - PARSE_ARGS PROGRESS - Gene Model: contigs_shredded.ctg.gff
2018-07-28 05:17:09,250 - INFO - PARSE_ARGS PROGRESS - Original scaffold path: contigs_shredded.shred.info.txt
2018-07-28 05:17:09,251 - INFO - PARSE_ARGS PROGRESS - Output directory: output_agouti
2018-07-28 05:17:09,252 - INFO - PARSE_ARGS PROGRESS - Output prefix: agouti
2018-07-28 05:17:09,254 - INFO - PARSE_ARGS PROGRESS - Minimum number of supports: 5
2018-07-28 05:17:09,255 - INFO - PARSE_ARGS PROGRESS - Length of gaps to fill between contigs: 1000
2018-07-28 05:17:09,258 - INFO - AGOUTI_SEQUENCE PROGRESS - [BEGIN] Reading the initial assembly
2018-07-28 05:17:11,531 - INFO - AGOUTI_SEQUENCE PROGRESS - 27579 sequences parsed
2018-07-28 05:17:11,533 - INFO - AGOUTI_SEQUENCE PROGRESS - The given assembly N50: 377534
2018-07-28 05:17:11,535 - INFO - AGOUTI_SEQUENCE PROGRESS - [DONE]
2018-07-28 05:17:11,540 - INFO - AGOUTI_GFF PROGRESS - [BEGIN] Getting gene models
2018-07-28 05:17:13,498 - INFO - AGOUTI_GFF PROGRESS - 37255 Gene Models parsed
2018-07-28 05:17:13,501 - INFO - AGOUTI_GFF PROGRESS - [DONE]
2018-07-28 05:17:13,505 - INFO - AGOUTI_SAM PROGRESS - [BEGIN] Identifying joining pairs
2018-07-28 05:17:13,506 - INFO - AGOUTI_SAM PROGRESS - check SAMtools
2018-07-28 05:17:13,646 - INFO - AGOUTI_SAM PROGRESS - run SAMtools
2018-07-28 05:17:13,649 - INFO - AGOUTI_SAM PROGRESS - # processed | Current Reads ID | Elapsed Time
2018-07-28 05:17:44,620 - INFO - AGOUTI_SAM PROGRESS - 5000000 parsed | NS500354:60:H5LVTAFXX:1:11201:13586:20231 | 0.52 m
2018-07-28 05:18:15,661 - INFO - AGOUTI_SAM PROGRESS - 10000000 parsed | NS500354:60:H5LVTAFXX:1:11301:23077:15213 | 1.03 m
2018-07-28 05:18:46,502 - INFO - AGOUTI_SAM PROGRESS - 15000000 parsed | NS500354:60:H5LVTAFXX:1:21102:15462:14877 | 1.55 m
2018-07-28 05:19:17,094 - INFO - AGOUTI_SAM PROGRESS - 20000000 parsed | NS500354:60:H5LVTAFXX:1:21203:19422:17680 | 2.06 m
2018-07-28 05:19:47,536 - INFO - AGOUTI_SAM PROGRESS - 25000000 parsed | NS500354:60:H5LVTAFXX:1:21304:17803:5644 | 2.56 m
2018-07-28 05:20:17,641 - INFO - AGOUTI_SAM PROGRESS - 30000000 parsed | NS500354:60:H5LVTAFXX:2:11105:25375:13646 | 3.07 m
2018-07-28 05:20:48,279 - INFO - AGOUTI_SAM PROGRESS - 35000000 parsed | NS500354:60:H5LVTAFXX:2:11207:8091:2929 | 3.58 m
2018-07-28 05:21:18,817 - INFO - AGOUTI_SAM PROGRESS - 40000000 parsed | NS500354:60:H5LVTAFXX:2:11308:4976:7213 | 4.09 m
2018-07-28 05:21:49,529 - INFO - AGOUTI_SAM PROGRESS - 45000000 parsed | NS500354:60:H5LVTAFXX:2:21109:16378:11283 | 4.60 m
2018-07-28 05:22:19,988 - INFO - AGOUTI_SAM PROGRESS - 50000000 parsed | NS500354:60:H5LVTAFXX:2:21211:9495:10399 | 5.11 m
2018-07-28 05:22:50,679 - INFO - AGOUTI_SAM PROGRESS - 55000000 parsed | NS500354:60:H5LVTAFXX:2:21312:13850:5839 | 5.62 m
2018-07-28 05:23:21,094 - INFO - AGOUTI_SAM PROGRESS - 60000000 parsed | NS500354:60:H5LVTAFXX:3:11412:25133:6741 | 6.12 m
2018-07-28 05:23:51,383 - INFO - AGOUTI_SAM PROGRESS - 65000000 parsed | NS500354:60:H5LVTAFXX:3:11601:3762:13735 | 6.63 m
2018-07-28 05:24:21,813 - INFO - AGOUTI_SAM PROGRESS - 70000000 parsed | NS500354:60:H5LVTAFXX:3:21401:7705:13990 | 7.14 m
2018-07-28 05:24:52,034 - INFO - AGOUTI_SAM PROGRESS - 75000000 parsed | NS500354:60:H5LVTAFXX:3:21502:3659:2356 | 7.64 m
2018-07-28 05:25:22,355 - INFO - AGOUTI_SAM PROGRESS - 80000000 parsed | NS500354:60:H5LVTAFXX:3:21602:23325:9418 | 8.15 m
2018-07-28 05:25:52,727 - INFO - AGOUTI_SAM PROGRESS - 85000000 parsed | NS500354:60:H5LVTAFXX:4:11403:20334:3989 | 8.65 m
2018-07-28 05:26:23,699 - INFO - AGOUTI_SAM PROGRESS - 90000000 parsed | NS500354:60:H5LVTAFXX:4:11504:10857:10911 | 9.17 m
2018-07-28 05:26:54,719 - INFO - AGOUTI_SAM PROGRESS - 95000000 parsed | NS500354:60:H5LVTAFXX:4:11605:16190:14314 | 9.68 m
2018-07-28 05:27:25,728 - INFO - AGOUTI_SAM PROGRESS - 100000000 parsed | NS500354:60:H5LVTAFXX:4:21406:24721:18934 | 10.20 m
2018-07-28 05:27:56,861 - INFO - AGOUTI_SAM PROGRESS - 105000000 parsed | NS500354:60:H5LVTAFXX:4:21508:5960:7730 | 10.72 m
2018-07-28 05:28:27,411 - INFO - AGOUTI_SAM PROGRESS - 110000000 parsed | NS500354:60:H5LVTAFXX:4:21610:4286:8628 | 11.23 m
2018-07-28 05:28:33,550 - INFO - AGOUTI_SAM PROGRESS - 111016745 reads pairs in the give BAM
2018-07-28 05:28:33,552 - INFO - AGOUTI_SAM PROGRESS - 485860 joining pairs parsed
2018-07-28 05:28:33,553 - INFO - AGOUTI_SAM PROGRESS - 22893 contig pairs given by these joining pairs
2018-07-28 05:28:33,554 - INFO - AGOUTI_SAM PROGRESS - Succeeded
2018-07-28 05:28:33,557 - INFO - AGOUTI_DENOISE PROGRESS - [BEGIN] Denoising joining pairs
Traceback (most recent call last):
  File "/export/bin/agouti.py", line 290, in <module>
    main()
  File "/export/bin/agouti.py", line 287, in main
    args.func(args)
  File "/export/bin/agouti.py", line 213, in run_scaffolder
    args.debug)
  File "/export/source/git/AGOUTI/src/agouti_denoise.py", line 355, in denoise_joining_pairs
    dCtgPair2GenePair[vertex2Name.index(ctgA), vertex2Name.index(ctgB)] = [geneA, geneB]
ValueError: 'chr_NW_006429596.1' is not in list
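The failing line in `agouti_denoise.py` looks up each contig name with `list.index`, which raises `ValueError` whenever a name from a joining pair is missing from the assembly's name list. The sketch below reproduces that behaviour and reports every such name at once, assuming the name list holds the shredded contig names while a joining pair carries the untruncated chromosome name. `find_missing_contigs`, `vertex_names` and `contig_pairs` are hypothetical names for illustration, not AGOUTI's own code.

```python
def find_missing_contigs(vertex_names, contig_pairs):
    """Return contig names used in joining pairs but absent from the assembly.

    Hypothetical helper: calling list.index() with any name in this
    result raises ValueError, as in the traceback in this issue.
    """
    known = set(vertex_names)  # set gives fast membership tests
    missing = []
    for ctg_a, ctg_b in contig_pairs:
        for name in (ctg_a, ctg_b):
            if name not in known and name not in missing:
                missing.append(name)
    return missing


# Shredded names, as AGOUTI would read them from the shredded assembly
vertex_names = ["chr_NW_006429596.1_1", "chr_NW_006429596.1_2"]
# A pair whose second name matches the original, unshredded chromosome
contig_pairs = [("chr_NW_006429596.1_1", "chr_NW_006429596.1")]

print(find_missing_contigs(vertex_names, contig_pairs))

try:
    vertex_names.index("chr_NW_006429596.1")
except ValueError as err:
    print(err)  # the same message as the last line of the traceback
```

Collecting all unknown names up front, instead of failing on the first `list.index` call, makes it easier to see whether a single chromosome or many are affected.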

Arkhaan commented 5 years ago

Apparently it really does come from the name. chr_NW_006429596.1 is the original chromosome name, but AGOUTI shreds it into chr_NW_006429596.1_1, chr_NW_006429596.1_2, etc. HISAT2, however, truncates the name back to chr_NW_006429596.1, and I don't know why yet. So the issue does not come from AGOUTI, though it is interesting nonetheless.
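Since the mismatch is between the names in the shredded FASTA and the reference names the aligner recorded, comparing the two before running AGOUTI could catch it early. This is a minimal, standard-library-only sketch. It assumes the SAM header text has been obtained separately (for example with `samtools view -H`), and it takes each FASTA ID to be the first whitespace-delimited token after `>`. The function names are hypothetical, and the sketch does not explain why HISAT2 changed the names in this case.

```python
def fasta_ids(fasta_text):
    """Sequence IDs from FASTA text: first whitespace-delimited token after '>'."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def sam_reference_names(sam_header_text):
    """Reference names from the SN: tag of @SQ lines in a SAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def unmatched_references(fasta_text, sam_header_text):
    """Reference names in the SAM header that are missing from the FASTA."""
    known = set(fasta_ids(fasta_text))
    return [n for n in sam_reference_names(sam_header_text) if n not in known]


# Toy inputs mirroring this issue: the header lists the unshredded name
fasta = ">chr_NW_006429596.1_1\nACGT\n>chr_NW_006429596.1_2\nGGCC\n"
header = (
    "@HD\tVN:1.0\tSO:queryname\n"
    "@SQ\tSN:chr_NW_006429596.1\tLN:4\n"
    "@SQ\tSN:chr_NW_006429596.1_2\tLN:4\n"
)
print(unmatched_references(fasta, header))
```

An empty result means every reference name in the alignment header also appears in the assembly given to AGOUTI; any name it returns would trigger the same `ValueError` during denoising if a joining pair lands on it.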