weberlab-hhu / Helixer

Using Deep Learning to predict gene annotations
GNU General Public License v3.0

add_ngs_coverage.py errors when chromosome/contig names look like ptg000636l:1-52000 or ptg000636l:52001-83281 #151

Closed xiekunwhy closed 1 day ago

xiekunwhy commented 2 weeks ago

Describe the bug: add_ngs_coverage.py can stop with an error if there are chromosome/contig names like ptg000636l:1-52000 or ptg000636l:52001-83281.

To Reproduce Commands ran, especially any commands that threw an error

source /public2/home/sl_qybio/sl_qybio/miniforge3/bin/activate helixer
/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/bin/python /public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/helixer/evaluation/add_ngs_coverage.py -s train --unstranded --bam /public2/home/sl_qybio/sl_qybio/project/2024/QY240516GA057_LanHua_genome/QY240516GA057_std_3_annot/pipe/02.mrna/03.mapping/LHleaf.sort.bam /public2/home/sl_qybio/sl_qybio/project/2024/QY240516GA057_LanHua_genome/QY240516GA057_std_3_annot/pipe/02.mrna/03.mapping/LHroot.sort.bam /public2/home/sl_qybio/sl_qybio/project/2024/QY240516GA057_LanHua_genome/QY240516GA057_std_3_annot/pipe/02.mrna/03.mapping/LHseed.sort.bam --h5-data /public2/home/sl_qybio/sl_qybio/project/2024/QY240516GA057_LanHua_genome/QY240516GA057_std_3_annot/helixerf/gra.gffread.ss.c.h5 --dataset-prefix rnaseq --threads 10

Error

start, end 0 255842
(b'Chr01', 0, 36794)
Chr01: chunks from 0-36794
(b'Chr02', 36794, 57360)
Chr02: chunks from 36794-57360
(b'Chr03', 57360, 77036)
Chr03: chunks from 57360-77036
(b'Chr04', 77036, 96404)
Chr04: chunks from 77036-96404
(b'Chr05', 96404, 113126)
Chr05: chunks from 96404-113126
(b'Chr06', 113126, 127798)
Chr06: chunks from 113126-127798
(b'Chr07', 127798, 141894)
Chr07: chunks from 127798-141894
(b'Chr08', 141894, 154344)
Chr08: chunks from 141894-154344
(b'Chr09', 154344, 166614)
Chr09: chunks from 154344-166614
(b'Chr10', 166614, 178786)
Chr10: chunks from 166614-178786
(b'Chr11', 178786, 190554)
Chr11: chunks from 178786-190554
(b'Chr12', 190554, 201548)
Chr12: chunks from 190554-201548
(b'Chr13', 201548, 211912)
Chr13: chunks from 201548-211912
(b'Chr14', 211912, 220412)
Chr14: chunks from 211912-220412
(b'Chr15', 220412, 228472)
Chr15: chunks from 220412-228472
(b'Chr16', 228472, 236136)
Chr16: chunks from 228472-236136
(b'Chr17', 236136, 242096)
Chr17: chunks from 236136-242096
(b'Chr18', 242096, 246796)
Chr18: chunks from 242096-246796
(b'Chr19', 246796, 251472)
Chr19: chunks from 246796-251472
(b'Chr20', 251472, 254274)
Chr20: chunks from 251472-254274
(b'ptg000177l', 254274, 254354)
ptg000177l: chunks from 254274-254354
(b'ptg001159l', 254354, 254392)
ptg001159l: chunks from 254354-254392
(b'ptg000332l', 254392, 254420)
ptg000332l: chunks from 254392-254420
(b'ptg000343l', 254420, 254444)
ptg000343l: chunks from 254420-254444
(b'ptg001144l', 254444, 254464)
ptg001144l: chunks from 254444-254464
(b'ptg000838l', 254464, 254484)
ptg000838l: chunks from 254464-254484
(b'ptg000678l', 254484, 254502)
ptg000678l: chunks from 254484-254502
(b'ptg001270l', 254502, 254520)
ptg001270l: chunks from 254502-254520
(b'ptg000890l', 254520, 254538)
ptg000890l: chunks from 254520-254538
(b'ptg001203l', 254538, 254552)
ptg001203l: chunks from 254538-254552
(b'ptg001294l', 254552, 254566)
ptg001294l: chunks from 254552-254566
(b'ptg001152l', 254566, 254580)
ptg001152l: chunks from 254566-254580
(b'ptg001155l', 254580, 254594)
ptg001155l: chunks from 254580-254594
(b'ptg001104l', 254594, 254608)
ptg001104l: chunks from 254594-254608
(b'ptg001202l', 254608, 254622)
ptg001202l: chunks from 254608-254622
(b'ptg000297l', 254622, 254636)
ptg000297l: chunks from 254622-254636
(b'ptg000095l', 254636, 254648)
ptg000095l: chunks from 254636-254648
(b'ptg001230l', 254648, 254660)
ptg001230l: chunks from 254648-254660
(b'ptg001158l', 254660, 254672)
ptg001158l: chunks from 254660-254672
(b'ptg001271l', 254672, 254682)
ptg001271l: chunks from 254672-254682
(b'ptg001316l', 254682, 254692)
ptg001316l: chunks from 254682-254692
(b'ptg000615l', 254692, 254702)
ptg000615l: chunks from 254692-254702
(b'ptg001288l', 254702, 254712)
ptg001288l: chunks from 254702-254712
(b'ptg001222l', 254712, 254722)
ptg001222l: chunks from 254712-254722
(b'ptg001323l', 254722, 254732)
ptg001323l: chunks from 254722-254732
(b'ptg001275l', 254732, 254742)
ptg001275l: chunks from 254732-254742
(b'ptg000335l', 254742, 254752)
ptg000335l: chunks from 254742-254752
(b'ptg000612l', 254752, 254762)
ptg000612l: chunks from 254752-254762
(b'ptg000614l', 254762, 254772)
ptg000614l: chunks from 254762-254772
(b'ptg000653l', 254772, 254782)
ptg000653l: chunks from 254772-254782
(b'ptg000344l', 254782, 254792)
ptg000344l: chunks from 254782-254792
(b'ptg000768l', 254792, 254802)
ptg000768l: chunks from 254792-254802
(b'ptg000199l', 254802, 254812)
ptg000199l: chunks from 254802-254812
(b'ptg000226l', 254812, 254820)
ptg000226l: chunks from 254812-254820
(b'ptg000492l', 254820, 254828)
ptg000492l: chunks from 254820-254828
(b'ptg000637l', 254828, 254836)
ptg000637l: chunks from 254828-254836
(b'ptg001156l', 254836, 254844)
ptg001156l: chunks from 254836-254844
(b'ptg000094l', 254844, 254852)
ptg000094l: chunks from 254844-254852
(b'ptg001205l', 254852, 254860)
ptg001205l: chunks from 254852-254860
(b'ptg001301l', 254860, 254868)
ptg001301l: chunks from 254860-254868
(b'ptg000490l', 254868, 254876)
ptg000490l: chunks from 254868-254876
(b'ptg000617l', 254876, 254884)
ptg000617l: chunks from 254876-254884
(b'ptg000609l', 254884, 254892)
ptg000609l: chunks from 254884-254892
(b'ptg001220l', 254892, 254900)
ptg001220l: chunks from 254892-254900
(b'ptg001236l', 254900, 254908)
ptg001236l: chunks from 254900-254908
(b'ptg001280l', 254908, 254916)
ptg001280l: chunks from 254908-254916
(b'ptg001031l', 254916, 254924)
ptg001031l: chunks from 254916-254924
(b'ptg001027l', 254924, 254932)
ptg001027l: chunks from 254924-254932
(b'ptg000937l', 254932, 254940)
ptg000937l: chunks from 254932-254940
(b'ptg001332l', 254940, 254948)
ptg001332l: chunks from 254940-254948
(b'ptg001272l', 254948, 254956)
ptg001272l: chunks from 254948-254956
(b'ptg000228l', 254956, 254964)
ptg000228l: chunks from 254956-254964
(b'ptg001328l', 254964, 254972)
ptg001328l: chunks from 254964-254972
(b'ptg001234l', 254972, 254980)
ptg001234l: chunks from 254972-254980
(b'ptg000608l', 254980, 254988)
ptg000608l: chunks from 254980-254988
(b'ptg001341l', 254988, 254996)
ptg001341l: chunks from 254988-254996
(b'ptg000916l', 254996, 255004)
ptg000916l: chunks from 254996-255004
(b'ptg001351l', 255004, 255012)
ptg001351l: chunks from 255004-255012
(b'ptg001232l', 255012, 255020)
ptg001232l: chunks from 255012-255020
(b'ptg001102l', 255020, 255028)
ptg001102l: chunks from 255020-255028
(b'ptg001130l', 255028, 255036)
ptg001130l: chunks from 255028-255036
(b'ptg000657l', 255036, 255044)
ptg000657l: chunks from 255036-255044
(b'ptg000616l', 255044, 255052)
ptg000616l: chunks from 255044-255052
(b'ptg001295l', 255052, 255060)
ptg001295l: chunks from 255052-255060
(b'ptg001344l', 255060, 255066)
ptg001344l: chunks from 255060-255066
(b'ptg001241l', 255066, 255072)
ptg001241l: chunks from 255066-255072
(b'ptg001023l', 255072, 255078)
ptg001023l: chunks from 255072-255078
(b'ptg000721l', 255078, 255084)
ptg000721l: chunks from 255078-255084
(b'ptg001193l', 255084, 255090)
ptg001193l: chunks from 255084-255090
(b'ptg001218l', 255090, 255096)
ptg001218l: chunks from 255090-255096
(b'ptg001181l', 255096, 255102)
ptg001181l: chunks from 255096-255102
(b'ptg001201l', 255102, 255108)
ptg001201l: chunks from 255102-255108
(b'ptg001157l', 255108, 255114)
ptg001157l: chunks from 255108-255114
(b'ptg001103l', 255114, 255120)
ptg001103l: chunks from 255114-255120
(b'ptg001154l', 255120, 255126)
ptg001154l: chunks from 255120-255126
(b'ptg001353l', 255126, 255132)
ptg001353l: chunks from 255126-255132
(b'ptg001233l', 255132, 255138)
ptg001233l: chunks from 255132-255138
(b'ptg000491l', 255138, 255144)
ptg000491l: chunks from 255138-255144
(b'ptg001215l', 255144, 255150)
ptg001215l: chunks from 255144-255150
(b'ptg001293l', 255150, 255156)
ptg001293l: chunks from 255150-255156
(b'ptg001253l', 255156, 255162)
ptg001253l: chunks from 255156-255162
(b'ptg000695l', 255162, 255168)
ptg000695l: chunks from 255162-255168
(b'ptg001188l', 255168, 255174)
ptg001188l: chunks from 255168-255174
(b'ptg000690l', 255174, 255180)
ptg000690l: chunks from 255174-255180
(b'ptg001237l', 255180, 255186)
ptg001237l: chunks from 255180-255186
(b'ptg001278l', 255186, 255192)
ptg001278l: chunks from 255186-255192
(b'ptg001240l', 255192, 255198)
ptg001240l: chunks from 255192-255198
(b'ptg001228l', 255198, 255204)
ptg001228l: chunks from 255198-255204
(b'ptg001333l', 255204, 255210)
ptg001333l: chunks from 255204-255210
(b'ptg001255l', 255210, 255216)
ptg001255l: chunks from 255210-255216
(b'ptg001297l', 255216, 255222)
ptg001297l: chunks from 255216-255222
(b'ptg001314l', 255222, 255228)
ptg001314l: chunks from 255222-255228
(b'ptg001025l', 255228, 255234)
ptg001025l: chunks from 255228-255234
(b'ptg000366l', 255234, 255240)
ptg000366l: chunks from 255234-255240
(b'ptg001141l', 255240, 255246)
ptg001141l: chunks from 255240-255246
(b'ptg001309l', 255246, 255252)
ptg001309l: chunks from 255246-255252
(b'ptg000636l:1-52000', 255252, 255258)
ptg000636l:1-52000: chunks from 255252-255258
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/HTSeq/__init__.py", line 920, in fetch
    for pa in self.sf.fetch(reference, start, end, region):
  File "pysam/libcalignmentfile.pyx", line 1089, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 663, in pysam.libchtslib.HTSFile.parse_region
ValueError: too many values to unpack (expected 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/helixer/evaluation/add_ngs_coverage.py", line 293, in cov_by_chrom
    for read in htseqbam.fetch(region="{}:1-{}".format(chromosome, length)):
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/HTSeq/__init__.py", line 924, in fetch
    if e.message == "fetch called on bamfile without index":
AttributeError: 'ValueError' object has no attribute 'message'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/helixer/evaluation/add_ngs_coverage.py", line 530, in <module>
    main(args.species,
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/helixer/evaluation/add_ngs_coverage.py", line 466, in main
    cage_coverage_from_coord_to_h5(
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/site-packages/helixer/evaluation/add_ngs_coverage.py", line 398, in cage_coverage_from_coord_to_h5
    coverage_out = p.map(cov_by_chrom, mapargs)
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/public2/home/sl_qybio/sl_qybio/miniforge3/envs/helixer/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
AttributeError: 'ValueError' object has no attribute 'message'
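
The traceback above suggests how the failure might arise: cov_by_chrom builds a region string with "{}:1-{}".format(chromosome, length), so a contig name that already contains a colon yields a string with two colons. The sketch below is only an illustration under that assumption. `naive_parse_region` and `safe_parse_region` are hypothetical helpers, not pysam's or HTSeq's actual code, and the length 52000 is an illustrative value taken from the contig name.

```python
def build_region(chromosome: str, length: int) -> str:
    # Same format string as the cov_by_chrom line shown in the traceback.
    return "{}:1-{}".format(chromosome, length)


def naive_parse_region(region: str):
    # Hypothetical parser that assumes the region contains exactly one ':'.
    contig, coords = region.split(":")
    start, end = coords.split("-")
    return contig, int(start), int(end)


def safe_parse_region(region: str):
    # Hypothetical alternative: split on the LAST ':' only, so colons
    # inside the contig name are kept as part of the name.
    contig, coords = region.rsplit(":", 1)
    start, end = coords.split("-")
    return contig, int(start), int(end)


ok_region = build_region("Chr01", 1000)
bad_region = build_region("ptg000636l:1-52000", 52000)  # illustrative length

print(ok_region, "->", naive_parse_region(ok_region))
print(bad_region)

try:
    naive_parse_region(bad_region)
except ValueError as err:
    # Three ':'-separated parts cannot be unpacked into two names.
    print("naive parser failed:", err)

print(safe_parse_region(bad_region))

# The second error in the traceback is a separate matter: built-in
# Python 3 exceptions have no `.message` attribute, so reading it raises
# AttributeError and hides the original ValueError.
print(hasattr(ValueError("example"), "message"))
```

If this reading is right, possible workarounds would be renaming the contigs so they contain no colon before building the h5 and BAM files, or passing the contig name and coordinates to fetch as separate arguments instead of a single region string. Neither has been verified against Helixer here.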

Environment (please complete the following information):

felicitas215 commented 2 weeks ago

Hi, thank you for reporting this issue. It looks like you ran add_ngs_coverage.py on a BAM file without an index file (.bam.bai). As is frequently the case, the BAM file needs an index file in the same directory.
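
To rule out the missing-index explanation before rerunning, a quick check along the following lines may help. It is a minimal sketch, assuming the index sits next to the BAM file under one of the usual names (`<name>.bam.bai`, `<name>.bai`, or `<name>.bam.csi`). `find_bam_index` is a hypothetical helper, not part of Helixer, and the demo uses empty placeholder files rather than real alignments.

```python
import tempfile
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of an index file next to bam_path, or None."""
    bam = Path(bam_path)
    candidates = [
        Path(str(bam) + ".bai"),   # sample.bam.bai
        bam.with_suffix(".bai"),   # sample.bai
        Path(str(bam) + ".csi"),   # sample.bam.csi
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bam = Path(tmp) / "LHleaf.sort.bam"
    bam.touch()
    print("before indexing:", find_bam_index(bam))

    Path(str(bam) + ".bai").touch()  # stands in for a real index file
    print("after indexing:", find_bam_index(bam))
```

If the function returns None for a real BAM file, running `samtools index` on that file creates the index in the same directory.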

felicitas215 commented 1 day ago

I'm closing this issue because there has been no activity for 2 weeks. Feel free to reopen it if necessary.