Error in generating segments

garimak05 commented 4 years ago

I am running iCount on an Ensembl GTF (downloaded through iCount) and get the following error while generating the segments:

Executing the following command: iCount segment mus_musculus.88.gtf.gz mm88seg.gtf.gz mus_musculus.88.fa.gz.fai
Input parameters for function 'get_segments' in iCount.genomes.segment
annotation: mus_musculus.88.gtf.gz
segmentation: mm88seg.gtf.gz
fai: mus_musculus.88.fa.gz.fai
report_progress: False
Calculating intergenic intervals...
[MalformedBedLineError] Start is greater than stop
  File "/lustre/home/regmgkh/Software/iCount/iCount/cli.py", line 444, in main
    result_object = func(**args)
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 728, in get_segments
    intergenic_pos = _complement(gtf.fn, fai, '+')
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 548, in _complement
    for n, i in enumerate(intergenic_bed)
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 3342, in saveas
    out_compressed=compressed,
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 1412, in _collapse
    for i in iterable:
  File "pybedtools/cbedtools.pyx", line 759, in pybedtools.cbedtools.IntervalIterator.next
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 547, in <genexpr>
    create_interval_from_list([i[0], '.', type_name, str(int(i[1]) + 1), i[2], '.', strand, '.', col8 % n])
  File "pybedtools/cbedtools.pyx", line 792, in pybedtools.cbedtools.IntervalIterator.next
  File "pybedtools/cbedtools.pyx", line 701, in pybedtools.cbedtools.create_interval_from_list

I have checked the GTF. It contains a few 1 bp entries (example below), but none of them has a start greater than its stop.

1 ensembl CDS 22533796 22533796 . - 1 gene_id "ENSMUSG00000041670"; gene_version "16"; transcript_id "ENSMUST00000081544"; transcript_version "12"; exon_number "8"; gene_name "Rims1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; havana_gene "OTTMUSG00000046025"; havana_gene_version "2"; transcript_name "Rims1-201"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000080259"; protein_version "6"; tag "basic"; transcript_support_level "5";
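For anyone who wants to repeat this check, here is a minimal sketch of how the GTF could be scanned for records whose start exceeds their end. It uses only the Python standard library; the function names `find_inverted_records` and `scan_gtf` are made up for illustration and are not part of iCount, and the second sample record is invented to show a positive hit.

```python
import gzip
import io


def find_inverted_records(lines):
    """Yield (line_number, start, end) for GTF records whose start exceeds end."""
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        # GTF columns 4 and 5 hold the 1-based, inclusive start and end.
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            yield number, start, end


def scan_gtf(path):
    """Open a plain or gzipped GTF file and list its inverted records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(find_inverted_records(handle))


# In-memory demonstration: a 1 bp record (valid) and an invented inverted one.
sample = io.StringIO(
    "# example header\n"
    "1\tensembl\tCDS\t22533796\t22533796\t.\t-\t1\tgene_id \"ENSMUSG00000041670\";\n"
    "1\texample\texon\t200\t100\t.\t+\t.\tgene_id \"hypothetical\";\n"
)
inverted = list(find_inverted_records(sample))
print(inverted)  # [(3, 200, 100)]
```

On the real file this would be called as `scan_gtf("mus_musculus.88.gtf.gz")`; an empty list means the annotation itself has no inverted records, so 1 bp entries like the one above are not the problem.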

What do you suggest I do about this issue?

Thanks for your help.

Faitero commented 4 years ago

Hi Garimak05,

I've just tested with the latest iCount version without issues.

$ iCount genome --genome iCount_genomes/mus_musculus/mus_musculus.fa.gz --source ensembl mus_musculus 88

$ iCount annotation --annotation iCount_genomes/mus_musculus/mus_musculus.gtf.gz --source ensembl mus_musculus 88

$ iCount segment iCount_genomes/mus_musculus/mus_musculus.gtf.gz iCount_genomes/mus_musculus/mus_musculus_segment.gtf iCount_genomes/mus_musculus/mus_musculus.fa.gz.fai

Note that my downloaded mus_musculus GTF also has the same 1 nt features, such as the one in ENSMUST00000081544, and that the segment output should be uncompressed (mm88seg.gtf rather than mm88seg.gtf.gz).
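Since the 1 nt features also appear in a run that works, the error more plausibly comes from the intergenic intervals than from the annotation. The traceback shows segment.py building each GTF interval from a BED interval as `str(int(i[1]) + 1), i[2]`. The sketch below mirrors only that arithmetic; the helper names are hypothetical, and whether an older bedtools actually emitted zero-length intervals here is an assumption that this thread does not confirm.

```python
def bed_to_gtf_coords(bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive GTF coordinates."""
    return bed_start + 1, bed_end


def is_valid_gtf_interval(start, end):
    """A GTF interval is well formed when start does not exceed end."""
    return start <= end


# A 1 nt BED interval converts to a valid GTF interval with start == end.
one_nt = bed_to_gtf_coords(22533795, 22533796)

# A zero-length BED interval (hypothetical input) converts to start > end,
# which is the condition "Start is greater than stop" complains about.
empty = bed_to_gtf_coords(100, 100)

print(one_nt, is_valid_gtf_interval(*one_nt))  # (22533796, 22533796) True
print(empty, is_valid_gtf_interval(*empty))    # (101, 100) False
```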

I suspect that updating your bedtools version will solve the problem; I'm using bedtools v2.28.0. If not, try downloading the annotation data again, or try pybedtools' remove_invalid().
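A quick way to see which bedtools the Python environment will pick up is sketched below. It assumes `bedtools --version` prints a string of the form `bedtools v2.28.0`; the helper names are illustrative, and the comparison target is simply the version mentioned above, not a documented minimum requirement.

```python
import re
import shutil
import subprocess


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.28.0'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_bedtools_version():
    """Return the version of the bedtools binary on PATH, or None if it is absent."""
    if shutil.which("bedtools") is None:
        return None
    result = subprocess.run(
        ["bedtools", "--version"], capture_output=True, text=True
    )
    # Read both streams in case a build reports its version on stderr.
    return parse_bedtools_version(result.stdout + result.stderr)


KNOWN_GOOD = (2, 28, 0)  # the version reported to work in this thread

version = installed_bedtools_version()
if version is None:
    print("bedtools not found on PATH")
elif version < KNOWN_GOOD:
    print("bedtools", version, "is older than the version reported to work")
else:
    print("bedtools", version, "is at least the version reported to work")
```

If several bedtools installations exist (for example a cluster module and a local build), the one found first on PATH is the one pybedtools will call, so it is worth running this from the same environment as iCount.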

Hope it works!

garimak05 commented 4 years ago

Hi Igor,

Updating bedtools resolved the issue. Thanks a lot!

Garima

tomazc / iCount

Error in generating segments #200