Hi Garimak05,
I've just tested with the latest iCount version without issues.
$ iCount genome --genome iCount_genomes/mus_musculus/mus_musculus.fa.gz --source ensembl mus_musculus 88
$ iCount annotation --annotation iCount_genomes/mus_musculus/mus_musculus.gtf.gz --source ensembl mus_musculus 88
$ iCount segment iCount_genomes/mus_musculus/mus_musculus.gtf.gz iCount_genomes/mus_musculus/mus_musculus_segment.gtf iCount_genomes/mus_musculus/mus_musculus.fa.gz.fai
Note that my downloaded mus_musculus GTF also has the same 1-nt features, such as ENSMUST00000081544, and that the segment output (mm88seg.gtf) should be uncompressed.
I suspect that updating your bedtools version will solve the problem; I'm using bedtools v2.28.0. If not, try downloading the annotation data again or using pybedtools.remove_invalid().
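To compare an installed bedtools against the version mentioned above, something like the following could help. It is only a sketch: the helper names and the KNOWN_GOOD constant are made up for illustration, and it assumes `bedtools --version` prints a string of the form "bedtools v2.28.0". An older version is not necessarily broken; 2.28.0 is simply the one tested here.

```python
import re
import shutil
import subprocess

# Version reported as working in this thread (illustrative constant).
KNOWN_GOOD = (2, 28, 0)


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from a string such as 'bedtools v2.28.0'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_bedtools_version():
    """Return the installed bedtools version, or None if bedtools is not on PATH."""
    if shutil.which("bedtools") is None:
        return None
    result = subprocess.run(
        ["bedtools", "--version"], capture_output=True, text=True, check=True
    )
    return parse_bedtools_version(result.stdout)


if __name__ == "__main__":
    version = installed_bedtools_version()
    if version is None:
        print("bedtools not found on PATH")
    elif version < KNOWN_GOOD:
        print("bedtools", version, "is older than", KNOWN_GOOD, "- consider updating")
    else:
        print("bedtools", version, "is at least", KNOWN_GOOD)
```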
Hope it works!
Hi Igor,
Updating bedtools resolved the issue. Thanks a lot!
Garima
I am running iCount on an Ensembl GTF (downloaded through iCount) and get the following error while generating the segments:
Executing the following command: iCount segment mus_musculus.88.gtf.gz mm88seg.gtf.gz mus_musculus.88.fa.gz.fai
Input parameters for function 'get_segments' in iCount.genomes.segment
annotation: mus_musculus.88.gtf.gz
segmentation: mm88seg.gtf.gz
fai: mus_musculus.88.fa.gz.fai
report_progress: False
Calculating intergenic intervals...
[MalformedBedLineError] Start is greater than stop
  File "/lustre/home/regmgkh/Software/iCount/iCount/cli.py", line 444, in main
    result_object = func(**args)
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 728, in get_segments
    intergenic_pos = _complement(gtf.fn, fai, '+')
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 548, in _complement
    for n, i in enumerate(intergenic_bed)
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 3342, in saveas
    out_compressed=compressed,
  File "/home/regmgkh/.python3local/lib/python3.7/site-packages/pybedtools/bedtool.py", line 1412, in _collapse
    for i in iterable:
  File "pybedtools/cbedtools.pyx", line 759, in pybedtools.cbedtools.IntervalIterator.next
  File "/lustre/home/regmgkh/Software/iCount/iCount/genomes/segment.py", line 547, in
    create_interval_from_list([i[0], '.', type_name, str(int(i[1]) + 1), i[2], '.', strand, '.', col8 % n])
  File "pybedtools/cbedtools.pyx", line 792, in pybedtools.cbedtools.IntervalIterator.next
  File "pybedtools/cbedtools.pyx", line 701, in pybedtools.cbedtools.create_interval_from_list
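Reading the segment.py frame at line 547, the interval is built with the start set to int(i[1]) + 1 and the end left as i[2], which looks like a conversion from 0-based BED coordinates to 1-based GTF coordinates. The sketch below only illustrates that arithmetic. It assumes, without having verified it, that the intergenic step produced a zero-length interval (start equal to end); `bed_to_gtf_coords` is a hypothetical stand-in, not an iCount or pybedtools function.

```python
def bed_to_gtf_coords(bed_start, bed_end):
    """Mimic the conversion seen in the traceback: start + 1, end unchanged."""
    gtf_start = bed_start + 1
    gtf_end = bed_end
    if gtf_start > gtf_end:
        # Stand-in for the check that raises MalformedBedLineError in pybedtools.
        raise ValueError(f"Start is greater than stop ({gtf_start} > {gtf_end})")
    return gtf_start, gtf_end


# A normal intergenic interval converts cleanly.
print(bed_to_gtf_coords(100, 200))  # (101, 200)

# A zero-length BED interval (start == end) ends up with start > end.
try:
    bed_to_gtf_coords(500, 500)
except ValueError as err:
    print(err)
```

If this is what happens, the input GTF itself could be valid while the error still appears, since the offending interval would come from the intermediate output rather than from the annotation.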
I have checked the GTF and found a few 1-bp entries (example below), but none of them has a start greater than its stop.
1 ensembl CDS 22533796 22533796 . - 1 gene_id "ENSMUSG00000041670"; gene_version "16"; transcript_id "ENSMUST00000081544"; transcript_version "12"; exon_number "8"; gene_name "Rims1"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; havana_gene "OTTMUSG00000046025"; havana_gene_version "2"; transcript_name "Rims1-201"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSMUSP00000080259"; protein_version "6"; tag "basic"; transcript_support_level "5";
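A check like the one described can be scripted roughly as follows. This is a minimal sketch using only the standard library; `find_inverted_features` and `open_gtf` are illustrative names, and it assumes a tab-separated GTF with the start in column 4 and the end in column 5.

```python
import gzip


def open_gtf(path):
    """Open a GTF file as text, transparently handling .gz compression."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_inverted_features(lines):
    """Return (line_number, start, end) for each feature whose start exceeds its end."""
    inverted = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not enough columns to hold start and end
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            inverted.append((number, start, end))
    return inverted


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open_gtf(sys.argv[1]) as handle:
            for number, start, end in find_inverted_features(handle):
                print(f"line {number}: start {start} > end {end}")
```

A 1-bp feature such as the CDS above has start equal to end, so it is not reported; only lines with start strictly greater than end are.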
What do you suggest I do about this issue?
Thanks for your help.