ialbert / bio

Making bioinformatics fun again
MIT License

bio gff failed to run #31

Closed haipinghao closed 9 months ago

haipinghao commented 11 months ago

I tried to run bio gff to convert a GenBank file to GFF and got the following error:

$ bio gff MHV_genomes.gb --type CDS
Traceback (most recent call last):
  File "/home/hhao/mambaforge-pypy3/bin/bio", line 8, in <module>
    sys.exit(run())
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/main.py", line 14, in run
    router()
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/main.py", line 130, in wrapper
    func(*args, **kwargs)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/main.py", line 202, in router
    plac.call(func)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/libs/placlib.py", line 440, in call
    cmd, result = parser.consume(arglist)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/libs/placlib.py", line 291, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/gff.py", line 17, in run
    convert.run(start=start, end=end, type=type, match=match, id=id_, gene=gene, rename=rename, olap=olap, fasta=False, fnames=fnames)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/convert.py", line 406, in run
    recs = parser.get_records(fnames)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/parser.py", line 422, in get_records
    recs = flatten(reader)
  File "/home/hhao/mambaforge-pypy3/lib/pypy3.9/site-packages/biorun/parser.py", line 215, in flatten
    return functools.reduce(operator.iconcat, nested, [])
TypeError

ialbert commented 11 months ago

Can you tell me how you made the MHV_genomes.gb file?

haipinghao commented 11 months ago

bio fetch NC_048217.1 > MHV_genomes.gb

ialbert commented 11 months ago

On my system it seems to work fine:

bio fetch NC_048217.1 > MHV_genomes.gb

bio gff MHV_genomes.gb | head -5

prints:

##gff-version 3
NC_048217.1     .       five_prime_UTR  1       210     .       +       .       ID=1;Name=five_prime_UTR-1;Parent=five_prime_UTR-1;color=#cc0e74
NC_048217.1     .       gene    1       65      .       +       .       ID=2;Name=HO264_gs01;Parent=HO264_gs01;color=#cb7a77
NC_048217.1     .       misc_RNA        1       65      .       +       .       ID=3;Name=misc_RNA-1;Parent=misc_RNA-1
NC_048217.1     .       misc_feature    66      72      .       +       .       ID=4;Name=misc_feature-1;Parent=misc_feature-1

Can you check what is inside your GenBank file?

haipinghao commented 11 months ago

It seems normal.

$ cat MHV_genomes.gb | head
LOCUS       NC_048217              31335 bp ss-RNA     linear   VRL 16-SEP-2020
DEFINITION  Murine hepatitis virus strain A59, complete genome.
ACCESSION   NC_048217
VERSION     NC_048217.1
DBLINK      BioProject: PRJNA485481
KEYWORDS    RefSeq.
SOURCE      Murine hepatitis virus
  ORGANISM  Murine hepatitis virus
            Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes;
            Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae;

haipinghao commented 11 months ago

I installed using pip install bio --upgrade as suggested. It seems only bio fetch works. All other commands give similar errors.

ialbert commented 11 months ago

What happens if you run:

bio test

haipinghao commented 11 months ago

It runs to 4% and then gives a TypeError.

ialbert commented 11 months ago

There is one more flag you can try: add --verbose, and it should print "genbank format".

 bio gff MHV_genomes.gb --type CDS --verbose  | head -5

The output should look like this:

# parser.get_streams: open: MHV_genomes.gb
# parser.parse_stream: parsing: genbank

ialbert commented 11 months ago

Let's also check your BioPython install with a simple script. Does this work?

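from Bio.SeqIO import parse

# Parse the GenBank file and print every record it contains.
# If this loop fails, the problem is in the BioPython install rather than in bio.
for rec in parse("MHV_genomes.gb", format="genbank"):
    print(rec)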

I can't quite troubleshoot the error since it does not manifest itself on my end.

haipinghao commented 11 months ago

Istvan, thanks for all your help. When I ran "bio gff MHV_genomes.gb --type CDS --verbose | head -5", I got the two lines you showed and then the error message. When I ran the Python code for testing the BioPython install, it worked and printed the correct information!

ialbert commented 11 months ago

I have tested five different systems (Linux, MacOS Intel, MacOS M1, etc.) and I am unable to see the error.

The error seems to indicate that the object returned from the GenBank parser is not an iterator type,

but it is impossible to troubleshoot if I cannot reproduce the error.
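
To illustrate the kind of failure described above, here is a minimal sketch. It reuses the flatten one-liner shown in the traceback and feeds it made-up input; the sample lists are assumptions for illustration, not the data bio actually passes, and this is not a diagnosis of the reported error.

```python
import functools
import operator


def flatten(nested):
    # Same expression as the last frame of the traceback:
    # concatenate every element of `nested` into one list.
    return functools.reduce(operator.iconcat, nested, [])


# Every element is itself a list, so flattening works.
print(flatten([[1, 2], [3]]))  # [1, 2, 3]

# One element (the bare 3) is not iterable, so the in-place
# concatenation fails with a TypeError.
try:
    flatten([[1, 2], 3])
except TypeError:
    print("TypeError raised")
```

In other words, flatten only succeeds when every item it receives can itself be concatenated onto a list.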

Can you create a new environment and see if the error persists?

conda create -n test python
conda activate test

pip install bio --upgrade
bio fetch NC_048217.1 > MHV_genomes.gb
bio gff MHV_genomes.gb | head -5

haipinghao commented 11 months ago

That works! I guess I did not install bio into a separate environment and it got installed into base. That apparently is the problem. Is there a way to uninstall the first installation? Thank you!

ialbert commented 11 months ago

Start using micromamba, as the newest edition of the book indicates.

micromamba does not have a base environment, so that problem is solved.

The existence of the base environment is a mistake that conda made when it was invented.

You don't have to uninstall the base; just never use the base environment, because in general you will run into problems.

Always set up separate environments.
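
Since the fix here came down to which environment bio was installed into, a quick check of the active interpreter can help when comparing a working and a failing setup. The following is only a sketch: the function name describe_environment is hypothetical, "bio" is assumed to be the installed distribution name (as in pip install bio), and CONDA_DEFAULT_ENV is only set when a conda-style environment has been activated.

```python
import os
import platform
import sys
from importlib import metadata


def describe_environment(dist_name="bio"):
    """Collect facts that identify the interpreter and environment in use."""
    try:
        dist_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        dist_version = None  # not installed for this interpreter
    return {
        "executable": sys.executable,  # path of the running Python
        "prefix": sys.prefix,  # root directory of the environment
        "implementation": platform.python_implementation(),  # e.g. CPython or PyPy
        "python_version": platform.python_version(),
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if not set
        dist_name: dist_version,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this in both the failing and the freshly created environment shows at a glance whether they differ in interpreter, location, or installed version.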

haipinghao commented 11 months ago

Thank you!

ialbert commented 9 months ago

Fixed by @peterjc; the new release 1.6.2 is now live.