jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

biocode error #73

Closed shuhailaMSR closed 3 years ago

shuhailaMSR commented 3 years ago

I have already installed biocode using pip install biocode, and the system recognizes it. But when I run the script convert_augustus_to_gff3.py, I get no result and no error. If I then try to run it step by step in python3.7, I get this error:

>>> from biocode import gff, things
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'biocode'

jorvis commented 3 years ago

Python handles modules within each installation, so if you have two different versions of python on your system you need to install a module (like biocode) in each of them. Where is your python3.7? Is it a system-wide install or a custom-compiled one?

$ which python3.7
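Because each interpreter keeps its own set of installed modules, one quick check is to ask a given interpreter which executable it is and whether it can locate the package. The following is only a minimal sketch using the standard library; the function name `module_report` is made up for illustration and is not part of biocode.

```python
import importlib.util
import sys


def module_report(name):
    """Describe the running interpreter and whether it can find `name`.

    `module_report` is a hypothetical helper, not a biocode function.
    """
    spec = importlib.util.find_spec(name)  # None if the module is not installed here
    return {
        "interpreter": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    print(module_report("biocode"))
```

Saving this to a file and running it once per interpreter (for example with python3 and again with python3.7) should show which of them can see biocode. Installing with `python3.7 -m pip install biocode` ties the install to that specific interpreter.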

shuhailaMSR commented 3 years ago

There are a lot of Python versions on the system, and if I type python it automatically starts python2.7.

$ which python3.7
/usr/local/bin/python3.7

jorvis commented 3 years ago

OK, so someone probably symlinked your python3 there. We need to see where it goes:

$ ls -l /usr/local/bin/python3.7
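The same symlink check can be done from Python, which may help when comparing several interpreters at once. This is a rough sketch with the standard library only; `resolve_command` is a hypothetical helper name, and the command names in the usage loop are just examples.

```python
import os
import shutil


def resolve_command(name):
    """Return (path found on PATH, fully resolved real path), or None.

    `resolve_command` is a hypothetical helper for illustration.
    """
    found = shutil.which(name)  # same lookup the shell does via PATH
    if found is None:
        return None
    # realpath follows every symlink in the chain to the final target
    return found, os.path.realpath(found)


if __name__ == "__main__":
    for cmd in ("python", "python3", "python3.7"):
        print(cmd, "->", resolve_command(cmd))
```

If the two paths in a result differ, the command is a symlink (or sits behind one), and the second path is the interpreter that actually runs.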

shuhailaMSR commented 3 years ago

$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import argparse
>>> import re
>>> from biocode import gff, things

Does this mean biocode is OK? But then why is there no change in the result?

convert_augustus_to_gff3.py -i aug_26.gff3 -o output.gff3

where aug_26.gff3 is the output from Augustus.

The resulting output.gff3 only has an added header:

##gff-version 3
##gff-version 3
# This output was generated with AUGUSTUS (version 3.3.2).

jorvis commented 3 years ago

You linked 3.7 but then that output showed 3.6.9.

Do you mind attaching the file and I'll see if I can look at it today?

jorvis commented 3 years ago

@shuhailaMSR Please attach the file, compressed rather than pasting it in the box.

shuhailaMSR commented 3 years ago

Hi, I changed it to a .txt file since the upload does not accept the gff3 format.

aug_26.txt

jorvis commented 3 years ago

Are you sure it only added the header? I ran it myself like this:

$ convert_augustus_to_gff3.py -i aug_26.txt -o aug_26.gff

And it transformed each gene to fit a GFF3-correct gene model. For example this was the input block for the 3rd prediction in your file:

# start gene g3
S010000001.1    AUGUSTUS    gene           6829     12355    0.38    -    .    ID=g3
S010000001.1    AUGUSTUS    transcript     6829     12355    0.38    -    .    ID=g3.t1;Parent=g3
S010000001.1    AUGUSTUS    stop_codon     6829     6831     .       -    0    Parent=g3.t1
S010000001.1    AUGUSTUS    intron         9619     9978     0.66    -    .    Parent=g3.t1
S010000001.1    AUGUSTUS    intron         9986     10979    0.66    -    .    Parent=g3.t1
S010000001.1    AUGUSTUS    CDS            6832     9618     0.6     -    0    ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS    CDS            9979     9985     0.66    -    1    ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS    CDS            10980    12355    0.69    -    0    ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS    start_codon    12353    12355    .       -    0    Parent=g3.t1

And this was transformed in the output to this:

# end gene g3
S010000001.1    AUGUSTUS        gene    6829    12355   .       -       .       ID=g3
S010000001.1    AUGUSTUS        mRNA    6829    12355   .       -       .       ID=g3.t1;Parent=g3
S010000001.1    AUGUSTUS        CDS     6832    9618    .       -       0       ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS        CDS     9979    9985    .       -       1       ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS        CDS     10980   12355   .       -       0       ID=g3.t1.cds;Parent=g3.t1
S010000001.1    AUGUSTUS        exon    6832    9618    .       -       .       ID=g3.t1.exon1;Parent=g3.t1
S010000001.1    AUGUSTUS        exon    9979    9985    .       -       .       ID=g3.t1.exon2;Parent=g3.t1
S010000001.1    AUGUSTUS        exon    10980   12355   .       -       .       ID=g3.t1.exon3;Parent=g3.t1
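
The change between the two blocks above can be sketched in a few lines of Python. This is not the code in convert_augustus_to_gff3.py; it is only an illustration of the visible pattern in this one example (transcript becomes mRNA, intron and start/stop codon rows are dropped, the score is blanked, and one exon is added per CDS). The name `normalize_gene_block` is hypothetical.

```python
# Feature types dropped in the example output above.
DERIVED_TYPES = {"intron", "start_codon", "stop_codon"}

EXAMPLE_BLOCK = """\
# start gene g3
S010000001.1 AUGUSTUS gene 6829 12355 0.38 - . ID=g3
S010000001.1 AUGUSTUS transcript 6829 12355 0.38 - . ID=g3.t1;Parent=g3
S010000001.1 AUGUSTUS stop_codon 6829 6831 . - 0 Parent=g3.t1
S010000001.1 AUGUSTUS intron 9619 9978 0.66 - . Parent=g3.t1
S010000001.1 AUGUSTUS intron 9986 10979 0.66 - . Parent=g3.t1
S010000001.1 AUGUSTUS CDS 6832 9618 0.6 - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 9979 9985 0.66 - 1 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 10980 12355 0.69 - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS start_codon 12353 12355 . - 0 Parent=g3.t1
"""


def parent_of(attributes):
    """Extract the Parent value from a GFF3 attribute string."""
    pairs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
    return pairs["Parent"]


def normalize_gene_block(lines):
    """Illustrative only: mimic the input-to-output change shown in the thread."""
    rows, cds_rows = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments are ignored in this sketch
        cols = line.split(None, 8)  # tolerate tabs or runs of spaces
        if len(cols) != 9 or cols[2] in DERIVED_TYPES:
            continue
        if cols[2] == "transcript":
            cols[2] = "mRNA"
        cols[5] = "."  # the example output shows the score column blanked
        rows.append(cols)
        if cols[2] == "CDS":
            cds_rows.append(cols)

    # Add one exon per CDS, numbered per parent transcript.
    counts = {}
    for cds in cds_rows:
        parent = parent_of(cds[8])
        counts[parent] = counts.get(parent, 0) + 1
        exon_id = "ID={0}.exon{1};Parent={0}".format(parent, counts[parent])
        rows.append(cds[:2] + ["exon"] + cds[3:7] + [".", exon_id])
    return ["\t".join(row) for row in rows]


if __name__ == "__main__":
    print("\n".join(normalize_gene_block(EXAMPLE_BLOCK.splitlines())))
```

Comparing a line such as the transcript row before and after is a quick way to confirm that the conversion actually ran on a given file.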

jorvis commented 3 years ago

Please re-open if you get something else.

shuhailaMSR commented 3 years ago

I get the same output as the input using that command, with the added header ##gff-version 3. Do you know how I can track down the error?

I only ran pip install biocode. Is there any other tool I need to install? I ask because there is no error.

jorvis commented 3 years ago

Please upload a screenshot of you running the following commands and their output:

$ convert_augustus_to_gff3.py -i aug_26.txt -o aug_26.gff
$ grep g3.t1 aug_26.*

shuhailaMSR commented 3 years ago

Screenshot 2021-08-18 at 12 13 29 PM

I think it works, but it also carries over the protein sequences from the input file. Sorry, my mistake.

jorvis commented 3 years ago

I'm not sure what you mean there. You think the script edited your input file?

shuhailaMSR commented 3 years ago

Yes, I think the script only edits the input file by changing transcript to mRNA and so on. Then I need to extract the list of chromosome IDs to pull out the GFF3 records without the protein sequences.

jorvis commented 3 years ago

It should replace transcript with mRNA, remove derived features, add exons, etc. And in both the input and output the comments from Augustus with protein sequences are retained. If you want to extract the FASTA regions after that you can use this:

https://github.com/jorvis/biocode/blob/master/gff/write_fasta_from_gff.py
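
If the goal is only a GFF3 file without the Augustus comment lines (including the protein sequences), a small filter may be enough. This is a sketch under the assumption that those lines all start with a single `#`, as in the header shown earlier in the thread; it is not a biocode script, `strip_comment_lines` is a hypothetical name, and the sample lines in the usage part are made up.

```python
def strip_comment_lines(lines):
    """Yield GFF3 lines, dropping '#' comments but keeping '##' directives.

    Hypothetical helper; assumes comment lines begin with a single '#'.
    """
    for line in lines:
        if line.startswith("#") and not line.startswith("##"):
            continue
        yield line


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "##gff-version 3\n",
        "# protein sequence = [MKT]\n",
        "chr1\tAUGUSTUS\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    ]
    print("".join(strip_comment_lines(sample)), end="")
```

To apply it to a file, open the converted GFF3 for reading and write the filtered lines to a new file, so the original stays untouched.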

shuhailaMSR commented 3 years ago

OK, noted. Thank you, jorvis.