shuhailaMSR closed this issue 3 years ago.
Python handles modules within each installation, so if you have two different versions of python on your system you need to install a module (like biocode) in each of them. Where is your python3.7? Is it a system-wide install or a custom-compiled one?
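A quick way to see which interpreter is actually running and whether biocode is visible to it is a small check like the one below. This is a sketch, not part of biocode, and the script name check_biocode.py used in the usage note is hypothetical.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from,
    or None if the running interpreter cannot find it."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # sys.executable is the interpreter running this script. Each Python
    # installation has its own site-packages, so results differ per interpreter.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("biocode:", module_location("biocode") or "not installed for this interpreter")
```

Run it once per interpreter, for example python3.7 check_biocode.py and then python3 check_biocode.py. If biocode shows up for only one of them, python3.7 -m pip install biocode installs it into that specific interpreter.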
$ which python3.7
There are a lot of Python versions here, and if I type python it automatically starts python2.7.
$ which python3.7
/usr/local/bin/python3.7
OK, so someone probably symlinked your python3 there. We need to see where it goes:
$ ls -l /usr/local/bin/python3.7
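ls -l shows only one hop of a symlink. To follow the whole chain, a short Python sketch can print every link until it reaches the real file. This is an illustrative helper, not part of biocode; it resolves relative link targets against the link's own directory and stops if it detects a loop.

```python
import os
import sys


def symlink_chain(path):
    """Follow a chain of symbolic links and return every hop, ending with the real file."""
    hops = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative link targets are resolved against the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # guard against symlink loops
            break
        seen.add(path)
        hops.append(path)
    return hops


if __name__ == "__main__":
    start = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/bin/python3.7"
    print(" -> ".join(symlink_chain(start)))
```

For just the final destination, os.path.realpath("/usr/local/bin/python3.7") gives the same end point in one call.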
$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import argparse
>>> import re
>>> from biocode import gff, things
Does this mean biocode is OK? But why is there no change in the result? I ran:
$ convert_augustus_to_gff3.py -i aug_26.gff3 -o output.gff3
where aug_26.gff3 is the output from Augustus.
output.gff3 only has a header added.
You showed the path for python3.7, but the session you pasted is running Python 3.6.9.
Do you mind attaching the file and I'll see if I can look at it today?
@shuhailaMSR Please attach the file, compressed rather than pasting it in the box.
Hi, I changed it to a .txt file since the upload doesn't accept the .gff3 format.
Are you sure it only added the header? I ran it myself like this:
$ convert_augustus_to_gff3.py -i aug_26.txt -o aug_26.gff
And it transformed each gene to fit a GFF3-correct gene model. For example this was the input block for the 3rd prediction in your file:
# start gene g3
S010000001.1 AUGUSTUS gene 6829 12355 0.38 - . ID=g3
S010000001.1 AUGUSTUS transcript 6829 12355 0.38 - . ID=g3.t1;Parent=g3
S010000001.1 AUGUSTUS stop_codon 6829 6831 . - 0 Parent=g3.t1
S010000001.1 AUGUSTUS intron 9619 9978 0.66 - . Parent=g3.t1
S010000001.1 AUGUSTUS intron 9986 10979 0.66 - . Parent=g3.t1
S010000001.1 AUGUSTUS CDS 6832 9618 0.6 - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 9979 9985 0.66 - 1 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 10980 12355 0.69 - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS start_codon 12353 12355 . - 0 Parent=g3.t1
And in the output it was transformed into this:
# end gene g3
S010000001.1 AUGUSTUS gene 6829 12355 . - . ID=g3
S010000001.1 AUGUSTUS mRNA 6829 12355 . - . ID=g3.t1;Parent=g3
S010000001.1 AUGUSTUS CDS 6832 9618 . - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 9979 9985 . - 1 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS CDS 10980 12355 . - 0 ID=g3.t1.cds;Parent=g3.t1
S010000001.1 AUGUSTUS exon 6832 9618 . - . ID=g3.t1.exon1;Parent=g3.t1
S010000001.1 AUGUSTUS exon 9979 9985 . - . ID=g3.t1.exon2;Parent=g3.t1
S010000001.1 AUGUSTUS exon 10980 12355 . - . ID=g3.t1.exon3;Parent=g3.t1
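To make the reshaping concrete, here is a minimal Python sketch that reproduces the g3 example: transcript becomes mRNA, intron, start_codon and stop_codon rows are dropped, the score column is cleared, and one exon is added per CDS segment. It is a hypothetical illustration, not biocode's actual implementation. It assumes tab-separated columns and, like this example, a prediction without UTRs, so exon coordinates can simply be copied from the CDS rows.

```python
# Feature types treated as derivable from the CDS structure and dropped.
DERIVED_TYPES = {"intron", "start_codon", "stop_codon"}


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=a;Parent=b' into a dict."""
    return dict(pair.split("=", 1) for pair in field.split(";") if pair)


def convert_gene_block(lines):
    """Reshape one Augustus gene block into gene/mRNA/CDS/exon GFF3 rows.

    Comment and blank lines are skipped; columns are assumed tab-separated.
    """
    features, cds_rows = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] in DERIVED_TYPES:
            continue
        cols[5] = "."  # clear the score column, as in the example output
        if cols[2] == "transcript":
            cols[2] = "mRNA"
        features.append(cols)
        if cols[2] == "CDS":
            cds_rows.append(cols)
    # One exon per CDS segment, numbered in ascending coordinate order.
    cds_rows.sort(key=lambda c: int(c[3]))
    for n, cds in enumerate(cds_rows, start=1):
        parent = parse_attributes(cds[8])["Parent"]
        # seqid, source, type, start, end, score, strand, phase, attributes
        exon = cds[:2] + ["exon"] + cds[3:7] + ["."] + [f"ID={parent}.exon{n};Parent={parent}"]
        features.append(exon)
    return ["\t".join(c) for c in features]
```

Feeding it the g3 input block above (with tab-separated columns) yields the same eight rows as the output shown. A gene model with UTRs would need exons built from more than the CDS coordinates.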
Please re-open if you get something else.
I get the same output as the input from that command, except for an added ##gff-version 3 header. Do you know how I can track down the error?
I only installed it with pip install biocode. Are there any other tools I need to add? There is no error message.
Please upload a screenshot of you doing the following commands and their output:
$ convert_augustus_to_gff3.py -i aug_26.txt -o aug_26.gff
$ grep g3.t1 aug_26.*
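The grep above checks one transcript by hand. To compare whole files, a small Python sketch can count the feature types in column 3 of each file; after a successful conversion, transcript rows should be gone and mRNA and exon rows should appear. The script name count_types.py in the usage note is hypothetical, and tab-separated columns are assumed.

```python
import sys
from collections import Counter


def feature_type_counts(lines):
    """Count column-3 feature types in GFF lines, ignoring comments and blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3:
            counts[cols[2]] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, dict(feature_type_counts(handle)))
```

Usage: python3 count_types.py aug_26.txt aug_26.gff. Identical counts for both files would mean the conversion really did not change anything.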
I think it works, but it also carries over the protein sequences from the input file. Sorry, my mistake.
I'm not sure what you mean there. You think the script edited your input file?
Yes, I think the script only edits the input file by changing transcript to mRNA and so on. Then I need to extract the list of chromosome IDs to pull out the GFF3 format without the protein sequences.
It should replace transcript with mRNA, remove derived features, add exons, etc. And in both the input and output the comments from Augustus with protein sequences are retained. If you want to extract the FASTA regions after that you can use this:
https://github.com/jorvis/biocode/blob/master/gff/write_fasta_from_gff.py
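The linked script covers the FASTA side. If what you want is the GFF3 itself without the Augustus protein-sequence comments, a minimal Python filter can drop single-# comment lines while keeping ## directives such as ##gff-version 3. This is a hypothetical helper, not part of biocode, and it also removes Augustus's other single-# comments (for example # start gene g3).

```python
import sys


def strip_comments(lines):
    """Yield GFF lines without single-'#' comments (such as Augustus protein
    sequence blocks), keeping '##' directives like ##gff-version 3."""
    for line in lines:
        if line.startswith("#") and not line.startswith("##"):
            continue
        yield line


if __name__ == "__main__":
    src, dest = sys.argv[1], sys.argv[2]
    with open(src) as infile, open(dest, "w") as outfile:
        outfile.writelines(strip_comments(infile))
```

Usage, with hypothetical file names: python3 strip_comments.py output.gff3 clean.gff3.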
OK, noted. Thank you, Jorvis.
I have already installed biocode using pip install biocode, and the system recognized it. But running the script convert_augustus_to_gff3.py gives NO result and NO error. Then, if I try to run it step by step in python3.7, there is an error that says