songweizhi / BioSAK

A Swiss-Army-Knife for Bioinformaticians
GNU General Public License v3.0

Possibility to annotate DNA sequences for COGs #7

Open IvanUgrin-Genalytics opened 1 week ago

IvanUgrin-Genalytics commented 1 week ago

Hello. I am trying to annotate a DNA sequence FASTA file with COGs. Is it possible to use a nucleotide FASTA file for that purpose? Each sequence in the file is labeled with its NCBI ID.

Example command: BioSAK COG2020 -m N -t 6 -db_dir ./COG_db_dir -i input.ffn

Example input format of fasta:

>AB679109.1
GTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGCGCGCGCAGGCGGATCAGTCAGTCTGTCTTAAAAGTTCGGGGCTTAACCCCGTGATGGGATGGAAACTGCTGATCTAGAGTATCGGAGAGGAAAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAAGAACACCAGTGGCGAAGGCGACTTTCTGGACGAAAACTGACGCTGAGGCGCGAAAGCCAGGGGAGCGAACGGGATTAGAAACCCCAGTAGTCC

songweizhi commented 3 days ago

This should work.

IvanUgrin-Genalytics commented 1 day ago

The problem is that these are annotated nucleotide sequences from the SILVA database. I suppose BioSAK expects annotated protein sequences, but I have annotated nucleotide sequences. My main question is: is it possible to translate the nucleotide sequences into protein sequences and annotate them for use with BioSAK, and do you know of a pipeline that does that? @songweizhi
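
For illustration, the translation step asked about above could look roughly like the sketch below. It is not part of BioSAK. It assumes the input sequences are protein-coding, in frame, and use the standard genetic code; these conditions would need to be checked for the actual data. The function names `read_fasta`, `reverse_complement`, and `translate` are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: standard code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i] for i, codon in enumerate(product(BASES, repeat=3))
}


def read_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}; the id is the first header word."""
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            records[seq_id] = ""
        elif seq_id is not None:
            records[seq_id] += line.upper()
    return records


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (non-ACGT characters become N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def translate(seq, frame=0):
    """Translate DNA in the given frame (0, 1, or 2); unknown codons become X."""
    seq = seq.upper()[frame:]
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # Hypothetical toy record, not taken from the input file above.
    example = ">toy_gene example\nATGGCC\nTAA\n"
    for name, dna in read_fasta(example).items():
        print(f">{name}\n{translate(dna)}")
```

When the reading frame or strand is unknown, one option is to call `translate` on frames 0 to 2 of both the sequence and its `reverse_complement`, then keep the frame with the longest stretch free of stop codons. The resulting protein FASTA could then be passed to a protein-based annotation step.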