tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

prokka output files for gene quantification #576

Closed khemlalnirmalkar closed 3 years ago

khemlalnirmalkar commented 3 years ago

Hi, I just finished annotating my samples with Prokka, but I am stuck: how can I calculate gene/pathway abundance? What should my next step be? It's my first time doing this, so could you suggest an approach or a tutorial?

Thanks

andersgs commented 3 years ago

Hi @khemlalnirmalkar.

I am unclear about what you wish to achieve. Is this an RNA-seq experiment, a comparative genomics question, or something entirely different?

Prokka will annotate the genome and provide various output files, including one with the sequence of each annotated gene. Please see the README for a detailed account of all the output files.
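As a rough illustration of working with those output files, the sketch below reads CDS features and their lengths from a GFF3 file. The attribute keys it looks for (`locus_tag`, `product`, `eC_number`) and the example records are assumptions about what a typical Prokka GFF contains, and `parse_gff_cds` is a hypothetical helper, not part of Prokka.

```python
import io


def parse_gff_cds(handle):
    """Yield one dict per CDS feature found in a GFF3 stream."""
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the embedded sequence section starts here; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed protein-coding features
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        start, end = int(cols[3]), int(cols[4])
        yield {
            "contig": cols[0],
            "locus_tag": attrs.get("locus_tag", attrs.get("ID", "")),
            "product": attrs.get("product", ""),
            "ec_number": attrs.get("eC_number", ""),
            "length_bp": end - start + 1,  # GFF coordinates are 1-based, inclusive
        }


# Made-up records for illustration only; replace with open("sample.gff").
example = io.StringIO(
    "##gff-version 3\n"
    "##sequence-region contig_1 1 5000\n"
    "contig_1\tProdigal\tCDS\t101\t1000\t.\t+\t0\t"
    "ID=SAMPLE_00001;eC_number=2.7.1.2;locus_tag=SAMPLE_00001;product=Glucokinase\n"
    "contig_1\tAragorn\ttRNA\t1200\t1275\t.\t-\t.\t"
    "ID=SAMPLE_00002;locus_tag=SAMPLE_00002;product=tRNA-Ala\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGT\n"
)

genes = list(parse_gff_cds(example))
for gene in genes:
    print(gene["locus_tag"], gene["length_bp"], gene["ec_number"], gene["product"])
```

Gene lengths collected this way are what a later length-normalised abundance calculation would need; the annotation itself carries no abundance information.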

What you do downstream from here depends on your questions and experiments. If you give us a bit more detail, maybe we can point you in an appropriate direction.

khemlalnirmalkar commented 3 years ago

Hi @andersgs, sorry, my question was not clear. I have annotated bacterial genes (gut microbiome) and now need to perform functional analyses. I want to compare different metabolic pathways between the subject groups in my study, but I am not sure how to find the abundance of those pathways or annotated genes, i.e. how to quantify them. What would be the best downstream analysis? The GFF and other output files do not contain gene or pathway abundances. Thanks

andersgs commented 3 years ago

I would look at Qiime2: https://currentprotocols.onlinelibrary.wiley.com/doi/full/10.1002/cpbi.100

Qiime2 seems to be the gold standard for this kind of metagenomics analysis.

I also found this: https://www.nature.com/articles/srep40371

And this: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04106-7

We don't have much experience with any of these tools, and I can't guarantee that any of them will have the answers you are looking for. We work primarily with single-colony sequencing. So, if you need further assistance, please contact the authors of those tools directly.

Best of luck.

khemlalnirmalkar commented 3 years ago

Thanks for the links. QIIME2 is basically for 16S data, whereas I have shotgun samples that I annotated with Prokka. I will keep looking. Thanks again.

andersgs commented 3 years ago

You are welcome. Best of luck.

jorondo1 commented 1 year ago

@khemlalnirmalkar I was wondering if you could share what you ended up doing? I have MAGs that we annotated, and I am looking for a good way to quantify gene and metabolic pathway abundance in my samples given these specific proteins. I was thinking of using Prokka to obtain the protein-coding sequences (CDS) for each sample, then mapping the sample reads to these sequences using DIAMOND or something similar.
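Once reads have been mapped and counted per gene (with whichever tool is chosen), one common way to turn counts into comparable abundances is a length-normalised measure such as TPM. The sketch below is a minimal illustration under that assumption, not a method confirmed by anyone in this thread. The read counts, gene lengths, and EC assignments are made up, and `tpm` and `sum_by_function` are hypothetical helper names. Going from EC numbers to actual pathways would need an external mapping that is not shown.

```python
from collections import defaultdict


def tpm(read_counts, gene_lengths_bp):
    """Convert per-gene read counts to transcripts-per-million style values."""
    # Reads per kilobase: corrects for longer genes attracting more reads.
    rpk = {
        gene: count / (gene_lengths_bp[gene] / 1000)
        for gene, count in read_counts.items()
    }
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in rpk}
    # Scale so that the values in one sample sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


def sum_by_function(abundance, gene_to_function):
    """Add up gene-level abundances that share a functional label."""
    totals = defaultdict(float)
    for gene, value in abundance.items():
        label = gene_to_function.get(gene) or "unannotated"
        totals[label] += value
    return dict(totals)


# Made-up inputs for one sample (assumption: counts come from a read mapper).
counts = {"SAMPLE_00001": 90, "SAMPLE_00002": 30, "SAMPLE_00003": 0}
lengths = {"SAMPLE_00001": 900, "SAMPLE_00002": 300, "SAMPLE_00003": 1200}
gene_to_ec = {"SAMPLE_00001": "2.7.1.2", "SAMPLE_00002": "2.7.1.2", "SAMPLE_00003": ""}

abundance = tpm(counts, lengths)
by_ec = sum_by_function(abundance, gene_to_ec)
print(abundance)
print(by_ec)
```

Because the values are rescaled within each sample, they describe relative rather than absolute abundance, which is worth keeping in mind when comparing subject groups.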

Alternatively, I thought the protein sequences could be used to build a custom database for HUMAnN3 and run a protein search, to take advantage of its downstream scripts for metabolic pathway reconstruction and abundance estimation.

Let me know if you thought of anything else!