jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis

GNU General Public License v3.0


Identify gene names annotated with one database in another database #738

Closed. afsanarupa closed this issue 7 months ago.

afsanarupa commented 9 months ago

Hello, I have performed a co-assembly of a whole-metagenome dataset consisting of 8 samples. I have gathered the ORFs associated with my external database as follows:

```r
# get the information of GH families annotated with CazyDb
GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = FALSE)
```

However, the result also includes ORFs and annotations associated with all the other default databases, which do not start with GH. Could you help me identify the ORFs annotated as "GH" in CazyDb, and then check whether they are also annotated in the Pfam or KEGG databases and, if so, what those annotations are?

Thank you very much.

fpusan commented 9 months ago

You can use the `columns` option to limit the columns in which `subsetFun` searches for patterns. You can list all the columns in the table with:

```r
colnames(met_jute$orfs$table)
```

So, e.g., if the CAZY identifiers are in a column named `CAZY ID`, you should try:

```r
GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = FALSE, columns = 'CAZY ID')
```

Hope this helps.
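To illustrate why restricting the search to one column matters, here is a minimal base-R sketch on a small toy table. It is not how SQMtools implements `subsetFun`; the column names (`CAZy`, `KEGG ID`, `PFAM`, `Gene name`) and values are hypothetical, so check the real ones with `colnames(met_jute$orfs$table)`. Matching `'^GH'` against every column can also pick up ORFs where some unrelated field happens to start with "GH", whereas matching only the CAZy column keeps just the GH-family ORFs.

```r
# Toy stand-in for met_jute$orfs$table (hypothetical column names and values)
orfs <- data.frame(
  CAZy        = c("GH5", "GT2", "", "GH13"),
  `KEGG ID`   = c("K01179", "", "K00001", "K01176"),
  PFAM        = c("PF00150", "PF00535", "PF00107", "PF00128"),
  `Gene name` = c("", "", "GHMP", ""),  # toy gene name that happens to start with "GH"
  row.names = c("orf1", "orf2", "orf3", "orf4"),
  check.names = FALSE, stringsAsFactors = FALSE
)

pattern <- "^GH"

# Search every column: a row is kept if ANY of its fields matches
hit_any <- apply(orfs, 1, function(row) any(grepl(pattern, row)))

# Search the CAZy column only
hit_cazy <- grepl(pattern, orfs[["CAZy"]])

rownames(orfs)[hit_any]   # "orf1" "orf3" "orf4"
rownames(orfs)[hit_cazy]  # "orf1" "orf4"
```

Here `orf3` is only picked up by the all-column search, because of its gene name, not its CAZy annotation.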

afsanarupa commented 9 months ago

Hello, I am still getting KEGG and PFAM functions with

```r
GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = FALSE, columns = 'CAZydb')
```

The subsetted SQM object still contains KEGG and PFAM functions that do not start with GH. My ultimate goal is to get the taxa associated with the subset: should I look at bin abundances, or at taxa at different taxonomic levels?

fpusan commented 9 months ago

That should be straightforward. Assuming you used the default parameters in SqueezeMeta, `GH$orfs$table` should already contain the KEGG, COG and PFAM annotations of the ORFs in your subset (i.e. the ORFs associated with CAZY families starting with the substring GH). So the KEGG and PFAM functions you see in the subset are the other annotations carried by those same GH ORFs.
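A minimal base-R sketch of reading those annotations out of the subset table, using a small toy table in place of `GH$orfs$table`. The column names (`CAZy`, `KEGG ID`, `PFAM`) are hypothetical, and it assumes a missing annotation shows up as either an empty string or `NA`; check both against your own table.

```r
# Toy stand-in for GH$orfs$table (hypothetical column names and values)
gh_orfs <- data.frame(
  CAZy      = c("GH5", "GH13", "GH3"),
  `KEGG ID` = c("K01179", "K01176", NA),
  PFAM      = c("PF00150", "", NA),
  row.names = c("orf1", "orf4", "orf7"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# TRUE where a field holds a real annotation (neither NA nor an empty string)
annotated <- function(x) !is.na(x) & nzchar(x)

has_kegg <- annotated(gh_orfs[["KEGG ID"]])
has_pfam <- annotated(gh_orfs[["PFAM"]])

# GH ORFs that are also annotated in KEGG or Pfam, together with those annotations
also_annotated <- gh_orfs[has_kegg | has_pfam, c("CAZy", "KEGG ID", "PFAM")]
print(also_annotated)

# How many GH ORFs carry a KEGG / Pfam annotation
sum(has_kegg)
sum(has_pfam)
```

The helper checks `is.na` explicitly because `nzchar(NA)` returns `TRUE`, so `nzchar` alone would count missing values as annotations.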

fpusan commented 7 months ago

Closing due to lack of activity; feel free to reopen.