Closed by afsanarupa 7 months ago
You can use the columns option to limit the columns in which subsetFun searches for patterns.
You can list all the columns in the table with colnames(met_jute$orfs$table).
So, for example, if the CAZy identifiers are in a column named CAZY ID, you should try:
GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = FALSE, columns = 'CAZY ID')
Hope this helps.
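To illustrate why restricting the search to one column matters, here is a minimal base R sketch on a toy table. It does not use SQMtools: the data frame, its column names (cazy, kegg, pfam) and all values are made up for illustration, and the matching is done with plain grepl rather than subsetFun.

```r
# Toy stand-in for an ORF annotation table.
# Column names and values are hypothetical, not taken from a real SQM object.
orfs <- data.frame(
  cazy = c("GH13", "GT2", NA, "GH5"),
  kegg = c("K01176", "K00694", "K02013", NA),
  pfam = c("PF00128", "PF00535", "GHMP_kinases_N", "PF00150"),
  row.names = c("orf_1", "orf_2", "orf_3", "orf_4"),
  stringsAsFactors = FALSE
)

# Pattern searched in every column: orf_3 is picked up through its pfam
# value, even though it has no CAZy annotation at all.
hit_any <- apply(orfs, 1, function(row) any(grepl("^GH", row)))

# Pattern searched in the CAZy column only.
hit_cazy <- grepl("^GH", orfs$cazy)

print(rownames(orfs)[hit_any])
print(rownames(orfs)[hit_cazy])
```

In this toy case the unrestricted search returns one extra ORF, which is the kind of spurious hit that limiting the search to the CAZy column avoids.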
Hello, I am still getting KEGG and PFAM functions with GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = F, columns = 'CAZydb'): the subsetted SQM object still has functions from KEGG and PFAM that do not start with GH. My ultimate goal is to get the taxa associated with the subset. Should I look at bin abundances, or at taxa at different taxonomic levels?
That should be straightforward.
Assuming you used default parameters in SqueezeMeta, GH$orfs$table should already contain the KEGG, COG and PFAM annotations of the ORFs in your subset (which are associated with CAZy families starting with the substring GH).
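As a rough sketch of how one might read those annotations off the subset's ORF table, the base R snippet below works on a toy data frame standing in for GH$orfs$table. The column names used here (CAZY ID, KEGG ID, PFAM) and all values are assumptions for illustration; check colnames(GH$orfs$table) for the names in your own project.

```r
# Toy stand-in for GH$orfs$table; column names and values are hypothetical.
gh_table <- data.frame(
  "CAZY ID" = c("GH13", "GH5", "GH3"),
  "KEGG ID" = c("K01176", NA, "K05349"),
  "PFAM"    = c("PF00128", "PF00150", NA),
  row.names = c("orf_1", "orf_4", "orf_7"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Keep only the annotation columns that actually exist in the table.
wanted <- c("CAZY ID", "KEGG ID", "PFAM")
annot_cols <- intersect(wanted, colnames(gh_table))
gh_annot <- gh_table[, annot_cols, drop = FALSE]

# ORFs from the GH subset that also carry a KEGG or a PFAM annotation.
with_kegg <- rownames(gh_annot)[!is.na(gh_annot[["KEGG ID"]])]
with_pfam <- rownames(gh_annot)[!is.na(gh_annot[["PFAM"]])]

print(gh_annot)
print(with_kegg)
print(with_pfam)
```

Depending on how missing annotations are stored in a real table (NA or an empty string), the is.na test may need to be adapted.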
Closing due to lack of activity. Feel free to reopen.
Hello, I have performed a co-assembly of a whole metagenome consisting of 8 samples. I have gathered the ORFs associated with my external database as follows:
# get the information of GH families annotated with CazyDb
GH <- subsetFun(met_jute, fun = '^GH', rescale_copy_number = F)
However, it turns out that this also gathers ORFs and annotations from all the other default databases, which do not start with GH. Can you help me identify the ORFs that are annotated with CazyDb as "GH"? I would then like to find out whether they are also annotated in the Pfam or KEGG databases and, if so, what those annotations are.
Thank you very much.