Closed elcega closed 3 years ago
Hello, yes, this is possible!
First, you'll want to annotate your entire metatranscriptome against both the RefSeq and the Subsystems databases.
Next, the approach I've used is to run the DIAMOND_specific_organism_retriever.py tool (https://github.com/transcript/samsa2/blob/master/python_scripts/DIAMOND_specific_organism_retriever.py) on the RefSeq results, giving the name of the specific organism with the -SO flag. This should output a results file that contains only the hits to that organism.
Third, you would run db_results_swapper.py (https://github.com/transcript/samsa2/blob/master/python_scripts/db_results_swapper.py) on these files, providing the filtered-to-your-organism RefSeq file as the input with the -I flag, and the full list of Subsystems annotations as the annotation file with the -A flag.
The db_results_swapper.py script should make a dictionary of all the hits in your organism-specific RefSeq results, and then write to the outfile each Subsystems annotation that belongs to one of those same original reads. You can then run the Subsystems analysis counter on this output to see the breakdown of different functions.
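To make the swapping step concrete, here is a rough Python sketch of the idea (not the actual code of db_results_swapper.py). It assumes both results files are tab-separated DIAMOND output with the read ID in the first column; the function names and the toy hit lines are made up for illustration.

```python
def read_ids(refseq_lines):
    """Collect read IDs (first tab-separated column) from organism-filtered RefSeq hits."""
    ids = set()
    for line in refseq_lines:
        line = line.rstrip("\n")
        if line:
            ids.add(line.split("\t", 1)[0])
    return ids


def swap_annotations(refseq_lines, subsystems_lines):
    """Keep only the Subsystems hits whose read ID also appears in the RefSeq hits."""
    wanted = read_ids(refseq_lines)
    kept = []
    for line in subsystems_lines:
        line = line.rstrip("\n")
        if line and line.split("\t", 1)[0] in wanted:
            kept.append(line)
    return kept


# Toy data (hypothetical): RefSeq hits already filtered to one organism,
# and the full set of Subsystems hits for the same metatranscriptome.
refseq_hits = [
    "read1\thypothetical protein [Example organism]\t98.0\n",
    "read3\ttransporter [Example organism]\t91.5\n",
]
subsystems_hits = [
    "read1\tsubsystem_entry_A\t95.0\n",
    "read2\tsubsystem_entry_B\t88.0\n",
    "read3\tsubsystem_entry_C\t90.2\n",
]

matched = swap_annotations(refseq_hits, subsystems_hits)
for line in matched:
    print(line)
```

Only read1 and read3 survive here, because read2 had no RefSeq hit to the chosen organism. With real files you would pass open file handles instead of the toy lists.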
Best, Sam
Is there a script to obtain the SEED Subsystems results for a specific organism instead of for all the organisms present in the samples?