Thank you for using MetaCerberus. GAGE and Pathview R require KEGG KOs and currently don't work with COGs alone. We are working on integrating our other tool, SBGNview R, into MetaCerberus. Give the KEGG/FOAM KO databases a try; then we can see whether the class file is failing to load.
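Because GAGE and Pathview need KEGG KO identifiers, a quick pre-check can tell whether a counts table is usable before running the R step. The following is a minimal sketch, not part of MetaCerberus. It assumes the counts file (e.g. counts_KEGG.tsv or counts_COG.tsv) is tab-separated, has a header row, and holds feature IDs in the first column; check that layout against your own output. The helper name `looks_like_ko_table` and the 50% threshold are hypothetical choices.

```python
import csv
import io
import re

# KEGG Orthology identifiers are "K" followed by five digits, e.g. K00027.
KO_PATTERN = re.compile(r"^K\d{5}$")


def looks_like_ko_table(handle, min_fraction=0.5):
    """Return True if at least `min_fraction` of the row IDs are KEGG KOs.

    Assumes (hypothetically) a tab-separated table with a header row and
    feature IDs in the first column.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    ids = [row[0].strip() for row in reader if row]
    if not ids:
        return False
    ko_hits = sum(1 for feature_id in ids if KO_PATTERN.match(feature_id))
    return ko_hits / len(ids) >= min_fraction


if __name__ == "__main__":
    kegg_like = io.StringIO("ID\tgenomeA\tgenomeB\nK00027\t3\t5\nK00024\t1\t0\n")
    cog_like = io.StringIO("ID\tgenomeA\tgenomeB\nCOG0039\t3\t5\nCOG0281\t1\t0\n")
    print(looks_like_ko_table(kegg_like))  # True
    print(looks_like_ko_table(cog_like))   # False
```

A COG-only table fails this check, which matches the behaviour described above: the pathway step has nothing KEGG-shaped to map.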
Hi,
I just ran it with the following parameters: metacerberus.py --protein FAA_files --hmm KOFam_all --dir_out ./KOFam
And these are the files that were generated:
FOAM_Loading_Matrix.tsv FOAM_Loadings.tsv FOAM_PCA.html KEGG_Loading_Matrix.tsv KEGG_Loadings.tsv KEGG_PCA.html counts_FOAM.tsv counts_KEGG.tsv img list.txt stats.html stats.tsv
No pathview folder was generated in the combined folder, nor could I find the class file anywhere. Am I missing something?
Thank you again for using MetaCerberus. You raise a good point here: we need to include a better tutorial for DESeq2/EdgeR, GAGE, and pathview.
We cannot automate the comparisons, because we don't know in advance which comparisons a researcher will want to make. The class file lists the sample names and the class (grouping) of each sample, which defines the comparisons the researcher wants to make.
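To make the idea concrete, a class file pairs each sample with the group it belongs to, and a comparison needs at least two groups. The following is a minimal sketch that assumes a two-column, tab-separated layout with a header. The column names `Sample` and `Class`, the sample and group names, and the helpers `write_class_file` and `check_groups` are all hypothetical. Compare against the KEGG_class.tsv example linked in this thread before relying on this layout. It is also an assumption that the sample names must match the sample columns of the counts table.

```python
import csv
import io
from collections import Counter


def write_class_file(handle, sample_to_class):
    """Write a two-column, tab-separated class table (hypothetical layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample", "Class"])  # hypothetical header names
    for sample, group in sample_to_class.items():
        writer.writerow([sample, group])


def check_groups(sample_to_class):
    """Return per-class sample counts; a comparison needs at least two classes."""
    counts = Counter(sample_to_class.values())
    if len(counts) < 2:
        raise ValueError("need at least two classes to compare")
    return counts


if __name__ == "__main__":
    # Hypothetical grouping of five genomes into two classes.
    groups = {
        "genome1": "groupA",
        "genome2": "groupA",
        "genome3": "groupA",
        "genome4": "groupB",
        "genome5": "groupB",
    }
    print(check_groups(groups))
    out = io.StringIO()
    write_class_file(out, groups)
    print(out.getvalue(), end="")
```

Keeping the grouping in a plain table, rather than in code, is what lets each researcher choose their own comparisons without changing the pipeline.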
In a separate post, I will add examples from our results folder of a class file and a script for running the R-related code. We are in the process of converting the R code into Python and removing the internet-access requirement for the KEGG pathways.
github.com/raw-lab/metacerberus/results/rhizobium/23-06-01_rhizobium/step_10-visualizeData/combined/pathview/KEGG_class.tsv
github.com/raw-lab/metacerberus/results/rhizobium/23-06-01_rhizobium/step_10-visualizeData/combined/pathview/run_pathview.sh
bin/pathview-metacerberus.R
Let us know if this works for you. Also, let us know if you have thoughts on making it more user-friendly. I will turn this into a tutorial; I think it will help.
Got it. I initially thought the class.txt file was something the pipeline would generate, since I couldn't find a description of its structure. All sorted; it's working like a charm. Thanks!
As a suggestion, I think the heatmaps would be easier to interpret if they showed the enzyme name instead of the KO number (e.g., "malate dehydrogenase" instead of K00027).
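The suggestion amounts to a lookup from KO identifiers to descriptive names before plotting. The following is a minimal sketch that assumes you have a tab-separated annotation file with the KO ID in the first column and the name in the second. That layout is hypothetical; any optional "ko:" prefix is stripped. `load_ko_names` and `label_kos` are hypothetical helpers, not MetaCerberus functions. IDs missing from the table are kept as-is, so no heatmap row loses its label.

```python
import io


def load_ko_names(handle):
    """Parse 'KO<TAB>name' lines into a dict; a leading 'ko:' prefix is dropped."""
    names = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        ko_id, name = line.split("\t", 1)
        ko_id = ko_id.strip()
        if ko_id.startswith("ko:"):
            ko_id = ko_id[3:]
        names[ko_id] = name.strip()
    return names


def label_kos(ko_ids, names, keep_id=True):
    """Return heatmap labels such as 'K00027 malate dehydrogenase'."""
    labels = []
    for ko_id in ko_ids:
        name = names.get(ko_id)
        if name is None:
            labels.append(ko_id)  # fall back to the bare ID
        elif keep_id:
            labels.append(f"{ko_id} {name}")
        else:
            labels.append(name)
    return labels


if __name__ == "__main__":
    table = io.StringIO("K00027\tmalate dehydrogenase\n")
    names = load_ko_names(table)
    # K99999 stands in for any ID that is absent from the lookup table.
    print(label_kos(["K00027", "K99999"], names))
    # ['K00027 malate dehydrogenase', 'K99999']
```

Keeping the KO ID next to the name (`keep_id=True`) is a design choice: names alone can repeat across several KOs, while the ID stays unique.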
That's a fair point; we will take a look. Thank you again for using MetaCerberus. Also, if you want us to include your custom HMMs, we can add them as a separate database and later add them to the new FunGene. Just send us an email.
Hi Richard,
Once again, thanks for developing the pipeline! Great, great work.
I have been trying to run it on a set of 5 genomes to check for possible pathway enrichment, but the pipeline finishes without generating the enrichment results.
This is the command that I am using: metacerberus.py --protein FAA_files --hmm COG --dir_out ./COG
And this is the stderr:
Starting MetaCerberus Pipeline
Checking for external dependencies:
fastqc /miniconda3/envs/metacerberus/bin/fastqc
flash2 /miniconda3/envs/metacerberus/bin/flash2
fastp /miniconda3/envs/metacerberus/bin/fastp
porechop /miniconda3/envs/metacerberus/bin/porechop
bbduk.sh /miniconda3/envs/metacerberus/bin/bbduk.sh
FragGeneScanRs /miniconda3/envs/metacerberus/lib/python3.10/site-packages/meta_cerberus/FGS/FragGeneScanRs
prodigal /miniconda3/envs/metacerberus/bin/prodigal
prodigal-gv NOT FOUND, must be defined in config file as EXE_PRODIGAL-GV:
phanotate.py NOT FOUND, must be defined in config file as EXE_PHANOTATE:
hmmsearch /miniconda3/envs/metacerberus/bin/hmmsearch
countAssembly.py /miniconda3/envs/metacerberus/bin/countAssembly.py
Initializing RAY
2024-03-06 07:30:44,777 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
Started RAY single node
Running RAY on 1 node(s)
Using 14 CPUs per node
STEP 1: Loading sequence files:
Processing 0 fastq sequences
Processing 0 fasta sequences
Processing 6 protein sequences
Processing 0 rollup files
STEP 8: HMMER Search
STEP 8: Filtering HMMER results
STEP 9: Parse HMMER results
STEP 10: Creating Reports
Saving Statistics
Creating Rollup Tables
Creating Count Tables
PCA Analysis
Creating combined sunburst and bargraphs
Finished Pipeline
Finally, these are the files that the pipeline generates:
COG_Loading_Matrix.tsv COG_Loadings.tsv COG_PCA.html counts_COG.tsv img list.txt stats.html stats.tsv
If I understood the README correctly, I need to provide a CLASS file to get the GAGE/pathview results. What should the structure of that file be?
Thanks,