signaturescience / metscale

MetScale: snakemake workflows to scale metagenome analyses
BSD 3-Clause "New" or "Revised" License

Feature request: Final report #11

Closed: lovettse closed this issue 5 years ago

lovettse commented 5 years ago

A final report, perhaps just an HTML document that links to the various outputs with a brief explanation of what each one is, would be very helpful.
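One minimal shape for such a report is a single static HTML page that lists each output as a link followed by a one-line description. The Python sketch below assumes the caller supplies a mapping from relative output paths to descriptions; the function name build_index and the example paths are hypothetical placeholders, not actual MetScale outputs.

```python
import html
from typing import Dict


def build_index(title: str, outputs: Dict[str, str]) -> str:
    """Return an HTML page linking each output path to a one-line description.

    Paths and descriptions are HTML-escaped so unusual file names cannot
    break the markup.
    """
    items = "\n".join(
        f'  <li><a href="{html.escape(path, quote=True)}">{html.escape(path)}</a>'
        f": {html.escape(desc)}</li>"
        for path, desc in outputs.items()
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        f"<h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical outputs, for illustration only.
    page = build_index("Sample report", {
        "assembly/contigs.fa": "Assembled contigs",
        "qc/summary.txt": "Read QC summary",
    })
    print(page)
```

Relative links keep the page usable when the whole output folder is copied elsewhere.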

kternus commented 5 years ago

This is an excellent feature request! 👍

I think generating a final report will also address Issue #9: we can add a single "final_report" rule that runs all of the upstream rules needed to populate the report with data. That will make the snakemake command for executing the workflow significantly shorter.
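Snakemake works backward from what a requested target needs, so one aggregate rule whose inputs are the final outputs of every stage pulls in all upstream rules. The toy Python model below illustrates that idea using rule names instead of files. The graph and rule names are hypothetical placeholders rather than MetScale's actual rules, the code is not how snakemake is implemented, and it assumes the graph is acyclic.

```python
from typing import Dict, List, Set

# Hypothetical rule graph: each rule maps to the rules whose outputs it needs.
RULES: Dict[str, List[str]] = {
    "final_report": ["taxonomic_classification", "assembly", "annotation"],
    "annotation": ["assembly"],
    "assembly": ["read_filtering"],
    "taxonomic_classification": ["read_filtering"],
    "read_filtering": [],
}


def execution_order(target: str, rules: Dict[str, List[str]]) -> List[str]:
    """Post-order depth-first search: each rule appears after all of its dependencies."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(rule: str) -> None:
        if rule in seen:  # shared dependencies are scheduled only once
            return
        seen.add(rule)
        for dep in rules[rule]:
            visit(dep)
        order.append(rule)

    visit(target)
    return order


if __name__ == "__main__":
    print(execution_order("final_report", RULES))
```

Requesting only the aggregate target schedules every upstream rule once, which is what would let the command shrink to naming one target, e.g. snakemake --use-singularity final_report (assuming final_report is the rule name chosen).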

stephenturner commented 5 years ago

Changes needed for version 1.2

stephenturner commented 5 years ago

Candidate .gitignore for a completed workflow. It lets you commit the files you need while ignoring most of the Prokka output and assembler intermediates.

    *metaspades/
    *megahit/
    *.err
    *.faa
    *.ffn
    *.fna
    *.fsa
    *.gbk
    *.gff
    *.sqn
    *.tbl
    *.gz

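To see which files a candidate .gitignore like this would leave for commit, and how many bytes it keeps out of the repo, a rough audit can be scripted. This Python sketch handles only the simple name patterns used here (no negation, no slash-anchored paths, no nested .gitignore files), and the helper names are hypothetical. For an authoritative answer, git check-ignore -v <path> reports which pattern matched a given path.

```python
import os
from fnmatch import fnmatchcase

# Patterns copied from the candidate .gitignore above. None contains a slash
# except as a trailing character, so git matches each one against file or
# directory names at any depth; a trailing "/" restricts it to directories.
CANDIDATE = ["*metaspades/", "*megahit/", "*.err", "*.faa", "*.ffn", "*.fna",
             "*.fsa", "*.gbk", "*.gff", "*.sqn", "*.tbl", "*.gz"]


def name_ignored(name, is_dir, patterns=CANDIDATE):
    for pat in patterns:
        if pat.endswith("/") and not is_dir:
            continue  # a directory-only pattern never matches a file
        if fnmatchcase(name, pat.rstrip("/")):
            return True
    return False


def audit(root, patterns=CANDIDATE):
    """Return (kept_bytes, ignored_bytes, kept_paths) for the files under root."""
    kept, ignored, kept_paths = 0, 0, []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in list(dirnames):
            if name_ignored(d, True, patterns):
                dirnames.remove(d)  # prune: everything below it is ignored
                for sub, _, files in os.walk(os.path.join(dirpath, d)):
                    ignored += sum(os.path.getsize(os.path.join(sub, f)) for f in files)
        for f in filenames:
            full = os.path.join(dirpath, f)
            if name_ignored(f, False, patterns):
                ignored += os.path.getsize(full)
            else:
                kept += os.path.getsize(full)
                kept_paths.append(os.path.relpath(full, root))
    return kept, ignored, sorted(kept_paths)
```

Running audit() on a finished sample directory gives a quick sanity check that only small, report-relevant files remain before committing.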
stephenturner commented 5 years ago

You can view the updated report by either:

  1. Going directly to https://rawcdn.githack.com/signaturescience/metagenomics/478682eab848f213129a730a4bbf5b7002eb2c0e/workflows/data/SRR606249_subset10_1_reads_finished/0-summary-report.html
  2. Downloading the repo zip file, extracting it, and navigating to the file: workflows/data/SRR606249_subset10_1_reads_finished/0-summary-report.html.

A few notes:

  1. The source document that generates the report is placed in the folder that holds the final output after running snakemake --use-singularity post_processing_move_samples_dir_workflow. In this case, the file lives at workflows/data/SRR606249_subset10_1_reads_finished/0-summary-report.Rmd.
  2. The sample ID is supplied as a parameter in the YAML header of the R Markdown document that generates the report. Once declared, the sample ID can be referenced throughout the document as params$id. For example:
    params:
      id: "SRR606249_subset10_1_reads"
  3. All major sections begin with a description of the procedure, pulled from the wiki. Results are then displayed in tabs (and sub-tab buttons) beneath each major section heading. Note that images are currently hotlinked from this repo. To work on an air-gapped system, the report could instead reference the image files in the repository by relative path; however, the structure of that relative path depends on how the report is eventually integrated into a snakemake workflow.
  4. Creating this report with a snakemake workflow should be the topic of a separate issue, #16. Another workflow, run after post_processing_move_samples_dir_workflow, could create the R Markdown source (replacing the parameter above with the sample ID) and then call rmarkdown::render() to create the final report.
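The two steps described in item 4, writing the R Markdown source with the sample ID filled in and then calling rmarkdown::render(), could be scripted roughly as follows. This is a Python sketch under stated assumptions: the helper names fill_sample_id, render_command and build_report are hypothetical, the regex assumes the indented id: line from item 2 is the first such line in the file, and sample IDs are plain accession-style strings without quotes. It does not reflect how issue #16 was actually resolved.

```python
import re
import subprocess


def fill_sample_id(rmd_text, sample_id):
    """Return rmd_text with the first indented 'id:' value replaced by sample_id."""
    # Assumes the YAML header layout from item 2:  params:\n  id: "..."
    return re.sub(r'(?m)^([ \t]+id:[ \t]*).*$',
                  lambda m: m.group(1) + '"' + sample_id + '"',
                  rmd_text, count=1)


def render_command(rmd_path):
    """Build the argv that asks R to knit one report via rmarkdown::render()."""
    # Escape backslashes and double quotes so the path is a valid R string.
    r_path = rmd_path.replace("\\", "\\\\").replace('"', '\\"')
    return ["Rscript", "-e", f'rmarkdown::render("{r_path}")']


def build_report(template_path, out_path, sample_id, run=False):
    """Write a per-sample .Rmd from the template, then optionally render it."""
    with open(template_path, encoding="utf-8") as fh:
        text = fill_sample_id(fh.read(), sample_id)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    cmd = render_command(out_path)
    if run:  # requires R with the rmarkdown package installed
        subprocess.run(cmd, check=True)
    return cmd
```

As a design alternative, rmarkdown::render() also accepts a params argument, e.g. params = list(id = "SRR606249_subset10_1_reads"), which overrides the YAML default at render time and would avoid rewriting the .Rmd file for each sample.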

cc @cgrahlm @kternus

stephenturner commented 5 years ago

Note the root-level .gitignore and the .gitignore in the workflows/data/SRR606249_subset10_1_reads_finished directory.

This part of the root-level .gitignore ignores everything in the workflows/data directory except SRR606249_subset10_1_reads_finished, where the example dataset was run and where the final report goes.

    # Ignore data/ dirs
    workflows/read_filtering/data
    workflows/taxonomic_classification/data

    # Don't blacklist workflows/data recursively
    !workflows/data/
    # Ignore everything under workflows/data
    workflows/data/*
    # Except this particular directory and everything under it
    !workflows/data/SRR606249_subset10_1_reads_finished/
    !workflows/data/SRR606249_subset10_1_reads_finished/*
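The ordering in this block matters because of two gitignore rules: the last matching pattern wins, and a file cannot be re-included with "!" if one of its parent directories is excluded. That is why the snippet ignores workflows/data/* (the directory's contents) rather than workflows/data/ itself. The Python sketch below is a simplified model of just those two rules, written for illustration: it handles only slash-containing patterns with * and ? (no **, no basename-only patterns) and ignores the nested .gitignore in the finished-sample directory. The helper names are hypothetical; git check-ignore -v <path> remains the authoritative check.

```python
import re

# Root-level patterns from the snippet above (comments omitted).
ROOT_PATTERNS = [
    "workflows/read_filtering/data",
    "workflows/taxonomic_classification/data",
    "!workflows/data/",
    "workflows/data/*",
    "!workflows/data/SRR606249_subset10_1_reads_finished/",
    "!workflows/data/SRR606249_subset10_1_reads_finished/*",
]


def _compile(pattern):
    """Translate a slash-anchored glob into a regex where * and ? never cross '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")


def _last_match(patterns, path, is_dir):
    """Return True (ignored), False (re-included) or None (no pattern matched)."""
    verdict = None
    for raw in patterns:
        negate = raw.startswith("!")
        pat = raw[1:] if negate else raw
        if pat.endswith("/"):
            if not is_dir:
                continue  # directory-only pattern
            pat = pat[:-1]
        if _compile(pat).match(path):
            verdict = not negate  # later patterns override earlier ones
    return verdict


def is_ignored(patterns, path, is_dir=False):
    parts = path.split("/")
    # Git does not look inside an excluded directory, so a "!" rule cannot
    # rescue a file whose parent directory is already ignored.
    for i in range(1, len(parts)):
        if _last_match(patterns, "/".join(parts[:i]), True):
            return True
    return bool(_last_match(patterns, path, is_dir))
```

With these patterns the example report stays tracked while other sample folders are ignored; replacing workflows/data/* with workflows/data/ would hide the report, because its parent directory would then be excluded.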

The .gitignore in the workflows/data/SRR606249_subset10_1_reads_finished directory ensures that you don't commit gigabytes of data: only the files needed for report generation and other negligibly sized files are committed.