theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[Documentation] Major update and polishing on Augur documentation in preparation for next release #509

Status: Open. Opened by cimendes 1 week ago.

jrotieno commented 1 week ago

Some notes here:

It appears that for non-TheiaCoV pathogens, that is, pathogens whose parameters are set in the organism_parameters workflow, the default empty or minimal placeholder files do not work. I tested this iteratively, adding one input per run:

  1. Pathogen name only: fails at the organism_parameters stage because the genome length input is missing. Run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/7e4dd3eb-1d23-4a64-96fd-6d4885522ffb

  2. Pathogen name and genome length: fails at augur align, which requires a reference FASTA file. Run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/9a09021e-04a8-4d1a-a3d5-33bff0a94ce3

  3. Pathogen name, genome length, and reference FASTA: fails at augur translate, which requires a reference GenBank file. Run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/847daa45-584a-4e0c-86cc-4269290b9deb

  4. Pathogen name, genome length, reference FASTA, and reference GenBank: fails at augur clades because the minimal clades file does not work. Run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/e86e8d4e-782f-4b77-a4dc-daab8933167c

  5. Pathogen name, genome length, reference FASTA, reference GenBank, and clades file: ran successfully to completion. Run: https://app.terra.bio/#workspaces/theiagen-training-workspaces/Theiagen_Otieno_Sandbox/job_history/dc5af018-b457-4ef8-bc35-fcbe049d697d

Should we therefore remove or adjust the empty/minimal placeholder files, and mark the inputs organism, genome_length_input, reference_fasta, reference_genbank, and clades_tsv as required instead of optional?
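For illustration only, here is a rough sketch of the kind of up-front check this would amount to, written in Python and not part of the workflows. The input names come from the list above; the `missing_inputs` function, the `REQUIRED_INPUTS` mapping, the `placeholder_paths` parameter, and the example values are all hypothetical. The stage recorded for `organism` is an assumption, since every test run supplied a pathogen name. Because the default minimal files are themselves the problem, a path listed in `placeholder_paths` is treated the same as an unset input.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping based on the test runs above: the stage at which a run
# failed when each input was left unset. The "organism" entry is an assumption;
# every test run supplied a pathogen name.
REQUIRED_INPUTS: Dict[str, str] = {
    "organism": "organism_parameters",
    "genome_length_input": "organism_parameters",
    "reference_fasta": "augur align",
    "reference_genbank": "augur translate",
    "clades_tsv": "augur clades",
}


def missing_inputs(
    inputs: Dict[str, object],
    placeholder_paths: Optional[Set[str]] = None,
) -> List[str]:
    """List required inputs that are unset, blank, or point at a placeholder file."""
    placeholders = placeholder_paths or set()
    problems: List[str] = []
    # Iterate in pipeline order so the first message matches the first failing stage.
    for name, stage in REQUIRED_INPUTS.items():
        value = inputs.get(name)
        text = "" if value is None else str(value).strip()
        if text == "" or text in placeholders:
            problems.append(f"{name} is missing; the run would fail at {stage}")
    return problems


# Hypothetical inputs mirroring test run 4 (no clades file supplied).
run_4 = {
    "organism": "example_pathogen",
    "genome_length_input": 10000,
    "reference_fasta": "reference.fasta",
    "reference_genbank": "reference.gb",
}
print(missing_inputs(run_4))
# ['clades_tsv is missing; the run would fail at augur clades']
```

Checking everything before any task starts would report all gaps at once, rather than surfacing them one failed run at a time as in the tests above.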