theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

[kSNP3] output SNP matrix headers #420

Closed: frankambrosio3 closed this issue 3 months ago

frankambrosio3 commented 5 months ago

🐛

📝 Describe the Issue

Currently, the kSNP3 workflow appends the string ":c1" to every sample name in the horizontal header (the column names) of the core and pan SNP matrices. This makes the files incompatible with MicrobeTrace and with several other downstream analysis tools.
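Until the output is fixed, one possible downstream workaround is to strip the suffix from the column names before loading a matrix into MicrobeTrace. The sketch below is a minimal illustration, not part of the workflow: it assumes a tab-separated matrix whose first row holds the column names, and that the appended suffix is exactly ":c1" (the form reported later in this thread). The helper name strip_coloring_suffix is hypothetical.

```python
import csv
import io

# Assumed coloring suffix, as reported in this issue thread.
SUFFIX = ":c1"


def strip_coloring_suffix(matrix_text: str, suffix: str = SUFFIX) -> str:
    """Remove a trailing coloring suffix from each name in the header row.

    Assumes a tab-separated matrix whose first row holds the column names.
    Row labels (the vertical header) and cell values are left untouched.
    Requires Python 3.9+ for str.removesuffix.
    """
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    if rows:
        # Only a trailing suffix is removed; ":c1" elsewhere in a name is kept.
        rows[0] = [name.removesuffix(suffix) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


example = "\tsampleA:c1\tsampleB:c1\nsampleA\t0\t5\nsampleB\t5\t0\n"
print(strip_coloring_suffix(example), end="")
```

Only the header row is rewritten, so the sample names in the first column and the SNP counts pass through unchanged.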

🔁 How to Reproduce

Running the kSNP3 workflow on Terra generates two SNP matrices (core and pan), and both include the appended suffix in the horizontal header.

🎣 Expected Behavior

kSNP3 should produce core and pan SNP matrices with unmodified sample names in both the vertical and horizontal headers.
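A quick way to test an output matrix against this expected behavior is to compare the horizontal header with the vertical header. This is a minimal sketch under stated assumptions: the matrix is tab-separated, sample names appear in the first row (after a corner cell) and in the first column, and order is ignored in case rows and columns were reordered differently. The helper name headers_match is hypothetical and not part of the workflow.

```python
import csv
import io


def headers_match(matrix_text: str) -> bool:
    """Return True if the column names and row labels hold the same sample names.

    Assumes a tab-separated matrix: the first row is the horizontal header
    (its first cell is a corner label and is ignored) and the first field of
    each later row is the vertical header. Blank lines are skipped.
    """
    rows = [row for row in csv.reader(io.StringIO(matrix_text), delimiter="\t") if row]
    if not rows:
        return False
    column_names = rows[0][1:]
    row_labels = [row[0] for row in rows[1:]]
    # Compare as sorted lists so that ordering differences are not flagged.
    return sorted(column_names) == sorted(row_labels)


clean = "\tsampleA\tsampleB\nsampleA\t0\t5\nsampleB\t5\t0\n"
suffixed = "\tsampleA:c1\tsampleB:c1\nsampleA\t0\t5\nsampleB\t5\t0\n"
print(headers_match(clean), headers_match(suffixed))
```

A matrix affected by this issue fails the check, because every column name carries the extra suffix while the row labels do not.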

💾 Version Information

All versions of kSNP3.

ℹ️ Additional Information

https://dockstore.org/workflows/github.com/theiagen/public_health_bioinformatics/kSNP3_PHB:main?tab=info

sage-wright commented 5 months ago

This can be turned off by setting use_phandango_coloring to false.

sam-baird commented 5 months ago

I've been seeing ":c1" appended to the sample names in the Snippy_Streamline SNP matrices as well. It looks like the suffix is being added in task_reorder_matrix.wdl:

https://github.com/theiagen/public_health_bioinformatics/blob/880a66cb285e89e6c85893daa5a8d15c29aa3904/tasks/phylogenetic_inference/utilities/task_reorder_matrix.wdl#L47C1-L51C1

frankambrosio3 commented 4 months ago

Tested here: https://app.terra.bio/#workspaces/theiagen-validations/ambrosio_validation_sandbox/job_history/d7f749d0-6847-403f-9eea-a1232994af16

The resulting core SNP matrix has the coloring tags in the header: [screenshot]