Open jdwinkler-lanzatech opened 3 years ago
Also, Mash screen might be a better choice for the distance calculations in the conspecific module: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1841-x. Judging by the publication dates, though, I don't think it was available during the initial development of MAGpurify.
Hi,
Thanks for your work on MAGpurify. I just tried it out on some example data and it seems to work quite well. I do have a few questions, if you have time to answer them:
I am not sure I understand how the conspecific module deals with "novel" sequences, e.g. contigs that are actually derived from conspecific organisms but are not present in the type strains available in the accompanying assemblies used to construct the Mash sketch that performs the initial taxonomic assignment. Will these contigs be excluded as contaminants?
How are calls from multiple methods (tetra vs. gc vs. phylogenetic markers vs. conspecific, etc.) integrated?
Is it possible to modify the conspecific module to automatically decompress reference genomes that are stored as individual archives (fasta1.fna.gz, etc.) before running BLAST? Mash handles compressed input fine, so it would be good to save space this way, especially when running MAGpurify in a Docker container.
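
A minimal sketch of the decompression step described above, assuming the reference genomes sit in a single directory as gzip files and that the BLAST step would then be pointed at the output directory. `decompress_references` is a hypothetical helper for illustration, not part of MAGpurify:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_references(ref_dir, out_dir=None):
    """Write an uncompressed copy of every file in ref_dir to out_dir.

    Files ending in .gz are gunzipped; all other files are copied unchanged.
    out_dir must differ from ref_dir; a fresh temp directory is used by default.
    Returns the list of paths written.
    """
    ref_dir = Path(ref_dir)
    out_dir = Path(out_dir) if out_dir else Path(tempfile.mkdtemp(prefix="refs_"))
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(ref_dir.iterdir()):
        if not src.is_file():
            continue
        if src.suffix == ".gz":
            # fasta1.fna.gz -> fasta1.fna
            dest = out_dir / src.stem
            with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
                shutil.copyfileobj(fin, fout)  # streams, so large genomes are fine
        else:
            dest = out_dir / src.name
            shutil.copyfile(src, dest)
        written.append(dest)
    return written
```

Inside a container, the temporary directory could be deleted once the BLAST step finishes, so that only the compressed references persist in the image or volume.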