snayfach / MAGpurify

Improvement of metagenome-assembled genomes
GNU General Public License v3.0

Suggestions based on trial use #17

Open jdwinkler-lanzatech opened 3 years ago

jdwinkler-lanzatech commented 3 years ago

Hi,

Thanks for your work on MAGpurify. I just tried it out on some example data and it seems to work quite well. I do have a few questions, if you have time to answer them:

  1. I am not sure I understand how the conspecific module deals with "novel" sequences, e.g. contigs that really do come from conspecific genomes but are not present in the type strains among the accompanying assemblies used to construct the Mash sketch that performs the initial taxonomic assignment. Will these contigs be excluded as contaminants?

  2. How are calls from the different methods (tetra vs. gc vs. phylogenetic markers vs. conspecific, etc.) integrated?

  3. Is it possible to modify the conspecific module to automatically decompress reference genomes that are stored as individual archives (fasta1.fna.gz, etc.) before BLASTing? Mash handles compressed input fine, so it would be good to save space this way, especially when running MAGpurify in a Docker container.
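
To make the third point concrete, here is a minimal sketch of the kind of on-the-fly decompression meant there. It is not MAGpurify code: the helper name `plain_fasta` is hypothetical, and it assumes each reference genome is a single gzip-compressed FASTA file whose name ends in `.gz`.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


@contextlib.contextmanager
def plain_fasta(path):
    """Yield a path to an uncompressed FASTA file.

    Hypothetical helper, not part of MAGpurify. A path ending in ".gz" is
    decompressed into a temporary file that is deleted when the block exits;
    any other path is yielded unchanged.
    """
    if not path.endswith(".gz"):
        yield path  # already uncompressed, nothing to clean up
        return
    fd, tmp_path = tempfile.mkstemp(suffix=".fna")
    try:
        # Stream the archive to disk so a large genome is never held in memory.
        with os.fdopen(fd, "wb") as dst, gzip.open(path, "rb") as src:
            shutil.copyfileobj(src, dst)
        yield tmp_path
    finally:
        os.remove(tmp_path)
```

The BLAST step could then be wrapped in `with plain_fasta(ref_path) as fna:` and pointed at `fna`, so only one reference at a time is stored uncompressed.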

jdwinkler-lanzatech commented 3 years ago

Also, Mash screen might be a better choice for the distance calculations in the conspecific module: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1841-x. Judging by the publication dates, though, I don't think it was available during the initial development of MAGpurify.
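
For illustration only, a rough sketch of how Mash screen results could be consumed. It assumes the tab-separated output layout described in the Mash documentation (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment). The function name `parse_mash_screen` and the 0.95 identity cutoff are placeholders, not anything MAGpurify uses.

```python
def parse_mash_screen(text, min_identity=0.95):
    """Parse `mash screen` output into (query_id, identity, shared, total) tuples.

    Assumed column layout (tab-separated): identity, shared-hashes,
    median-multiplicity, p-value, query-ID, query-comment.
    Only hits with identity >= min_identity are kept, best hit first.
    """
    hits = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        identity = float(fields[0])
        # shared-hashes is written as "<shared>/<sketch size>", e.g. "991/1000"
        shared, total = (int(x) for x in fields[1].split("/"))
        if identity >= min_identity:
            hits.append((fields[4], identity, shared, total))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The text passed in would be the standard output of a run such as `mash screen refs.msh bin.fna`, where both file names are hypothetical.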