lskatz / mashtree

:deciduous_tree: Create a tree using Mash distances
GNU General Public License v3.0

Unexpected mashtree parameters for bootstrap replicates #71

Open lcoombe opened 1 year ago

lcoombe commented 1 year ago

Describe the bug: It looks like some mashtree parameters, such as --kmerlength and --sketch-size, are not propagated to mash sketch when running the bootstrap replicates.

To Reproduce: I first suspected an issue after hitting an error while running mashtree_bootstrap.pl. The error itself was my fault and has been resolved, but I noticed messages like this in the error log:

mashtree: mashSketch(TID1): ERROR running mash sketch -S 1453011824 -k 21 -s 10000  -o /var/tmp/MASHTREE_BOOTSTRAP.vlWjIT/3/files.fa files.fa 2>&1!

This message was unexpected because I had specified --kmerlength 22 --sketch-size 1000000, whereas the logged mash sketch command uses -k 21 -s 10000. I confirmed from the log that these parameters looked fine for the initial mashtree run:

mashtree --outmatrix /var/tmp/MASHTREE_BOOTSTRAP.9F2rzJ/observeddistances.tsv.tmp --tempdir /var/tmp/MASHTREE_BOOTSTRAP.9F2rzJ/observed --numcpus 48 --genomesize 100000 --kmerlength 22 --mindepth 1 --sketch-size 1000000 <my input fasta files > /var/tmp/MASHTREE_BOOTSTRAP.9F2rzJ/observed.dnd.tmp
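One way to check this kind of mismatch across a whole log is to pull the -k and -s values out of every logged mash sketch command and compare them with what was requested. The snippet below is only a sketch: the function name `sketch_params` is hypothetical, and it assumes the log lines have the same shape as the error message quoted above.

```python
import re

def sketch_params(log_line):
    """Return the -k and -s values from a logged `mash sketch` command.

    Returns None if the line does not contain a `mash sketch` command.
    Assumes (as in the message quoted in this issue) that each flag is
    followed by its value as a separate whitespace-delimited token.
    """
    match = re.search(r"mash sketch\s+(.*)", log_line)
    if match is None:
        return None
    tokens = match.group(1).split()
    params = {}
    # Walk over (flag, value) pairs; -S (the seed) is deliberately ignored,
    # since flag matching is case-sensitive.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-k", "-s"):
            params[flag] = int(value)
    return params

# The error message from this issue, copied verbatim.
line = (
    "mashtree: mashSketch(TID1): ERROR running mash sketch -S 1453011824 "
    "-k 21 -s 10000  -o /var/tmp/MASHTREE_BOOTSTRAP.vlWjIT/3/files.fa files.fa 2>&1!"
)

requested = {"-k": 22, "-s": 1000000}  # from --kmerlength 22 --sketch-size 1000000
found = sketch_params(line)
print("found:", found, "matches request:", found == requested)
```

For the message above, the extracted values differ from the requested ones, which is the discrepancy this issue reports.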

Expected behavior: I would expect the mashtree runs that use different seeds for bootstrapping to use the same parameters as the initial run.

Additional context: I may be misunderstanding the code or log files, but I think the issue could be due to this snippet of code: https://github.com/lskatz/mashtree/blob/master/bin/mashtree_bootstrap.pl#L160-L172. The variable $mashtreeOptions doesn't appear to be used?

Thank you very much for making this great software!

Rowena-h commented 5 months ago

I experienced this too, but as described in issue #63, this behaviour is fixed by putting --kmerlength and the other sketch-related parameters after a double dash (--). Something like:

mashtree_bootstrap.pl --reps 1000 --numcpus 48 input_files/* -- --min-depth 0 --kmerlength 5 > out.tre

It then ran exactly as expected for me!
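For anyone unsure why the double dash matters: a common convention for wrapper scripts is to treat everything before -- as the wrapper's own arguments and everything after it as options to hand to the wrapped program. The snippet below is a minimal illustration of that convention in Python, not the actual Perl code in mashtree_bootstrap.pl; the function name `split_at_double_dash` and the input file names are made up for the example.

```python
def split_at_double_dash(argv):
    """Split an argument list at the first "--".

    Returns (wrapper_args, passthrough_args). If there is no "--",
    nothing is passed through.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return list(argv), []

# Arguments modelled on the command above; a.fasta and b.fasta are
# hypothetical stand-ins for the expanded input_files/* glob.
argv = [
    "--reps", "1000", "--numcpus", "48", "a.fasta", "b.fasta",
    "--", "--min-depth", "0", "--kmerlength", "5",
]

wrapper_args, passthrough = split_at_double_dash(argv)
print("wrapper:", wrapper_args)
print("passed through:", passthrough)
```

Under this convention, an option such as --kmerlength placed before the -- would stay on the wrapper's side of the split instead of reaching the wrapped program, which would be consistent with the behaviour described in this thread.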