thomasvangurp / epiGBS

Code for working with epiGBS data
MIT License

make_reference.py: min_unique_size and clustering_threshold ignored? #10

Open MWSchmid opened 7 years ago

MWSchmid commented 7 years ago

Hi Thomas

I ran make_reference.py and looked at consensus_cluster.fa, and noticed that it contained clusters of size 1 even though I had set --min_unique_size 2. Looking more closely at the script, at line 496 (function cluster_consensus), I saw that the clustering threshold also seems to be hard-coded (vsearch... -id 0.95). Shouldn't both options be passed on to vsearch at this step?

Best regards,

Marc

MWSchmid commented 7 years ago

By the way, I don't think vsearch --cluster_smallmem has an argument for filtering by cluster size.
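If size filtering is wanted after clustering anyway, one option is to filter the centroid FASTA afterwards. The following is only a minimal sketch, assuming the clustering step was run with --sizeout so that each header carries a ";size=N" annotation; the function name filter_by_size is hypothetical and is not part of make_reference.py.

```python
import re

# Matches the abundance annotation vsearch adds with --sizeout, e.g. ">seq1;size=3"
SIZE_RE = re.compile(r";size=(\d+)")


def filter_by_size(fasta_lines, min_size):
    """Yield only the FASTA records whose size annotation is >= min_size.

    Hypothetical helper: records without a size annotation are treated
    as size 1 (an assumption, not vsearch behaviour).
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            match = SIZE_RE.search(line)
            size = int(match.group(1)) if match else 1
            keep = size >= min_size
        if keep:
            yield line


if __name__ == "__main__":
    example = [">a;size=1\n", "ACGT\n", ">b;size=3\n", "GGCC\n"]
    # Only record "b" passes a minimum size of 2
    print("".join(filter_by_size(example, 2)), end="")
```

The filter works on any iterable of lines, so it can be applied to an open file handle without loading the whole FASTA into memory.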

MWSchmid commented 6 years ago

I saw that you removed "-minuniquesize 2". In the meantime I noticed that I had mixed up "vsearch -derep_fulllength" (in dereplicate_reads()) and "vsearch -cluster_consensus" (in cluster_consensus()). I apologize. The "-minuniquesize" argument exists for the former but not the latter (and you did use it in the context of "vsearch -derep_fulllength").

I think the argument made sense at that stage, or at least it would be good to have the option to test different "-minuniquesize" values, i.e. to have make_reference.py forward --min_unique_size to vsearch.
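To illustrate what forwarding could look like, here is a rough sketch and not the actual code in make_reference.py. The option names --min_unique_size and --clustering_threshold are taken from this issue; the function names, file names, and default values are assumptions made for the example. The commands are only built as argument lists, not executed.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the two options discussed in this issue."""
    parser = argparse.ArgumentParser(description="sketch of option forwarding")
    parser.add_argument("--min_unique_size", type=int, default=2,
                        help="forwarded to vsearch --minuniquesize")
    parser.add_argument("--clustering_threshold", type=float, default=0.95,
                        help="forwarded to vsearch --id")
    return parser


def derep_command(args, reads, output):
    """Build the dereplication call; --minuniquesize is valid at this step."""
    return ["vsearch", "--derep_fulllength", reads,
            "--sizeout",
            "--minuniquesize", str(args.min_unique_size),
            "--output", output]


def cluster_command(args, fasta, centroids):
    """Build the clustering call with the identity threshold from the CLI.

    Input ordering requirements of --cluster_smallmem are not handled here.
    """
    return ["vsearch", "--cluster_smallmem", fasta,
            "--id", str(args.clustering_threshold),
            "--centroids", centroids]


if __name__ == "__main__":
    # Example values only; file names are placeholders.
    args = build_parser().parse_args(
        ["--min_unique_size", "3", "--clustering_threshold", "0.9"])
    print(" ".join(derep_command(args, "reads.fa", "derep.fa")))
    print(" ".join(cluster_command(args, "derep.fa", "centroids.fa")))
```

Building the commands as lists keeps the user-supplied values out of shell string formatting, and the lists can be handed to subprocess.run directly.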