Perform iterative hakmer-ng runs with different parameter settings and choose one with large amount of reused sequence data

lutteropp / hakmer-ng-redesign


Perform iterative hakmer-ng runs with different parameter settings and choose one with large amount of reused sequence data #13

Closed lutteropp closed 5 years ago

lutteropp commented 5 years ago

Found a reasonable criterion for dynamic parameter choice: the total amount of extracted sequence data shouldn't be 0%. So do some kind of iterative hakmer-ng runs and choose the parameter setting that maximizes sequence data usage while also giving a fairly large average number of taxa per block (i.e., the least amount of missing data). These should actually be the only things we care about when trying to find a good tree in the end.
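A minimal sketch of the selection loop described in this comment, not the actual hakmer-ng implementation. The names `RunStats`, `choose_setting`, and the `run` callable are hypothetical. `run` stands in for whatever executes one hakmer-ng run and reports its statistics. How the two criteria are combined is also an assumption: here data usage is maximized first, and the average number of taxa per block only breaks ties.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")  # type of one parameter setting (hypothetical, e.g. a dict or tuple)


@dataclass(frozen=True)
class RunStats:
    """Summary of one run (hypothetical structure, not hakmer-ng output)."""
    used_fraction: float        # fraction of input sequence data that was extracted, 0..1
    avg_taxa_per_block: float   # average number of taxa per extracted block


def choose_setting(settings: Iterable[S], run: Callable[[S], RunStats]) -> Optional[S]:
    """Return the setting with the highest data usage, or None if none extracts data.

    Settings that extract 0% of the data are discarded. Ties in data usage
    are broken by the larger average number of taxa per block (an assumption).
    """
    best: Optional[S] = None
    best_key = None
    for setting in settings:
        stats = run(setting)
        if stats.used_fraction <= 0.0:
            continue  # nothing extracted: never acceptable
        key = (stats.used_fraction, stats.avg_taxa_per_block)
        if best_key is None or key > best_key:
            best, best_key = setting, key
    return best
```

The `run` callable is left abstract on purpose, so the loop can be tried with precomputed statistics before wiring it to real runs.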

lutteropp commented 5 years ago

Nah, turns out we had several instances where using less data and fewer taxa per block actually improved our results (see the Roseobacter dataset).