Closed ebolyen closed 4 years ago
With the switch to using striped unifrac I don't think this is correct either:
--p-n-jobs INTEGER [beta/beta-phylogenetic methods only, excluding weighted_unifrac - The number of jobs...
@ebolyen I'm not sure if this is still quite the same as specified in the issue. core-metrics
still specifies 0-inf, but beta-phylogenetic
doesn't limit the int value at all. It does default to 1 though. Should these be standardized (say both 0-inf or both 1-inf)? And are we still mostly just looking to update the help text to reflect the implementation?
This issue has gotten slightly more complex with the addition of unifrac.faith_pd
. I will tackle the core-metrics(-phylo)
api changes in https://github.com/qiime2/q2-diversity/pull/273. Changes to alpha
and beta
will be made in a separate PR.
@thermokarst, and core-metrics
core-metrics-phylo
both use "n-jobs" for a parameter name. We could change that to something like "cpus" or "cpus-requested" to call out that those pipelines don't exclusively use jobs or threads. The same issue will affect and diversity.alpha
diversity.beta
diversity.beta_phylogenetic
.
This would increase the severity of the breaking change, impacting all users with scripted workflows, instead of only those who are currently passing negative integers. If it's ever going to change, though, now's probably the time.
What do you think?
Edit: On second glance, stacked faith
doesn't take a parallelization parameter, so this affects only beta
and core-metrics-phylo
. Same question still applies.
What do you think?
I'm all for making that breaking change now.
Two options come to mind:
1) Make two independent parameters: n_jobs
& threads
. The help text would need to provide details as to how the two differ.
2) Mash into one parameter, like: n_jobs_or_threads
. Same deal, help text would need to explain a little bit of background.
This would increase the severity of the breaking change
This is what I was talking about on our call today, sorry if that wasn't clear: this will just have to be documented in the changelog. Its a bummer, but we don't have a good way to deprecate params right now.
Forgot to mention:
We could change that to something like "cpus" or "cpus-requested"
My concern regarding these options, is that they bring us back to the original problem that came up in the beta-div methods of q2-div-lib: "cpus" aren't necessarily what a user would be requesting or allocating here: threads or n_jobs are the things being requested. n_jobs are either threads or processes, behind the scenes (according to sklearn's joblib docs).
forum x-ref
I suspect something may have changed when we converted these to pipelines. From the type it looks like the range is 0-inf whereas the range for beta-phylogenetic is 1-inf, so we need to check the integration here as there's probably issues.