qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License
4 stars 45 forks source link

core-metrics-[phylogenetic] n-jobs is inconsistent with help text #214

Closed ebolyen closed 4 years ago

ebolyen commented 6 years ago

forum x-ref

I suspect something may have changed when we converted these to pipelines. From the type it looks like the range is 0-inf whereas the range for beta-phylogenetic is 1-inf, so we need to check the integration here as there's probably issues.

akiledal commented 4 years ago

With the switch to using striped unifrac I don't think this is correct either:

--p-n-jobs INTEGER [beta/beta-phylogenetic methods only, excluding weighted_unifrac - The number of jobs...

Oddant1 commented 4 years ago

@ebolyen I'm not sure if this is still quite the same as specified in the issue. core-metrics still specifies 0-inf, but beta-phylogenetic doesn't limit the int value at all. It does default to 1 though. Should these be standardized (say both 0-inf or both 1-inf)? And are we still mostly just looking to update the help text to reflect the implementation?

ChrisKeefe commented 4 years ago

This issue has gotten slightly more complex with the addition of unifrac.faith_pd. I will tackle the core-metrics(-phylo) api changes in https://github.com/qiime2/q2-diversity/pull/273. Changes to alpha and beta will be made in a separate PR.

ChrisKeefe commented 4 years ago

@thermokarst, core-metrics and core-metrics-phylo both use "n-jobs" for a parameter name. We could change that to something like "cpus" or "cpus-requested" to call out that those pipelines don't exclusively use jobs or threads. The same issue will affect diversity.alpha and diversity.beta diversity.beta_phylogenetic.

This would increase the severity of the breaking change, impacting all users with scripted workflows, instead of only those who are currently passing negative integers. If it's ever going to change, though, now's probably the time.

What do you think?

Edit: On second glance, stacked faith doesn't take a parallelization parameter, so this affects only beta and core-metrics-phylo. Same question still applies.

thermokarst commented 4 years ago

What do you think?

I'm all for making that breaking change now.

Two options come to mind:

1) Make two independent parameters: n_jobs & threads. The help text would need to provide details as to how the two differ. 2) Mash into one parameter, like: n_jobs_or_threads. Same deal, help text would need to explain a little bit of background.

This would increase the severity of the breaking change

This is what I was talking about on our call today, sorry if that wasn't clear: this will just have to be documented in the changelog. Its a bummer, but we don't have a good way to deprecate params right now.

thermokarst commented 4 years ago

Forgot to mention:

We could change that to something like "cpus" or "cpus-requested"

My concern regarding these options, is that they bring us back to the original problem that came up in the beta-div methods of q2-div-lib: "cpus" aren't necessarily what a user would be requesting or allocating here: threads or n_jobs are the things being requested. n_jobs are either threads or processes, behind the scenes (according to sklearn's joblib docs).