Add pooling options to Q2 workflows

benjjneb commented 6 years ago

Improvement Description

Add a new option that lets users choose between independent sample processing (the current behavior), pooled sample processing, or the "pseudo-pooling" that was added in dada2 1.7.5. It probably makes sense to wait until the R package 1.8 release is available (~June) before adding this.

The pooling options provide better detection of rare per-sample variants at the cost of increased computation time.
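As a rough illustration of why pooling helps with rare per-sample variants, here is a toy sketch. It is not the DADA2 algorithm: a fixed abundance threshold (the hypothetical `min_abundance`) stands in for DADA2's statistical model, and the sequences and counts are made up.

```python
from collections import Counter


def detectable(counts, min_abundance=2):
    """Return sequences with enough reads to be called (toy threshold, an assumption)."""
    return {seq for seq, n in counts.items() if n >= min_abundance}


# Made-up read counts: the variant "ACGA" appears once in each sample.
samples = {
    "A": Counter({"ACGT": 50, "ACGA": 1}),
    "B": Counter({"ACGT": 40, "ACGA": 1}),
    "C": Counter({"ACGT": 60, "ACGA": 1}),
}

# Independent processing: each sample is judged on its own counts.
independent = {name: detectable(counts) for name, counts in samples.items()}

# Pooled processing: counts are summed across samples before calling variants.
pooled = detectable(sum(samples.values(), Counter()))
```

In this toy setup the rare variant is missed in every sample when processed independently, but its pooled count of 3 clears the threshold. The extra cost comes from processing all reads together.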

Also consider making pseudo-pooling the default processing mode.

References

"pseudo-pooling" that was added in 1.7.5

nbokulich commented 5 years ago

forum xref

benjjneb commented 5 years ago

Question: Can default parameter choices be dependent on other parameter choices?

The reason I ask: Pooled chimera removal is better if pooled sample inference is performed, but the default chimera removal is consensus, which is better for the default sample inference method (independent). So, can chimera removal be defaulted to pooled if the user selects pooled sample inference?

ebolyen commented 5 years ago

Not really. There would be a way to refine the types based on other types passed (should be available next week-ish), but that would categorically prevent mixing the two.

A different approach would be to have the two steps be separate actions, and then in a pipeline which composes them, you have a "simpler" argument which unifies the two arguments. That way, the "default" invocation does the ideal thing for inference and chimera checking, but mixing them is still possible if you run the sub-actions directly.
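A minimal sketch of that composition idea in plain Python, assuming hypothetical function names (`infer_samples`, `remove_chimeras`, `denoise`); this is not the QIIME 2 plugin API, only the shape of the design:

```python
# Hypothetical stand-ins for the two separate actions.
def infer_samples(reads, pooling):
    """Pretend to run sample inference; only records the pooling mode used."""
    return {"reads": list(reads), "pooling": pooling}


def remove_chimeras(inferred, method):
    """Pretend to remove chimeras; only records the method used."""
    return {**inferred, "chimera_method": method}


# Chimera method paired with each inference mode, as discussed in this thread.
PAIRED_CHIMERA_METHOD = {"independent": "consensus", "pooled": "pooled"}


def denoise(reads, mode="independent"):
    """Pipeline with one 'simpler' argument that drives both sub-actions."""
    inferred = infer_samples(reads, pooling=mode)
    return remove_chimeras(inferred, method=PAIRED_CHIMERA_METHOD[mode])
```

Calling `denoise(reads, mode="pooled")` picks the matching chimera method, while calling `infer_samples` and `remove_chimeras` directly still allows any combination.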

benjjneb commented 5 years ago

My current idea is to change the default chimera method to "auto", which chooses "consensus" or "pooled" chimera removal depending on the choice made at the sample inference step. Users will still be able to define the chimera removal method themselves in which case that choice will be used.

That achieves my goal here of defaulting to the "right" chimera removal method for each sample inference method, but let me know if that seems a bad idea.
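The "auto" behavior described above could be sketched roughly as follows. The function and constant names are hypothetical, not the plugin's actual code, and mapping "pseudo" to "consensus" is an assumption, since the thread only pairs independent with consensus and pooled with pooled.

```python
# Hypothetical option names for illustration; not the q2-dada2 API.
POOLING_METHODS = ("independent", "pseudo", "pooled")
CHIMERA_METHODS = ("auto", "consensus", "pooled")


def resolve_chimera_method(pooling_method, chimera_method="auto"):
    """Pick the chimera removal method, honoring an explicit user choice."""
    if pooling_method not in POOLING_METHODS:
        raise ValueError(f"unknown pooling method: {pooling_method!r}")
    if chimera_method not in CHIMERA_METHODS:
        raise ValueError(f"unknown chimera method: {chimera_method!r}")

    # An explicit choice always wins over the automatic pairing.
    if chimera_method != "auto":
        return chimera_method

    # "auto": pooled inference gets pooled chimera removal.
    # Assumption: anything else (including "pseudo") falls back to consensus.
    return "pooled" if pooling_method == "pooled" else "consensus"
```

With this shape, `resolve_chimera_method("pooled")` selects pooled chimera removal, while `resolve_chimera_method("pooled", "consensus")` keeps the user's explicit choice.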

ebolyen commented 5 years ago

That works too! There are a few places where we have similar patterns.

benjjneb commented 5 years ago

Shoot, I didn't have this Q2 release on my calendar and it looks like PRs are due July 22. It would be really nice to get pseudo-pooling in though, as I know there are a decent number of people interested in that feature. I'll see if I can squeeze some time in, but I can't promise anything.

On Wed, Jul 17, 2019 at 6:33 AM yanxianl <notifications@github.com> wrote:

Hi, will pseudo-pooling and pooling make it into the coming QIIME2-2019.7 release? I'm looking forward to using pseudo-pooling for my dataset within QIIME2; otherwise it has to be done in R.

thermokarst commented 5 years ago

Thanks @benjjneb --- we can try and coordinate efforts, too --- if you want to pass things off in a semi-usable state one of us can probably run it across the finish line.

benjjneb commented 4 years ago

The R code for pseudo-pooling in the Q2 plugin is working on my end in #122

qiime2 / q2-dada2

Add pooling options to Q2 workflows #87