qiime2 / q2-feature-table

QIIME 2 plugin supporting operations on feature tables.
BSD 3-Clause "New" or "Revised" License

add option for seeding rarefaction? #55

Open gregcaporaso opened 7 years ago

gregcaporaso commented 7 years ago

Improvement Description
A forum user suggested that we add support for seeding rarefaction, which is an interesting idea for supporting reproducibility, though I'm not certain what the specific use cases would be.

Questions
Are there cases where we would want to replicate rarefaction results exactly? If so, we'd need the seed to be logged in the artifact's provenance.

References suggested

lkursell commented 7 years ago

I really like this idea. If you are doing more global analyses like PCoA plots, I don't think the results would be affected all that much. However, in recent work I've been doing on machine learning with feature tables, I've seen that correlations and p-values can be significantly affected just by re-running rarefaction, especially if the community is particularly diverse.

I also think this would allow a more definitive assessment of whether rarefaction is making a difference: rarefy the table with two different seeds, run your analysis on each, and then compare what differs between the results.

Perhaps more importantly, this means that you could give a user or collaborator the raw table and they could arrive at exactly the same rarefied table, without you having to send along intermediate files. This seems helpful when a database is being used, for example to make sure that if you pulled studies out of QIITA and rarefied them, you'd always get the same table.
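
To make the reproducibility point concrete, here is a minimal sketch of seeded rarefaction in plain Python. It is not the q2-feature-table or biom-format implementation; the function name `rarefy_table`, the dict-of-dicts table layout, and the choice to drop samples below the sampling depth are all assumptions made for illustration. It only shows that a raw table plus a seed is enough to recover the same rarefied table.

```python
import random
from collections import Counter


def rarefy_table(table, depth, seed=None):
    """Subsample each sample to `depth` reads without replacement.

    `table` maps sample ID -> {feature ID: count}. This layout and the
    function name are hypothetical, chosen only for this sketch.
    """
    rng = random.Random(seed)  # one generator, seeded once, for the whole table
    rarefied = {}
    # Iterate in sorted order so the random stream is consumed the same way
    # regardless of the order in which samples were inserted.
    for sample_id in sorted(table):
        counts = table[sample_id]
        if sum(counts.values()) < depth:
            continue  # assumption: samples shallower than `depth` are dropped
        # Expand counts into one entry per read, then draw without replacement.
        pool = [f for f in sorted(counts) for _ in range(counts[f])]
        rarefied[sample_id] = dict(Counter(rng.sample(pool, depth)))
    return rarefied


raw = {
    "S1": {"A": 50, "B": 30, "C": 20},
    "S2": {"A": 5, "B": 60, "C": 35},
    "S3": {"A": 3, "B": 2},
}

first = rarefy_table(raw, depth=40, seed=42)
second = rarefy_table(raw, depth=40, seed=42)
print(first == second)  # same raw table + same seed -> same rarefied table
```

Running the same analysis on `rarefy_table(raw, depth=40, seed=1)` and `rarefy_table(raw, depth=40, seed=2)` is one way to check how sensitive a downstream result is to the rarefaction draw.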

sejsong commented 5 years ago

Tacking on to what @lkursell mentioned, I've found that ANCOM results can also differ from one rarefaction run to the next. More and more journals are asking for analysis notebooks with manuscript submissions, and I think seeding is important so that others who run the code can reproduce the results exactly.

nbokulich commented 1 year ago

It looks like this is now possible, as setting a random seed has been enabled in biom-format Table.subsample. See: https://github.com/biocore/biom-format/pull/916

@wasade would you by any chance be interested in exposing that option in q2-feature-table? Or could you let us know when the next release of biom-format is planned so that we can coordinate this issue?

wasade commented 1 year ago

Hey @nbokulich, the next release will happen as soon as I can get enough time to make it happen. I had actually intended to release a week or two ago, but it keeps getting bumped. It's relatively high on my priorities but just not yet at the top. Is this time sensitive for q2-feature-table?

nbokulich commented 1 year ago

Thanks @wasade! The next release of QIIME 2 is in May (PRs must be merged by May 5), so we could add this feature to q2-feature-table in that release if you cut the new release of biom-format before then. So there's opportunity but not urgency, I'd say.

wasade commented 1 year ago

Great, thank you, that’s helpful to know.
