qiime2 / q2-metadata

BSD 3-Clause "New" or "Revised" License

Metadata Tabulate Problematic with Per Sequence Data #35

Open Oddant1 opened 4 years ago

Oddant1 commented 4 years ago

Bug Description

qiime metadata tabulate runs for extended periods on data with many rows/records, e.g. ErrorCorrectionDetails. The resulting .qzv is unmanageably long.

Steps to reproduce the behavior

  1. Run qiime metadata tabulate on a large ErrorCorrectionDetails .qza
  2. Wait for the heat death of the universe
  3. Get nothing

Expected behavior

Warn the user when what they are about to do has a significant chance of taking a long time, failing, or both. When the process does fail, tell the user that it failed and, to the best of our ability, how and why.
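To make the failure-reporting half of this concrete, here is a minimal sketch in plain Python. It is not q2-metadata code: `TabulateError`, `run_with_context`, and the `render` callable are hypothetical names standing in for whatever step actually builds the visualization.

```python
class TabulateError(RuntimeError):
    """Hypothetical error type that records what failed and why."""


def run_with_context(render, n_rows):
    """Call `render` (a hypothetical stand-in for the rendering step) and
    re-raise any failure with the row count attached."""
    try:
        return render()
    except MemoryError as exc:
        # Out-of-memory is singled out so the message can suggest a remedy.
        raise TabulateError(
            f"Tabulating {n_rows} rows ran out of memory; "
            "consider subsetting the data first."
        ) from exc
    except Exception as exc:
        # Any other failure is reported with the original message preserved.
        raise TabulateError(
            f"Tabulating {n_rows} rows failed: {exc}"
        ) from exc
```

Chaining with `from exc` keeps the original traceback available, so the user sees both that the run failed and the underlying cause.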

Computation Environment

See forum x-ref

Questions

  1. What file size is large enough that we should warn users about the probability of failure? Or should we always show a general message such as "Producing .qzv files from this form of .qza can take a significant amount of time and compute power with a high risk of failure", regardless of any other factors?
  2. Can we do this without negatively impacting backwards compatibility for users?
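One possible shape for such a warning, sketched with only the Python standard library. The cutoff `ROW_WARNING_THRESHOLD` and the helper `warn_if_large` are hypothetical; the issue does not settle on a threshold, and this is not existing q2-metadata behavior. A warning, rather than an error, leaves existing invocations working, which speaks to the backwards-compatibility question.

```python
import warnings

# Hypothetical cutoff; the right value is exactly what question 1 asks.
ROW_WARNING_THRESHOLD = 100_000


def warn_if_large(n_rows, threshold=ROW_WARNING_THRESHOLD):
    """Emit a UserWarning when n_rows exceeds threshold.

    Returns True if a warning was emitted, False otherwise.
    """
    if n_rows > threshold:
        warnings.warn(
            f"Tabulating {n_rows} rows may take a long time and could fail.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A row count is used here instead of file size because the slowdown described above is tied to the number of rows/records; a size-based check would follow the same pattern.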

References

  1. https://forum.qiime2.org/t/metadata-tabulate-fails-on-demux-error-correction-details-without-error-message/11824/4
  2. https://github.com/qiime2/q2-demux/issues/105