qiime2 / q2-metadata


add method to "bin" continuous metadata and generate a new metadata column #8

Open nbokulich opened 7 years ago

nbokulich commented 7 years ago

Proposed Behavior

The question is: how to bin? The user could define:

  1. an explicit number of bins to create, with the range of values sliced at even intervals.
  2. a "step" size that defines the width of each bin instead of the number of bins. To follow the examples in the comments below: 1) if the unit is 1 day, a step of 30 would be roughly 1 month; 2) if the unit is 1 meter, a step of 100 would give 100-meter bins.
  3. A list of bin cutoffs. E.g., [100, 1000, 10000] would generate 4 bins: all samples with x < 100, 100 ≤ x < 1000, 1000 ≤ x < 10000, and x ≥ 10000. This would be useful for explicitly defining uneven bin sizes. A tangible example of where this would be used is if samples were collected from patients at many different ages, and an investigator wants to compare the microbiome at [3, 12, 24, 72, 144] months of age.
  4. A very cool "some day in the future" enhancement would be to add a function for auto-binning, by looking at the distribution and finding sensible divisions for creating bins.
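
One possible starting point for the auto-binning idea in item 4 is a histogram bin estimator. The sketch below is an assumption, not anything that exists in q2-metadata: the hypothetical `auto_bin` helper uses NumPy's `"auto"` estimator to pick evenly spaced edges from the data, which is a stand-in for "finding sensible divisions" rather than a true distribution-aware method.

```python
import numpy as np
import pandas as pd


def auto_bin(values):
    """Bin a numeric Series using NumPy's "auto" histogram bin estimator.

    `auto_bin` is a hypothetical helper for illustration only.
    """
    # Estimate bin edges from the non-missing values.
    clean = values.dropna().to_numpy()
    edges = np.histogram_bin_edges(clean, bins="auto")
    # include_lowest=True keeps the minimum value inside the first bin.
    return pd.cut(values, bins=edges, include_lowest=True)


# Example: days of life for a handful of samples (made-up values).
days = pd.Series([1, 2, 2, 3, 10, 11, 12, 30], name="day_of_life")
binned = auto_bin(days)
```

The estimator only chooses a bin width, so the resulting bins are still evenly spaced; detecting natural gaps or clusters would need a different technique.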

This method should require a user-defined name to give the new column.
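
To make options 1 to 3 concrete, here is a minimal sketch using pandas. The function name `bin_column` and its parameters (`n_bins`, `step`, `cutoffs`, `new_column`) are assumptions for illustration and not an existing QIIME 2 API. It assumes the metadata is available as a `pandas.DataFrame` and uses left-closed intervals, matching the `100 ≤ x < 1000` convention above.

```python
import numpy as np
import pandas as pd


def bin_column(metadata, column, new_column, n_bins=None, step=None, cutoffs=None):
    """Return a copy of `metadata` with a categorical `new_column` of bins.

    Hypothetical helper: exactly one of `n_bins`, `step`, or `cutoffs`
    must be given, and `new_column` must not already exist.
    """
    if sum(arg is not None for arg in (n_bins, step, cutoffs)) != 1:
        raise ValueError("Specify exactly one of n_bins, step, or cutoffs.")
    if new_column in metadata.columns:
        raise ValueError(f"Column {new_column!r} already exists.")

    values = metadata[column]
    if n_bins is not None:
        # Option 1: let pandas slice the observed range into equal-width bins.
        bins = n_bins
    elif step is not None:
        # Option 2: fixed-width bins starting at the minimum value, with one
        # edge past the maximum so the largest value lands in a bin.
        lo, hi = values.min(), values.max()
        n_intervals = int(np.floor((hi - lo) / step)) + 1
        bins = lo + step * np.arange(n_intervals + 1)
    else:
        # Option 3: explicit cutoffs, open-ended below and above.
        bins = [-np.inf, *sorted(cutoffs), np.inf]

    # right=False gives left-closed intervals: [a, b).
    binned = pd.cut(values, bins=bins, right=False)
    return metadata.assign(**{new_column: binned})


# Example with made-up values: three cutoffs yield four bins.
md = pd.DataFrame({"reads": [50, 100, 999, 1000, 10000, 20000]})
by_cutoff = bin_column(md, "reads", "reads_bin", cutoffs=[100, 1000, 10000])
```

Returning a new frame rather than modifying the input keeps the original metadata intact, which seems the safer default when the binned column is only needed for one analysis.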

Comments

  1. This would be useful for using a continuous metadata column as pseudo-categorical groupings when performing statistical tests.
  2. For example, a researcher might collect samples from infants at different days of life, and choose to bin those samples into months of life to aggregate them into larger groups for statistical comparison. Or collect soil samples at different elevations (in meters) and put them into 100 m bins for comparison. I could give many other examples.
  3. You are probably asking: "why don't users just create these categories manually from the start?" Sometimes this is not easy to do, and sometimes the need will come up only later during analysis.

jairideout commented 7 years ago

This is an interesting idea and definitely worth exploring (though honestly not a high priority for us in 2017 unless someone wants to implement it!). Right now QIIME 2 can't output/create metadata files, so we'd need to add support for that in the framework. It sounds like there are a few cases where allowing QIIME 2 to write out metadata would be useful.