ucsd-ccbb / qiimp

Web application to collect metadata specifications from an experimenter and produce metadata input files with appropriate constraints

Support QIIME2 #q2:types directive row in metadata to prevent casting categorical values to numeric #120

Closed AmandaBirmingham closed 6 years ago

AmandaBirmingham commented 6 years ago

[Note: I'm not sure about the scope of this issue. Hopefully it would be simple--just add another row under the header row--but adding a row there requires rejiggering the huge formulas being used, and thus could be hairier than I hope.]

This issue is due to Jon Sanders' feedback during alpha testing that the metadata wizard should be compatible with the QIIME2 metadata update.

https://docs.qiime2.org/2018.2/tutorials/metadata/ states:

QIIME 2 currently supports categorical and numeric metadata columns. By default, QIIME 2 will attempt to infer the type of each metadata column: if the column consists only of numbers or missing data, the column is inferred to be numeric. Otherwise, if the column contains any non-numeric values, the column is inferred to be categorical. Missing data (i.e. empty cells) are supported in categorical columns as well as numeric columns.

QIIME 2 supports an optional comment directive to allow users to explicitly state a column’s type, avoiding the column type inference described above. This can be useful if there is a column that appears to be numeric, but should actually be treated as categorical metadata (e.g. a Subject column where subjects are labeled 1, 2, 3). Explicitly declaring a column’s type also makes your metadata file more descriptive because the intended column type is included with the metadata, instead of relying on software to infer the type (which isn’t always transparent).

You can use an optional comment directive to declare column types in your metadata file. The comment directive must appear directly below the header. The row’s first cell must be #q2:types to indicate the row is a comment directive. Subsequent cells may contain the values categorical or numeric (both case-insensitive). The empty cell is also supported if you do not wish to assign a type to a column (the type will be inferred in that case). Thus, it is easy to include this comment directive without having to declare types for every column in your metadata.
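To make the quoted rules concrete, here is a minimal sketch of placing a `#q2:types` row directly below the header of a tab-separated metadata file. The function name `add_q2_types_row`, the example column names, and the sample values are all made up for illustration; this is not code from the wizard or from QIIME 2.

```python
import csv
import io


def add_q2_types_row(tsv_text, column_types):
    """Insert a '#q2:types' row directly below the header of a TSV string.

    column_types maps a column name to 'categorical' or 'numeric'. Columns
    that are not listed get an empty cell, which leaves the type to be
    inferred. (Hypothetical helper, for illustration only.)
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = rows[0]
    # The first cell marks the row as a directive; the remaining cells line
    # up with the header columns after the ID column.
    directive = ["#q2:types"] + [column_types.get(name, "") for name in header[1:]]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows([header, directive] + rows[1:])
    return out.getvalue()


# Made-up example data: 'subject' looks numeric but should stay categorical.
example = "sample-id\tsubject\tage\tnotes\nS1\t1\t34\tfirst visit\nS2\t2\t41\t\n"
print(add_q2_types_row(example, {"subject": "categorical", "age": "numeric"}))
```

The `notes` column is deliberately left out of the mapping, so its directive cell stays empty and its type would be inferred.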

adswafford commented 6 years ago

Rejiggering formulas sounds bad given the heroic effort it's been to get this all to work in formulas, and I'm not sure that Qiita could handle the extra row if we did.

I think that this will be good to do when we make Qiita more data type aware by reading the YAML, since we should be able to parse that to add this row and information as part of the upload into Qiita. Do you agree @tanaes?

tanaes commented 6 years ago

hmm I'm trying to fully parse this issue. We are already encoding the data types somehow in the web interface, aren't we? So the reason this would require poking around in the Excel sheets is so the tables exported after entering metadata in Excel would be directly Q2-compatible?

I wouldn't consider Qiita compatibility to be a dealbreaker -- that would be a single and simple update to an importer (to enable it to ignore the value-type row anyway).
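As a rough illustration of how small that importer change could be, here is a sketch of a reader that drops the directive row when it is present. `read_sample_rows` is a hypothetical name and this is not Qiita's actual importer; it only assumes the file is tab-separated and that the directive, if any, sits directly below the header.

```python
import csv
import io


def read_sample_rows(tsv_text):
    """Return (header, data_rows), dropping a '#q2:types' row if present.

    Hypothetical helper for illustration; not taken from Qiita.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # The directive, when present, is the first row after the header and
    # has '#q2:types' in its first cell.
    if body and body[0] and body[0][0] == "#q2:types":
        body = body[1:]
    return header, body


# Made-up example data, once with and once without the directive row.
with_directive = "sample-id\tsubject\n#q2:types\tcategorical\nS1\t1\nS2\t2\n"
without_directive = "sample-id\tsubject\nS1\t1\nS2\t2\n"
print(read_sample_rows(with_directive) == read_sample_rows(without_directive))
```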

AmandaBirmingham commented 6 years ago

@tanaes Yes, the info on the datatype is already in the wizard-produced xlsx file, on the schema worksheet, so the only reason for adding the #q2:types directive row on the metadata worksheet would be so that the worksheet itself could be directly exported to produce sensible QIIME2 input.

If we are willing to require some sort of intermediate processing to produce QIIME2 input from the wizard xlsx, then the need to support these directives in the wizard xlsx itself goes away--using the info embedded in the wizard xlsx, any piece of code could easily generate the #q2:types row.
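A minimal sketch of that intermediate step, assuming the schema has already been read into a plain dict of column name to type name. The type names (`"integer"`, `"decimal"`, `"string"`), the `NUMERIC_SCHEMA_TYPES` set, and `build_q2_types_row` are all assumptions for illustration; the real schema worksheet may encode types differently.

```python
# Assumption: these are the schema type names that should be declared
# numeric for QIIME 2. The wizard's real type names may differ.
NUMERIC_SCHEMA_TYPES = {"integer", "decimal"}


def build_q2_types_row(header, schema_types):
    """Build the cells of a '#q2:types' row for the given header.

    header is the list of column names, with the ID column first.
    schema_types maps column name -> schema type name (hypothetical format).
    """
    row = ["#q2:types"]
    for name in header[1:]:
        schema_type = schema_types.get(name)
        if schema_type is None:
            # Column not described in the schema: leave the cell empty so
            # the type is inferred.
            row.append("")
        elif schema_type in NUMERIC_SCHEMA_TYPES:
            row.append("numeric")
        else:
            row.append("categorical")
    return row


# Made-up example: 'subject' holds numbers but is declared as a string.
header = ["sample_name", "subject", "age", "site"]
schema = {"subject": "string", "age": "integer"}
print("\t".join(build_q2_types_row(header, schema)))
```

Because the row is derived from the declared schema type rather than from the cell values, a numeric-looking column such as `subject` is still labeled categorical.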

adswafford commented 6 years ago

For the next phase, let's deprioritize this and then re-examine once we figure out how this will live in Qiita. I think adding the row for Qiita would be part of whatever we use to merge multiple Excel files together, which is still TBD.

adswafford commented 6 years ago

I'm going to push on the Qiita front to have the Qiime2 row be acceptable in time for the Qiime paper submission. That would bump up the priority of some of the metadata refactoring efforts in Qiita, so I'm milestoning this for v1.0 (our publication submission).

adswafford commented 6 years ago

Thinking more about the complexity of handling this and the number and type of users, I think this should be solved in the creation of the "Qiime mapping file" available for download in Qiita so that we promote/encourage the flow of:

  1. Qiimp
  2. Excel
  3. Qiita
  4. Qiime2

This would require Qiita to then parse the metadata schema when the .xlsx is uploaded and then just label the continuous columns as continuous and the rest as categorical (for Qiime2's purposes). Therefore, I think this should not be handled by Qiimp but instead grouped with the metadata refactor for Qiita, which I'd like to push for in Q1 2019.

Reopen if there is disagreement, and flagging @antgonza

antgonza commented 6 years ago

This needs more discussion because, in reality, in Qiime2 we don't need a "classic" Qiime mapping file, so we shouldn't need to merge the prep/sample ...