qiime2 / q2-composition

BSD 3-Clause "New" or "Revised" License
5 stars 27 forks source link

BUG: dynamic error for reference_level column::value based on too many vs too few separators #120

Closed Oddant1 closed 1 year ago

Oddant1 commented 1 year ago

Closes #119 by making the error message dynamic based on number of separators

Oddant1 commented 1 year ago

We floated the idea of what happens if their metadata actually contains ::tongue. Then they would pass bodysite::::tongue and it would spit out ('bodysite', '', 'tongue') which is not good. This makes it seem like we need to disallow :: in metadata.

The idea was floated of using .split('::', maxsplit=1), then we would get ('bodysite', '::tongue'), but this raises another issue. If the value in their metadata was actually bodysite:: and not ::tongue they would still get ('bodysite', '::tongue') when they would actually want ('bodysite::', 'tongue'). It seems like we just need to disallow occurences of more than one : in a row in metadata.

gregcaporaso commented 1 year ago

@Oddant1, it is also ok for us just to fail with a descriptive error message. We've purposely tried to avoid putting restrictions on what is actually allowed in metadata, in favor of these recommendations. If a user is following the recommendations, none of these issues would come up, so we can always point them to the recommendations if one of these cases actually comes up in the wild.

So I guess my vote would be to go with the .split('::', maxsplit=1), and then provide a meaningful error message if we don't find either the column headers or the values. In that case you can note that colons in column headers or values can cause problems with this action.

lizgehret commented 1 year ago

@Oddant1 @gregcaporaso I just chatted with @ebolyen about this, and this would actually be a good place to utilize Collections (i.e. input would instead look like a dict of key:value pairs instead of having to list each column::value pair separately). In any case, this would be a breaking change so it's probably outside of the scope of this PR. I'll open a separate issue on this because this would probably the best actual solution.

EDIT: issue added here.