qiime2 / q2-diversity

BSD 3-Clause "New" or "Revised" License
4 stars 45 forks source link

more informative error message for name clashes #279

Closed sjanssen2 closed 4 years ago

sjanssen2 commented 4 years ago

Current Behavior When running qiime diversity alpha-rarefaction with a metadata file that contains a column names depth, the current plugin (v 2020.2) fails with a hard to decode error message:

  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2cli/commands.py", line 328, in __call__
    results = action(**arguments)
  File "</home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/decorator.py:decorator-gen-420>", line 2, in alpha_rarefaction
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    output_types, provenance)
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/qiime2/sdk/action.py", line 452, in _callable_executor_
    ret_val = self._callable(output_dir=temp_dir, **view_args)
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_diversity/_alpha/_visualizer.py", line 381, in alpha_rarefaction
    c_df = _compute_summary(reindexed_df, column, counts=counts)
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/q2_diversity/_alpha/_visualizer.py", line 269, in _compute_summary
    summary_df = summary_df.reset_index()
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/frame.py", line 4709, in reset_index
    new_obj.insert(0, name, level_values)
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/frame.py", line 3591, in insert
    self._data.insert(loc, column, value, allow_duplicates=allow_duplicates)
  File "/home/sjanssen/miniconda3/envs/qiime2-2020.2/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1173, in insert
    raise ValueError("cannot insert {}, already exists".format(item))
ValueError: cannot insert depth, already exists

Improvement Description I think it would be helpful for the average user to raise a more elaborate error message and point to the clashing column name or use some escaping mechanism for the internal computation

thermokarst commented 4 years ago

@Oddant1 - if you need something to work on, this would be a great little bugfix. I think we should allow users to provide a column in their MD called depth - we should be able to clean up the internal implementation to prevent the name clashing. If you wind up working on this please assign to yourself and put it on the 2020.8 project board. Thanks!

Oddant1 commented 4 years ago

@thermokarst how do we want to handle this name clash? And do any downstream utilities use the depth column simply by looking for a column called depth? If not maybe we can do something like "if the metadata already contains a depth column we add DEPTH if it doesn't already contain a depth column we add depth if it contains DEPTH and depth already then the user needs to get rid of/rename one"

thermokarst commented 4 years ago

@thermokarst how do we want to handle this name clash?

I was thinking you could just come up with a word that would never be used, like _alpha_rarefaction_depth_column_20mnfs023, or something like that.

And do any downstream utilities use the depth column simply by looking for a column called depth?

No - in general, we don't hardcode any column name requirements. One notable exception are the V2 manifest formats, but that is because first and foremost they are manifest formats, that just happen to match the QIIME 2 Metadata specification.

If not maybe we can do something like...

Interesting idea, thanks for sharing! I don' think this is necessary, though.

Oddant1 commented 4 years ago

Yeah if we can just make some absurd name like the one you proposed because nothing downstream cares then :+1: