wildlife-dynamics / ecoscope-workflows

An extensible task specification and compiler for local and distributed workflows.
BSD 3-Clause "New" or "Revised" License

Task: ETL - Categorical Value Classification #35

Open walljcg opened 1 week ago

walljcg commented 1 week ago

We need an ETL task that lets a user classify the categorical values in a dataframe column. If there are as many classes as unique categorical values, this is a simple one-to-one mapping of each value to another. If there are fewer classes than unique values, the mapping must specify which values collapse into each class, e.g., if x='navy blue' or x='light blue': y='blue'.
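The `or` logic in the example does not need code branches. Group the source values under their target class, then flatten that grouping into a plain value-to-class dict. The following is a minimal sketch; `invert_classes` is a hypothetical helper for illustration, not part of ecoscope-workflows.

```python
def invert_classes(classes: dict) -> dict:
    """Flatten {class: [value, ...]} into a {value: class} mapping.

    Hypothetical helper: raises if one value is assigned to two classes,
    since the resulting mapping would be ambiguous.
    """
    value_map = {}
    for new_class, values in classes.items():
        for value in values:
            if value in value_map and value_map[value] != new_class:
                raise ValueError(f"{value!r} assigned to more than one class")
            value_map[value] = new_class
    return value_map


print(invert_classes({"blue": ["navy blue", "light blue"]}))
# {'navy blue': 'blue', 'light blue': 'blue'}
```

The many-to-one case then reduces to the same dict-based remapping as the one-to-one case.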

This function should work for either numeric or text data types.

Parameter inputs will be:

  1. Input column name
  2. A dict that maps values of the input column to new values. Input values that do not appear as keys in the remapping dict are left untouched.
  3. Output column name (if it's the same as the input name then the input column is overwritten with the re-mapped values).
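The three parameters above could be implemented along the following lines. This is a minimal sketch written as a plain pandas function, assuming the input is a `pandas.DataFrame`. The function name `map_values` and its parameter names are hypothetical. The sketch is not wired into the ecoscope-workflows task registration API, which this issue does not show.

```python
import pandas as pd


def map_values(
    df: pd.DataFrame,
    input_column_name: str,
    value_map: dict,
    output_column_name: str,
) -> pd.DataFrame:
    """Re-map the values of one column (hypothetical task sketch).

    Values that are not keys in ``value_map`` are left untouched. If
    ``output_column_name`` equals ``input_column_name``, the input column
    is overwritten with the re-mapped values.
    """
    if input_column_name not in df.columns:
        raise KeyError(f"column {input_column_name!r} not found")
    out = df.copy()  # avoid mutating the caller's dataframe
    # dict.get(v, v) falls back to the original value for unmapped keys,
    # so the same code handles numeric and text columns.
    out[output_column_name] = out[input_column_name].map(
        lambda v: value_map.get(v, v)
    )
    return out


df = pd.DataFrame({"color": ["navy blue", "light blue", "red"], "code": [1, 2, 3]})
classified = map_values(
    df, "color", {"navy blue": "blue", "light blue": "blue"}, "color_class"
)
print(classified["color_class"].tolist())  # ['blue', 'blue', 'red']
```

Passing a bare dict to `Series.map` would turn every value missing from the dict into NaN, which would violate parameter 2. Falling back through `dict.get(v, v)` keeps unmapped values as they were. It also typically preserves the column's dtype when the replacement values share that dtype.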