We need an ETL task to explode the contents of a dataframe column. There are two scenarios to consider:
The task params are:
- input_column: string
- column_names: array
Expected Behaviour:
A dataframe column contains JSON/dict values: in this case we can use pandas' json_normalize() function (note it is a top-level pandas function, not a DataFrame method). The dict keys become new column names and the dict values become the new column values. We already do this within the ecoscope.io.earthranger._normalize_column() function, and it should perhaps be moved out of ecoscope.io.earthranger and into ecoscope.io.utils to be more generally available.
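A minimal sketch of the dict case (column names here are illustrative, not the final task API):

```python
import pandas as pd

df = pd.DataFrame({"obs": [{"lat": 1.0, "lon": 2.0}, {"lat": 3.0, "lon": 4.0}]})

# Expand the dict values into their own columns, keyed by the dict keys.
expanded = pd.json_normalize(df["obs"])

# Replace the original column with the expanded columns.
result = pd.concat([df.drop(columns=["obs"]), expanded], axis=1)
```

After this, `result` has columns `lat` and `lon` holding the former dict values.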
A dataframe column contains an array: in this case we should create a new set of DataFrame columns with user-supplied names (or val1, val2, val3, ... if no names are supplied). The new columns receive the array values.
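A minimal sketch of the array case with no user-supplied names, so the generated val1, val2, ... defaults are used (column names are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"coords": [[1, 2, 3], [4, 5, 6]]})

# Spread each list element into its own column, preserving the row index.
values = pd.DataFrame(df["coords"].tolist(), index=df.index)

# No user-supplied names, so fall back to val1, val2, ...
values.columns = [f"val{i + 1}" for i in range(values.shape[1])]

result = pd.concat([df.drop(columns=["coords"]), values], axis=1)
```

Here `result` gains columns val1, val2, val3 holding the array elements row-wise.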
If the length of the column_names param doesn't match the number of dict keys (for a dict type) or the array length (for an array type), we should give a warning but proceed: when too few names are supplied, name the extra columns val6, val7, etc. (continuing from the position of the last supplied name); when too many names are supplied, truncate the surplus names, again with a warning.
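A hedged sketch of how the name-count mismatch could be reconciled; the helper name is hypothetical, not an existing ecoscope function:

```python
import warnings


def reconcile_names(column_names, n_values):
    """Return exactly n_values column names, warning on any mismatch."""
    names = list(column_names)
    if len(names) < n_values:
        # Too few names: generate extras, numbering from the current position.
        warnings.warn("Fewer names than values; generating extra names.")
        names += [f"val{i + 1}" for i in range(len(names), n_values)]
    elif len(names) > n_values:
        # Too many names: drop the surplus.
        warnings.warn("More names than values; truncating extra names.")
        names = names[:n_values]
    return names
```

For example, `reconcile_names(["a", "b"], 4)` pads to `["a", "b", "val3", "val4"]`, while `reconcile_names(["a", "b", "c"], 2)` truncates to `["a", "b"]`.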
There may be cases with nested arrays or dicts. In that case a user may have to apply the task again to surface the values needed from the nested structures.