cisaacstern opened 5 days ago
This PR follows ChatGPT's suggestion (what a world we live in), pasted in https://github.com/wildlife-dynamics/ecoscope-workflows/issues/39#issuecomment-2189954068, which I thought was quite reasonable: use pd.DataFrame.assign, in a limited way, to add new columns derived from attributes of other columns in the dataframe.
I'm opening this now because I believe I will need it to address #28 / #45. To explain why: in #45, I want to group by month a dataset that does not already have a "month" column. AFAICT, none of our existing transformations support adding a "month" column derived from an existing column of pd.Timestamp values. This small PR adds that ability.
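For reference, the underlying pandas operation is essentially a one-liner. This minimal sketch uses plain pandas only (no ecoscope-workflows code) and made-up sample timestamps; it shows pd.DataFrame.assign deriving a "month" column from the .dt.month attribute of a datetime column.

```python
import pandas as pd

# Made-up sample data with a datetime column but no "month" column.
df = pd.DataFrame(
    {
        "recorded_at": pd.to_datetime(["2021-01-15", "2021-02-03", "2021-02-20"]),
        "value": [5, 7, 8],
    }
)

# assign() returns a new DataFrame; the original df is left unchanged.
# The lambda receives the DataFrame, so the new column is derived from
# an attribute of an existing column (here, the .dt.month accessor).
df_with_month = df.assign(month=lambda d: d["recorded_at"].dt.month)

print(df_with_month["month"].tolist())  # [1, 2, 2]
```

Because assign does not mutate its input, a transformation built on it stays side-effect free.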
Thanks Charles!
Have you seen the temporal and period indexers we have here?
My assumption has been that we use these functions to create the indexes, then call DataFrame.groupby([groupers]) to create a GroupBy object, which is then passed into the split-apply-combine step?
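As a point of comparison, here is a minimal plain-pandas sketch of that groupers-then-groupby flow. It does not use the ecoscope indexers (their API isn't shown in this thread); pd.Grouper and a .dt.to_period("M") Series stand in for them as an assumption. The sample data mirrors the example dataframe used later in this thread.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "recorded_at": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-02-01", "2021-02-01", "2021-03-01"]
        ),
        "value": [5, 6, 7, 8, 9],
    }
)

# Option 1: a time-based grouper that bins rows by calendar month start,
# without adding any new column to the dataframe.
month_grouper = pd.Grouper(key="recorded_at", freq="MS")

# groupby() accepts a list of groupers and returns a DataFrameGroupBy object,
# which then drives the split-apply-combine step (here, a sum per month).
grouped = df.groupby([month_grouper])
monthly_totals = grouped["value"].sum()

# Option 2: group by a derived Series of monthly periods instead.
by_period = df.groupby(df["recorded_at"].dt.to_period("M"))["value"].sum()

print(monthly_totals)
print(by_period)
```

Either grouper avoids materializing a "month" column; the trade-off is that the grouping key lives outside the dataframe rather than as a named column.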
A-ha, I had not been aware of those utils.
So using the linked indexers in ecoscope core, how would I group this dataframe by month?
```python
import pandas as pd

df = pd.DataFrame(
    {
        "recorded_at": [
            pd.Timestamp("2021-01"),
            pd.Timestamp("2021-01"),
            pd.Timestamp("2021-02"),
            pd.Timestamp("2021-02"),
            pd.Timestamp("2021-03"),
        ],
        "value": [5, 6, 7, 8, 9],
    }
)
```
With this PR, it would be:
```python
from ecoscope_workflows.tasks.transformation import assign_from_column_attribute

df_new = assign_from_column_attribute(
    df, column_name="month", dotted_attribute_name="recorded_at.dt.month"
)
df_new.groupby("month")
```
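For illustration only, a helper with this call signature could be sketched as follows. This is a hypothetical implementation, not necessarily the code in this PR: it assumes the first dotted segment names a column, resolves the remaining segments with operator.attrgetter, and adds the result via pd.DataFrame.assign.

```python
import operator

import pandas as pd


def assign_from_column_attribute(
    df: pd.DataFrame, column_name: str, dotted_attribute_name: str
) -> pd.DataFrame:
    """Hypothetical sketch: add `column_name` by resolving a dotted attribute path.

    The first segment of `dotted_attribute_name` is assumed to be an existing
    column; the remaining segments are attributes looked up on that Series,
    e.g. "recorded_at.dt.month" -> df["recorded_at"].dt.month.
    """
    source_column, _, attribute_path = dotted_attribute_name.partition(".")
    if source_column not in df.columns:
        raise KeyError(f"column {source_column!r} not found")
    if not attribute_path:
        raise ValueError("expected at least one attribute after the column name")
    # attrgetter resolves nested dotted paths such as "dt.month".
    get_attribute = operator.attrgetter(attribute_path)
    # assign() returns a new DataFrame and leaves `df` untouched.
    return df.assign(**{column_name: get_attribute(df[source_column])})


# Usage with made-up sample data.
df = pd.DataFrame(
    {
        "recorded_at": pd.to_datetime(["2021-01-01", "2021-01-15", "2021-02-01"]),
        "value": [5, 6, 7],
    }
)
df_new = assign_from_column_attribute(
    df, column_name="month", dotted_attribute_name="recorded_at.dt.month"
)
print(df_new.groupby("month")["value"].sum())
```

Restricting the helper to attribute lookups (rather than arbitrary expressions) keeps it in the "limited" use of assign described above.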
Towards #39