rtbs-dev opened this issue 11 months ago
Hi!
I have these versions installed:
awkward 2.3.2
awkward_pandas 2023.8.0
numpy 1.23.5
pandas 1.5.2
And I'm unable to reproduce the error you're seeing (the example in the docs runs fine for me with those versions). Would you be able to spin up a fresh conda/virtual environment with these versions and try again?
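To confirm that a fresh environment actually matches these pins before rerunning the example, a quick check like the following may help. It's a minimal sketch: version_mismatches is a hypothetical helper (not part of awkward or awkward-pandas), and it assumes the packages resolve under the distribution names used below (awkward_pandas being the dist-info name of awkward-pandas).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported above as a working combination.
EXPECTED = {
    "awkward": "2.3.2",
    "awkward_pandas": "2023.8.0",
    "numpy": "1.23.5",
    "pandas": "1.5.2",
}


def version_mismatches(expected):
    """Hypothetical helper: return {package: (expected, installed)} for every
    package whose installed version differs (installed is None if missing)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means the environment matches the versions listed above.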
For completeness, here's what I see locally:
In [20]: data = """
...: - name: Bob\n  team: tigers\n  goals: [0, 0, 0, 1, 2, 0, 1]\n
...: - name: Alice\n  team: bears\n  goals: [3, 2, 1, 0, 1]\n
...: - name: Jack\n  team: bears\n  goals: [0, 0, 0, 0, 0, 0, 0, 0, 1]\n
...: - name: Jill\n  team: bears\n  goals: [3, 0, 2]\n
...: - name: Ted\n  team: tigers\n  goals: [0, 0, 0, 0, 0]\n
...: - name: Ellen\n  team: tigers\n  goals: [1, 0, 0, 0, 2, 0, 1]\n
...: - name: Dan\n  team: bears\n  goals: [0, 0, 3, 1, 0, 2, 0, 0]\n
...: - name: Brad\n  team: bears\n  goals: [0, 0, 4, 0, 0, 1]\n
...: - name: Nancy\n  team: tigers\n  goals: [0, 0, 1, 1, 1, 1, 0]\n
...: - name: Lance\n  team: bears\n  goals: [1, 1, 1, 1, 1]\n
...: - name: Sara\n  team: tigers\n  goals: [0, 1, 0, 2, 0, 3]\n
...: - name: Ryan\n  team: tigers\n  goals: [1, 2, 3, 0, 0, 0, 0]\n
...: """
In [21]: import yaml
...:
...: data = yaml.load(data, Loader=yaml.SafeLoader)
...: data = ak.Array(data)
In [22]: s = akpd.from_awkward(data)
In [23]: df = s.ak.to_columns(extract_all=True)
In [24]: (df
...: .set_index('name')
...: .groupby('team', group_keys=True)
...: .apply(lambda x: x.goals.ak.mean(axis=1))
...: )
Out[24]:
team name
bears Alice 1.4
Jack 0.111111
Jill 1.666667
Dan 0.75
Brad 0.833333
Lance 1.0
tigers Bob 0.571429
Ted 0.0
Ellen 0.571429
Nancy 0.571429
Sara 1.0
Ryan 0.857143
dtype: awkward
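Since these numbers are easy to check by hand, here is a plain-Python cross-check of the result above, with no awkward or pandas involved. RECORDS is typed out by hand to match what yaml.load(..., Loader=yaml.SafeLoader) returns for the YAML above, and mean_goals_by_team is a hypothetical helper that only reproduces what groupby('team').apply(lambda x: x.goals.ak.mean(axis=1)) computes, not how it computes it.

```python
# Hand-typed equivalent of yaml.load(data, Loader=yaml.SafeLoader) for the YAML above.
RECORDS = [
    {"name": "Bob", "team": "tigers", "goals": [0, 0, 0, 1, 2, 0, 1]},
    {"name": "Alice", "team": "bears", "goals": [3, 2, 1, 0, 1]},
    {"name": "Jack", "team": "bears", "goals": [0, 0, 0, 0, 0, 0, 0, 0, 1]},
    {"name": "Jill", "team": "bears", "goals": [3, 0, 2]},
    {"name": "Ted", "team": "tigers", "goals": [0, 0, 0, 0, 0]},
    {"name": "Ellen", "team": "tigers", "goals": [1, 0, 0, 0, 2, 0, 1]},
    {"name": "Dan", "team": "bears", "goals": [0, 0, 3, 1, 0, 2, 0, 0]},
    {"name": "Brad", "team": "bears", "goals": [0, 0, 4, 0, 0, 1]},
    {"name": "Nancy", "team": "tigers", "goals": [0, 0, 1, 1, 1, 1, 0]},
    {"name": "Lance", "team": "bears", "goals": [1, 1, 1, 1, 1]},
    {"name": "Sara", "team": "tigers", "goals": [0, 1, 0, 2, 0, 3]},
    {"name": "Ryan", "team": "tigers", "goals": [1, 2, 3, 0, 0, 0, 0]},
]


def mean_goals_by_team(records):
    """Hypothetical helper: return {team: {name: mean goals}} with teams sorted
    and players kept in their original row order."""
    by_team = {}
    for rec in records:
        goals = rec["goals"]
        by_team.setdefault(rec["team"], {})[rec["name"]] = sum(goals) / len(goals)
    return dict(sorted(by_team.items()))


for team, players in mean_goals_by_team(RECORDS).items():
    for name, avg in players.items():
        print(f"{team:<7} {name:<6} {avg:.6f}")
```

The bears come out as Alice, Jack, Jill, Dan, Brad, Lance, i.e. in their original row order, which matches the grouping by team alone.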
In [25]: (df
...: .set_index('name')
...: .groupby(['team', 'name'], group_keys=True)
...: .apply(lambda x: x.goals.ak.mean(axis=1))
...: )
Out[32]:
team name name
bears Alice Alice 1.4
Brad Brad 0.833333
Dan Dan 0.75
Jack Jack 0.111111
Jill Jill 1.666667
Lance Lance 1.0
tigers Bob Bob 0.571429
Ellen Ellen 0.571429
Nancy Nancy 0.571429
Ryan Ryan 0.857143
Sara Sara 1.0
Ted Ted 0.0
dtype: awkward
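The repeated name level is expected: with group_keys=True, pandas prepends the group key to the index of whatever the applied function returns, and here the function returns a Series already indexed by name, so grouping by ['team', 'name'] yields team / name / name. Groupby also sorts the keys by default (sort=True), which is why this output is alphabetical by name within each team while the previous one keeps the original row order. The sketch below is a simplified model of that index construction under those assumptions; apply_index_model is hypothetical and is not pandas' implementation.

```python
# (name, team) pairs in the original row order of the example data.
ROWS = [
    ("Bob", "tigers"), ("Alice", "bears"), ("Jack", "bears"), ("Jill", "bears"),
    ("Ted", "tigers"), ("Ellen", "tigers"), ("Dan", "bears"), ("Brad", "bears"),
    ("Nancy", "tigers"), ("Lance", "bears"), ("Sara", "tigers"), ("Ryan", "tigers"),
]


def apply_index_model(rows, by_name):
    """Hypothetical model of the result index of
    df.set_index('name').groupby(keys, group_keys=True).apply(f)
    when f returns a Series indexed by 'name'.

    by_name=False models keys='team'; by_name=True models keys=['team', 'name'].
    """
    groups = {}
    for name, team in rows:
        key = (team, name) if by_name else (team,)
        # Within a group, rows keep their original order.
        groups.setdefault(key, []).append(name)
    index = []
    # sort=True (the default) orders the groups by key.
    for key in sorted(groups):
        for name in groups[key]:
            # group_keys=True prepends the group key to f's own index (name).
            index.append(key + (name,))
    return index


for entry in apply_index_model(ROWS, by_name=True):
    print(entry)
```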
I'm also unable to reproduce this part of your report: "I should mention that the behavior of s.ak.to_columns() appears to have changed as well, since my version returns only a single column named awkward-data, vs. the docs that have a column for every field in the array."
In [18]: s.ak.to_columns()
Out[18]:
name team awkward-data
0 Bob tigers {'goals': [0, 0, 0, 1, 2, 0, 1]}
1 Alice bears {'goals': [3, 2, 1, 0, 1]}
2 Jack bears {'goals': [0, 0, 0, 0, 0, 0, 0, 0, 1]}
3 Jill bears {'goals': [3, 0, 2]}
4 Ted tigers {'goals': [0, 0, 0, 0, 0]}
5 Ellen tigers {'goals': [1, 0, 0, 0, 2, 0, 1]}
6 Dan bears {'goals': [0, 0, 3, 1, 0, 2, 0, 0]}
7 Brad bears {'goals': [0, 0, 4, 0, 0, 1]}
8 Nancy tigers {'goals': [0, 0, 1, 1, 1, 1, 0]}
9 Lance bears {'goals': [1, 1, 1, 1, 1]}
10 Sara tigers {'goals': [0, 1, 0, 2, 0, 3]}
11 Ryan tigers {'goals': [1, 2, 3, 0, 0, 0, 0]}
In [19]: s.ak.to_columns(extract_all=True)
Out[19]:
name team goals
0 Bob tigers [0, 0, 0, 1, 2, 0, 1]
1 Alice bears [3, 2, 1, 0, 1]
2 Jack bears [0, 0, 0, 0, 0, 0, 0, 0, 1]
3 Jill bears [3, 0, 2]
4 Ted tigers [0, 0, 0, 0, 0]
5 Ellen tigers [1, 0, 0, 0, 2, 0, 1]
6 Dan bears [0, 0, 3, 1, 0, 2, 0, 0]
7 Brad bears [0, 0, 4, 0, 0, 1]
8 Nancy tigers [0, 0, 1, 1, 1, 1, 0]
9 Lance bears [1, 1, 1, 1, 1]
10 Sara tigers [0, 1, 0, 2, 0, 3]
11 Ryan tigers [1, 2, 3, 0, 0, 0, 0]
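For what it's worth, both outputs above are consistent with a simple rule: scalar fields (name, team) become their own columns, while the list-valued goals field stays packed in awkward-data unless extract_all=True. The sketch below models that rule with plain dicts. It is inferred from these two outputs only; to_columns_model is hypothetical, and awkward-pandas' actual logic may differ.

```python
AWKWARD_COLUMN = "awkward-data"  # column name seen in the outputs above


def to_columns_model(records, extract_all=False):
    """Hypothetical model of the two outputs above (not awkward-pandas' code).

    Scalar fields become their own columns; list-valued fields become columns
    only when extract_all=True, otherwise they stay packed in a record column
    named AWKWARD_COLUMN. Returns {column name: list of values}.
    """
    if not records:
        return {}
    fields = list(records[0])
    extracted = [
        f for f in fields
        if extract_all or not isinstance(records[0][f], list)
    ]
    leftover = [f for f in fields if f not in extracted]
    columns = {f: [rec[f] for rec in records] for f in extracted}
    if leftover:
        columns[AWKWARD_COLUMN] = [{f: rec[f] for f in leftover} for rec in records]
    return columns


sample = [
    {"name": "Bob", "team": "tigers", "goals": [0, 0, 0, 1, 2, 0, 1]},
    {"name": "Alice", "team": "bears", "goals": [3, 2, 1, 0, 1]},
]
print(list(to_columns_model(sample)))                    # default
print(list(to_columns_model(sample, extract_all=True)))  # extract_all=True
```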
So I downloaded the exact notebook for your "quickstart", created a new environment with the conda defaults, and ran pip install awkward awkward-pandas ipykernel pyyaml (followed by python -m ipykernel install --user --name awkward to register the kernel).
Here are the versions that setup gives:
awkward 2.3.2
awkward_pandas 2023.8.0
numpy 1.25.2
pandas 2.0.3
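Compared with the first environment above, only numpy and pandas differ (awkward and awkward_pandas are identical), so, assuming nothing else in the environments matters, the different to_columns behavior most likely traces back to one of those two. A tiny diff over the two reported version sets makes that explicit; changed_packages is a hypothetical helper.

```python
# Versions reported in the two environments in this thread.
FIRST_ENV = {"awkward": "2.3.2", "awkward_pandas": "2023.8.0", "numpy": "1.23.5", "pandas": "1.5.2"}
FRESH_ENV = {"awkward": "2.3.2", "awkward_pandas": "2023.8.0", "numpy": "1.25.2", "pandas": "2.0.3"}


def changed_packages(old, new):
    """Hypothetical helper: {package: (old_version, new_version)} for every
    package whose version differs; a missing package shows up as None."""
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


print(changed_packages(FIRST_ENV, FRESH_ENV))
```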
Interestingly, the groupby now works, but I can reproduce the to_columns error exactly:
s.ak.to_columns()
gives:
awkward-data
0 {'name': 'Bob', 'team': 'tigers', 'goals': [0,...
1 {'name': 'Alice', 'team': 'bears', 'goals': [3...
2 {'name': 'Jack', 'team': 'bears', 'goals': [0,...
3 {'name': 'Jill', 'team': 'bears', 'goals': [3,...
4 {'name': 'Ted', 'team': 'tigers', 'goals': [0,...
5 {'name': 'Ellen', 'team': 'tigers', 'goals': [...
6 {'name': 'Dan', 'team': 'bears', 'goals': [0, ...
7 {'name': 'Brad', 'team': 'bears', 'goals': [0,...
8 {'name': 'Nancy', 'team': 'tigers', 'goals': [...
9 {'name': 'Lance', 'team': 'bears', 'goals': [1...
10 {'name': 'Sara', 'team': 'tigers', 'goals': [0...
11 {'name': 'Ryan', 'team': 'tigers', 'goals': [1...
I have to go now, but I'll try to reproduce the main error with an older pandas later today, hopefully.
Hi all! Lovely utility here. I was playing with the example from the docs and can't quite seem to find a good workaround for this bug:
This seems to happen with the .agg operator as well, and the .groupby(['team','name']).apply(...) method I would usually use returns an error complaining about no attribute 'any'.
Here's my version info, as in the docs:
I should mention that the behavior of s.ak.to_columns() appears to have changed as well, since my version returns only a single column named awkward-data, whereas the docs show a column for every field in the array.