rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Series and DataFrame idxmax and idxmin #9602

Open: beckernick opened this issue 2 years ago

beckernick commented 2 years ago

For API compatibility with pandas, cuDF should support idxmax and idxmin on both Series and DataFrames.

Note that argmin and argmax (https://github.com/rapidsai/cudf/issues/9601) differ slightly from idxmin and idxmax: argmin and argmax return the integer position of the first occurrence of the minimum or maximum value, whereas idxmin and idxmax return the index label at that position. The index label need not be an integer (see the example below).
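To make the position-versus-label distinction concrete, here is a minimal sketch in pandas, which cuDF aims to mirror; no cuDF API is assumed. It reuses the co2_emissions values from the example below as a single Series with string labels. For NaN-free data, idxmax amounts to argmax followed by a lookup into the index.

```python
import pandas as pd

# co2_emissions column from the example, as a Series with string labels
s = pd.Series([37.2, 19.66, 1712], index=["Pork", "Wheat Products", "Beef"])

# argmax/argmin give integer positions of the first extreme value
print(s.argmax(), s.argmin())   # 2 1

# idxmax/idxmin give the index labels at those positions
print(s.idxmax(), "|", s.idxmin())   # Beef | Wheat Products

# For NaN-free data, idxmax is argmax plus a label lookup
assert s.idxmax() == s.index[s.argmax()]
assert s.idxmin() == s.index[s.argmin()]
```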

On DataFrames, these methods return a Series in which each row corresponds to one column of the original DataFrame.
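The DataFrame behavior can be read as the Series operation applied column by column, with the resulting labels collected into a Series keyed by column name. The helper `frame_idxmax` below is hypothetical and only illustrates those semantics with pandas and NumPy; it assumes numeric, NaN-free columns (pandas' own method skips NaNs by default, which this sketch does not handle).

```python
import numpy as np
import pandas as pd


def frame_idxmax(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-column idxmax, assuming NaN-free numeric columns."""
    labels = {}
    for col in df.columns:
        # np.argmax returns the first position of the maximum value
        pos = int(np.argmax(df[col].to_numpy()))
        # Translate the position into the corresponding index label
        labels[col] = df.index[pos]
    # One output row per input column, mirroring DataFrame.idxmax()
    return pd.Series(labels)


df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
                   'co2_emissions': [37.2, 19.66, 1712]},
                  index=['Pork', 'Wheat Products', 'Beef'])
print(frame_idxmax(df))

# The sketch agrees with pandas on this data
assert frame_idxmax(df).to_dict() == df.idxmax().to_dict()
```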

import pandas as pd

df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
                   'co2_emissions': [37.2, 19.66, 1712]},
                  index=['Pork', 'Wheat Products', 'Beef'])
print(df, "\n")
print(df.idxmax(), "\n")
print(df.idxmin())
                consumption  co2_emissions
Pork                  10.51          37.20
Wheat Products       103.11          19.66
Beef                  55.48        1712.00 

consumption      Wheat Products
co2_emissions              Beef
dtype: object 

consumption                Pork
co2_emissions    Wheat Products
dtype: object

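With the default RangeIndex, the labels coincide with integer positions, so idxmax and idxmin return int64 values:

import pandas as pd

df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
                   'co2_emissions': [37.2, 19.66, 1712]})
print(df, "\n")
print(df.idxmax(), "\n")
print(df.idxmin())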
   consumption  co2_emissions
0        10.51          37.20
1       103.11          19.66
2        55.48        1712.00 

consumption      1
co2_emissions    2
dtype: int64 

consumption      0
co2_emissions    1
dtype: int64
github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

espackman-nv commented 8 months ago

Hi all, I just tried to use idxmin on my cuDF DataFrame, which raised AttributeError: DataFrame object has no attribute idxmin. I was able to reproduce the error using the Argentinian consumption DataFrame above. Any ideas for workarounds?