rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[ENH] Avoid repeated bounds-checking in `take` when arguments are known in bounds #13456

Open wence- opened 1 year ago

wence- commented 1 year ago

As noted in #13419, there are likely places where we call `take` on a column with a gather map that is already known to be in bounds. In those cases we should skip the unnecessary bounds check by passing `check_bounds=False`.

_Originally posted by @bdice in https://github.com/rapidsai/cudf/pull/13419#discussion_r1203131049_
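
To illustrate the idea, here is a minimal sketch in plain Python. The `take` function below is a hypothetical stand-in written for this example, not cudf's implementation; it only mirrors the `check_bounds` flag mentioned above. Plain lists stand in for columns, and the gather map comes from a `range`, so it is in bounds by construction.

```python
def take(values, gather_map, check_bounds=True):
    """Toy gather: return [values[i] for i in gather_map].

    Hypothetical stand-in for illustration only, not cudf's implementation.
    """
    n = len(values)
    if check_bounds:
        # An extra full pass over the gather map. This is the cost that can
        # be skipped when the caller already knows every index is valid.
        for i in gather_map:
            if not -n <= i < n:
                raise IndexError(f"index {i} is out of bounds for size {n}")
    return [values[i] for i in gather_map]


values = [30, 10, 20, 40]

# A strided slice expressed as a gather map. Every index comes from
# range(0, len(values), 2), so it is in bounds by construction.
gather_map = range(0, len(values), 2)
every_other = take(values, gather_map, check_bounds=False)

# The default path still validates a map of unknown origin.
try:
    take(values, [0, 7])
    bounds_error_raised = False
except IndexError:
    bounds_error_raised = True
```

With `check_bounds=False` the validation loop is skipped entirely, which is only safe because the caller built the map itself.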

### Tasks
- [ ] parquet.py `_get_groups_and_offsets`
- [ ] `_DataFrameLocIndexer._getitem_tuple_arg`
- [ ] `CategoricalColumn._get_decategorized_column`
- [ ] `ColumnBase.slice`
- [ ] `RangeIndex._gather` ?
- [ ] `Groupby.agg`
- [ ] timezones.py `utc_to_local` and `local_to_utc`
- [ ] `MultiIndex.__repr__`

bdice commented 1 year ago

In many places the better change might be to eliminate `take` altogether: where it is used in an "argsort and gather" pattern, a sort-by-key can replace both steps. https://github.com/rapidsai/cudf/pull/13419#discussion_r1203866268
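
A rough sketch of the two patterns, again using plain Python lists rather than cudf columns (the variable names are made up for this example). Both produce the same result; the second never materializes a gather map, so there is nothing to bounds-check.

```python
keys = [3, 1, 2, 1]
payload = ["d", "a", "c", "b"]

# Pattern 1: "argsort and gather".
# First compute the permutation that sorts the keys, then gather the payload.
order = sorted(range(len(keys)), key=keys.__getitem__)
gathered = [payload[i] for i in order]

# Pattern 2: sort-by-key.
# Sort the payload directly by its matching key, with no intermediate map.
by_key = [p for _, p in sorted(zip(keys, payload), key=lambda kp: kp[0])]
```

Python's `sorted` is stable, so rows with equal keys keep their original relative order in both patterns.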