[ENH] Audit `argsort + gather/scatter` patterns for missing performance

During review of #13419 we noted a few places where there is a pattern like:

some_frame = ...
indices = some_frame.some_column.argsort()
new_frame = some_frame.take(indices)

As well as the unnecessary bounds-check (see #13456), this is a pattern that is captured by libcudf's sort_by_key and stable_sort_by_key functions (we would want to use the latter in pandas-compat mode).

At present, libcudf implements this as a argsort of the key columns followed by a gather. But that's an implementation detail (there may in the future be updates to that implementation). In the Python layer we should "say what we mean" and call into the appropriate libcudf API.

A cursory search shows:

/usr/bin/rg -U -e argsort.\*'     
'.\*take

core/_base_index.py
1305:        indices = self.argsort(ascending=ascending, na_position=na_position)
1306:        index_sorted = self.take(indices)

core/indexed_frame.py
2464:                # double-argsort to map back from sorted to unsorted positions
2465:                df = df.take(index.argsort(ascending=True).argsort())

core/column/column.py
1382:        order = order.take(left_gather_map, check_bounds=False).argsort()
1383:        codes = codes.take(order)

core/groupby/groupby.py
686:            gather_map = ordering.take(to_take).argsort()
687:            return result.take(gather_map)

Of these, the calls in _base_index.py, _column.py, and groupby.py can definitely be replaced by sort_by_key. Note also that none of these calls pass check_bounds=False to take so incur an unnecessary kernel launch to check in-boundsness for something that is guaranteed in bounds.

The take(argsort().argsort()) pattern is not a sort_by_key, however, we can elide one of the argsorts by noticing that take is a gather operation and for a permutation, the dual to gather is scatter. So this should be implemented as df.scatter(index.argsort()) instead...

These are just the cases where an argsort is immediately followed by a take, probably more diligent searching would find more.

### Tasks
- [ ] `RollingGroupby.__init__`
- [ ] `Groupby._head_tail`
- [ ] `IndexedFrame.interpolate`
- [ ] `IndexedFrame.sort_index`
- [ ] `IndexedFrame._reindex`
- [ ] `IndexedFrame.sort_values`
- [ ] `IndexedFrame._n_largest_or_smallest`
- [ ] `BaseIndex.sort_values`
- [ ] `MultiIndex.__repr__`

rapidsai / cudf

[ENH] Audit `argsort + gather/scatter` patterns for missing performance #13557