perf: improve `ArrowGroupBy.__iter__` performance

What type of PR is this? (check all applicable)

[ ] 💾 Refactor
[ ] ✨ Feature
[ ] 🐛 Bug Fix
[x] 🔧 Optimization
[ ] 📝 Documentation
[ ] ✅ Test
[ ] 🐳 Other

Checklist

[x] Code follows style guide (ruff)
[ ] Tests added
[ ] Documented the changes

If you have comments or can explain your changes, please do so below.

According to Marco's performance benchmarking for plotly, the bottleneck for a few functions seems to be the call we make to `ArrowGroupBy.__iter__`.

Since pyarrow does not natively support iterating over groups, we (actually, pointing the finger at myself) implemented a (let's say naive) way of still allowing it. If I remember correctly, the use case was letting scikit-lego fully support Arrow as well.

This PR tries to improve that performance using only native Arrow methods, with no shortcuts. The steps are as follows:

1. Create an array containing the string concatenation of the key values (after casting each key to string). Null handling is required.
2. Add this column to the original table.
3. For each unique value of the concatenated column, filter the table on it and return the pair of:
   - the key values, obtained as the first (and unique) value of the key columns in the filtered table;
   - the sliced dataframe, obtained as the filtered table with the temporary string-concatenation column dropped.
