Open wence- opened 5 months ago
+1. Also there's a ruff rule that would allow us to enforce this https://docs.astral.sh/ruff/rules/zip-without-explicit-strict/
https://github.com/rapidsai/cudf/pull/16637 partially addressed this. There remains more work to do since that PR only changed this in the cudf-polars package, whereas we still need to make this change throughout the rest of the repo.
In many places in the cudf code we zip two (or more) iterables together with the assumption/precondition that they are all of equal length. The pattern is (approximately):
This has, by design of zip, a potential problem lurking in that if the two inputs are not of equal length, we only keep the first
min(len(names), len(columns))
objects.To avoid check this at user-input boundaries we need to be careful to check (and then raise if not) that the inputs we are zipping do have equal length. This is easy to forget.
Python 3.10 introduces a new keyword-argument to
zip
,strict
, which can be used to assert that the inputs have the same length. We should consider using this across the code-base when no longer supporting Python 3.9.