rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Improve libcudf hashing tests #7700

Open jlowe opened 3 years ago

jlowe commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently, the hashing tests use a relatively small number of fixed values, and some tests should be parameterized to cover all possible input types.

Describe the solution you'd like

Ideally, the tests should use procedurally generated input with a controllable percentage of nulls and corner-case values (e.g., min/max values, zero, and -0.0, NaN, and +/-Inf for floating-point types). The results should then be compared against a reference CPU implementation computed on the same input.
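As a rough illustration of the idea (not libcudf code), the generate-then-compare pattern could look like the Python sketch below. The names `generate_float_column`, `reference_hash`, and `find_mismatches` are hypothetical, and MD5 over the little-endian bytes of a double is only an assumed stand-in for whichever CPU reference fits the hash being tested.

```python
import hashlib
import math
import random
import struct
import sys

# Corner cases the issue lists for floating-point types.
FLOAT_CORNER_CASES = [
    0.0, -0.0, math.nan, math.inf, -math.inf,
    sys.float_info.max, -sys.float_info.max, sys.float_info.min,
]


def generate_float_column(n, null_fraction, corner_fraction, seed=0):
    """Hypothetical generator: None marks a null row.

    Each row is null with probability null_fraction, a corner case with
    probability corner_fraction, and an ordinary random value otherwise.
    """
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    column = []
    for _ in range(n):
        r = rng.random()
        if r < null_fraction:
            column.append(None)
        elif r < null_fraction + corner_fraction:
            column.append(rng.choice(FLOAT_CORNER_CASES))
        else:
            column.append(rng.uniform(-1e6, 1e6))
    return column


def reference_hash(value):
    """Assumed CPU reference: MD5 of the value's little-endian double bytes."""
    if value is None:
        return None  # assumption: a null row has no hash
    return hashlib.md5(struct.pack("<d", value)).hexdigest()


def find_mismatches(column, hash_under_test):
    """Return the row indices where hash_under_test disagrees with the reference."""
    return [
        i for i, value in enumerate(column)
        if hash_under_test(value) != reference_hash(value)
    ]
```

Comparing hash outputs rather than input values sidesteps the `NaN != NaN` problem, and a seeded generator means any mismatching row can be regenerated exactly when debugging.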

There should also be negative tests that verify that unsupported input types throw appropriate exceptions.
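A negative test of this kind might be sketched as follows. This is a Python illustration under stated assumptions: `toy_hash` is a hypothetical stand-in for the function under test (it is not a libcudf API), and `TypeError` is an assumed choice of exception.

```python
import hashlib
import struct


def toy_hash(value):
    """Hypothetical hash under test: supports only float and str inputs."""
    if isinstance(value, float):
        payload = struct.pack("<d", value)
    elif isinstance(value, str):
        payload = value.encode("utf-8")
    else:
        raise TypeError(f"unsupported input type: {type(value).__name__}")
    return hashlib.md5(payload).hexdigest()


def inputs_that_do_not_raise(hash_fn, inputs, expected_exception):
    """Return the inputs for which hash_fn failed to raise expected_exception.

    Any other exception type propagates, so a wrong exception also fails
    the test instead of being silently accepted.
    """
    silent = []
    for value in inputs:
        try:
            hash_fn(value)
        except expected_exception:
            continue
        silent.append(value)
    return silent
```

A test would then assert that the returned list is empty for every input type the hash is documented not to support.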

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

jlowe commented 3 years ago

Would still like to see this, but it is not high priority.

vyasr commented 2 years ago

@bdice, given all your ongoing work involving hashing (including testing, like #11145, and new hash functions that require tests, like #9215), would you be open to taking this on? The key is finding suitable reference implementations, which may differ between hash functions. Otherwise, the test logic seems like it would be identical, apart from perhaps a few edge cases that we expect some hashes to support but not others.
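One way to picture "different references, identical test logic" is a small registry that maps each hash name to its reference, with a single shared check looping over it. The Python sketch below is only an assumption about how that could be organized: `REFERENCES` and `run_shared_checks` are hypothetical names, and the `hashlib` algorithms are illustrative stand-ins, not a statement about which hashes libcudf provides.

```python
import hashlib

# Hypothetical registry: one CPU reference per hash function name.
REFERENCES = {
    "md5": lambda payload: hashlib.md5(payload).hexdigest(),
    "sha1": lambda payload: hashlib.sha1(payload).hexdigest(),
    "sha256": lambda payload: hashlib.sha256(payload).hexdigest(),
}


def run_shared_checks(implementations, inputs):
    """Apply the same comparison to every hash, each against its own reference.

    implementations maps a hash name to the callable under test.
    Returns a list of (hash name, input) pairs that disagreed.
    """
    failures = []
    for name, implementation in implementations.items():
        reference = REFERENCES[name]
        for payload in inputs:
            if implementation(payload) != reference(payload):
                failures.append((name, payload))
    return failures
```

Hash-specific edge cases could then be handled by giving each registry entry its own list of extra inputs, leaving the shared loop unchanged.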