Closed rishi-kulkarni closed 6 months ago
What sort of tests would you want to see here? Given the very pluggable nature of
formulaic
here, this feature ended up being very little code.
Aye... I'd be content with 100% coverage here. There are pretty few moving parts, and so long as every line is tested, and we also add a test for series of non-string types, I'm happy with this as is.
@matthewwardrop Thanks for the comments, they should now be addressed. I went with casting to numpy.str_
since that seemed to have some performance benefits for very large numbers of rows.
I made one tiny further change to the docstring that I left in the review, but somehow it didn't propagate through. I'll merge this in now. Thank you so much for your contribution!
Hmm, looks like the usedforsecurity
flag wasn't added until 3.9. I assume something like:
def md5_to_int(s: str) -> int:
# Check the Python version
if sys.version_info >= (3, 9):
return int(md5(s.encode(), usedforsecurity=False).hexdigest(), 16)
else:
return int(md5(s.encode()).hexdigest(), 16)
should work
Forgot to run the tests! Anyway, I've patched these and some other issues on main
. Thanks again for you contribution.
I went with the approach of not making
hashed
a stateful transform and just passing an empty dict toencode_contrasts
. Is there any issue with that approach?What sort of tests would you want to see here? Given the very pluggable nature of
formulaic
here, this feature ended up being very little code.Closes #164