Open Dreamsorcerer opened 1 year ago
@bitprophet Not sure what you pushed, but seems to have added a bunch of unrelated changes to the diff.
I should still have the original branch, if you just want me to force push it back?
@bitprophet ?
Yea, looks like the typing updates made GitHub lose its mind. If you can rebase and force-push, that'd probably be ideal. You'll probably want to make sure you copy over any type hints from main to your branch's diff, if there even are any.
From skimming what looks like the two commits actually relevant, my off-the-cuff thoughts:
stdin_is_bytes=True
or maybe decode_stdin=False
Ah, it looks like I made the changes through GitHub, not locally. I'll try cherry-picking to a new branch.
* however I'm wondering if there's anything cleaner/smarter we can do within the "number of bytes to read" helper instead (per your original 'fix' for this elsewhere)
That's a different issue, so not sure that function has anything to do with this one.
* or, since the "it's binary, please do not do any edge encoding/decoding" approach is arguably more correct, is there anything smarter we can do to autodetect?
  * are there approaches other tools use when handling unknown stdin?
  * e.g. "attempt to decode some reasonable first few bytes as if they were $configured_encoding, and if this fails, fall back to assuming binary"?
To be honest, I never fully understood the reason for decoding to a str and back again. So, you'll probably just have to make a decision on this one.
The only related thing I can think of is charset-normalizer, which can guess the encoding used. I'm still not sure I'd use it to try and detect if something is binary or not though. It's always going to be an estimate, so you'll likely still have edge cases that cause problems on occasion.
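For reference, the "decode the first few bytes, else assume binary" idea quoted above could look roughly like the sketch below. This is only an illustration using the standard library's `codecs` module; `looks_like_text` is a hypothetical name, not anything in the codebase, and the default encoding is an assumption.

```python
import codecs


def looks_like_text(sample: bytes, encoding: str = "utf-8") -> bool:
    """Guess whether `sample` is text in `encoding` (hypothetical helper)."""
    # An incremental decoder with final=False tolerates a multibyte
    # character that happens to be cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


# A PNG header is not valid UTF-8, so this is treated as binary.
print(looks_like_text(b"\x89PNG\r\n\x1a\n"))
# Binary data that happens to be valid UTF-8 is misdetected as text.
print(looks_like_text(b"\x00\x01\x02"))
```

The second call shows the kind of edge case I mean: any check like this is an estimate, and single-byte encodings such as latin-1 will accept every possible input.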
Let me know if you want to continue this way, and I'll sort out the typing. It'll be a little tricky now, as the data being passed around is str | bytes and we're detecting it based on an option, rather than an isinstance() check.
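One way the typing might be handled when the return type depends on an option rather than an isinstance() check is `typing.overload` with `Literal` flags. A minimal sketch, assuming a hypothetical `prepare_stdin` helper and `decode` flag (neither is the actual API in this PR):

```python
from typing import Literal, Union, overload


@overload
def prepare_stdin(data: bytes, decode: Literal[True], encoding: str = ...) -> str: ...
@overload
def prepare_stdin(data: bytes, decode: Literal[False], encoding: str = ...) -> bytes: ...


def prepare_stdin(
    data: bytes, decode: bool, encoding: str = "utf-8"
) -> Union[str, bytes]:
    """Return stdin data as str or bytes depending on `decode` (hypothetical)."""
    if decode:
        # Text mode: decode with the configured encoding.
        return data.decode(encoding)
    # Binary mode: pass the bytes through untouched.
    return data
```

With this shape a type checker can narrow the result from the literal flag at each call site, though it only helps where the flag is a literal and not a runtime config value.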
Fixes #818.