pyinvoke / invoke

Pythonic task management & command execution.
http://pyinvoke.org
BSD 2-Clause "Simplified" License
4.31k stars 365 forks

Add stdin_bytes to skip encoding #915

Open Dreamsorcerer opened 1 year ago

Dreamsorcerer commented 1 year ago

Fixes #818.

Dreamsorcerer commented 1 year ago

@bitprophet Not sure what you pushed, but it seems to have added a bunch of unrelated changes to the diff.

Dreamsorcerer commented 1 year ago

I should still have the original branch, if you just want me to force push it back?

Dreamsorcerer commented 1 year ago

@bitprophet ?

bitprophet commented 1 year ago

Yea, looks like the typing updates made GitHub lose its mind. If you can rebase and force-push, that'd probably be ideal - you'll probably want to make sure you copy over any type hints from main to your branch's diff, if there even are any.


From skimming what look like the two actually relevant commits, my off-the-cuff thoughts:

Dreamsorcerer commented 1 year ago

Ah, it looks like I made the changes through GitHub, not locally. I'll try cherry-picking to a new branch.

Dreamsorcerer commented 1 year ago

> * however I'm wondering if there's anything cleaner/smarter we can do within the "number of bytes to read" helper instead (per your original 'fix' for this elsewhere)

That's a different issue, so not sure that function has anything to do with this one.

> * or, since the "it's binary, please do not do any edge encoding/decoding" approach is arguably more correct, is there anything smarter we can do to autodetect?
>
>   * are there approaches other tools use when handling unknown stdin?
>   * eg "attempt to decode some reasonable first few bytes as if it was $configured_encoding, and if this fails, fallback to assuming binary"?

To be honest, I never fully understood the reason for decoding to a str and back again. So, you'll probably just have to make a decision on this one.

The only related thing I can think of is charset-normalizer, which can guess the encoding used. I'm still not sure I'd use it to try and detect if something is binary or not though. It's always going to be an estimate, so you'll likely still have edge cases that cause problems on occasion.
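For illustration, here is a minimal sketch of the "decode the first few bytes, else assume binary" heuristic, using only the standard library. The function name `looks_like_text` and the `probe_size` parameter are hypothetical and not part of invoke. It also shows why this can only ever be an estimate: plenty of binary payloads decode cleanly.

```python
import codecs


def looks_like_text(data: bytes, encoding: str = "utf-8", probe_size: int = 1024) -> bool:
    """Guess whether `data` is text in `encoding` by decoding a prefix.

    Hypothetical helper for illustration only; not an invoke API.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        # final=False tolerates a multibyte character that the probe
        # boundary happens to cut in half.
        decoder.decode(data[:probe_size], final=False)
    except UnicodeDecodeError:
        return False
    return True


# Obvious cases behave as expected.
text_guess = looks_like_text("héllo".encode("utf-8"))
binary_guess = looks_like_text(b"\xff\xfe\x00\x01")

# Edge cases: these are binary, yet they decode without error.
false_positive = looks_like_text(b"\x00\x01\x02\x7f")
latin1_always_passes = looks_like_text(bytes(range(256)), encoding="latin-1")
```

Single-byte encodings such as latin-1 accept every byte sequence, so with those a probe like this cannot distinguish binary input at all.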

Dreamsorcerer commented 1 year ago

Let me know if you want to continue this way, and I'll sort out the typing. It'll be a little tricky now, as the data being passed around is str | bytes and we're detecting it based on an option, rather than an isinstance() check.
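To illustrate the typing difficulty, here is a rough sketch with hypothetical names (`read_chunk`, `write_chunk`, and their parameters are assumptions, not invoke's actual code). A type checker narrows `str | bytes` on an `isinstance()` check, but not on a separate boolean option, so one possible workaround is `typing.overload` with `Literal` flags.

```python
from typing import Literal, Union, overload


# Hypothetical reader: the return type depends on an option, which a type
# checker cannot narrow on unless it is spelled out with overloads.
@overload
def read_chunk(raw: bytes, encoding: str, stdin_bytes: Literal[True]) -> bytes: ...
@overload
def read_chunk(raw: bytes, encoding: str, stdin_bytes: Literal[False]) -> str: ...
@overload
def read_chunk(raw: bytes, encoding: str, stdin_bytes: bool) -> Union[str, bytes]: ...
def read_chunk(raw: bytes, encoding: str, stdin_bytes: bool) -> Union[str, bytes]:
    if stdin_bytes:
        return raw  # skip decoding entirely
    return raw.decode(encoding)


# Hypothetical writer: here isinstance() narrows the union on its own.
def write_chunk(data: Union[str, bytes], encoding: str) -> bytes:
    if isinstance(data, bytes):
        return data  # already bytes, pass through untouched
    return data.encode(encoding)


passthrough = write_chunk(read_chunk(b"\xff\x00", "utf-8", True), "utf-8")
roundtrip = write_chunk(read_chunk("hé".encode("utf-8"), "utf-8", False), "utf-8")
```

When the flag is only known at runtime (a plain `bool`), callers still get the union back, which is where the awkwardness described above comes from.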