ndd7xv / heh

A terminal UI to edit bytes by the nibble.
MIT License

Unicode Support #36

Closed 0b11001111 closed 1 year ago

0b11001111 commented 1 year ago

This PR explores potential Unicode support, see discussion at https://github.com/ndd7xv/heh/issues/5

ndd7xv commented 1 year ago

Thanks so much for the PR! I've been pretty busy with work recently but I'll try and look at this sometime this weekend :) This is honestly some really good stuff :smile:

Considering the problems mentioned in https://github.com/ndd7xv/heh/issues/5#issuecomment-1296772103, I think we could put this behind some sort of feature toggle, like running heh -u/heh --unicode (which would mean representing bytes as UTF-8 for now), so users would have the option to view bytes as UTF-8. As a result, we can just mark the work here as "in development" while it's being worked on.

Just glanced at your PR and I have to say it's really neat! I'll definitely try and get this merged when I have the time (I might ask questions or request mild changes and documentation, but if you're too busy I'll get around to it sooner or later).

0b11001111 commented 1 year ago

That's great news! I've rewritten the whole decoder, though, and it should be much cleaner now. The architecture of my code allows for simple extension (looking at you, UTF-16) and supports runtime configuration. However, the code still needs better tests and some refactoring, as it is partly redundant with bytes.rs. Also, as of now the colouring is chosen under the assumption that everything is ASCII...
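To illustrate the kind of extension point described here, the following is a minimal sketch of a pluggable decoder with runtime selection. The `Decoder` trait, `AsciiDecoder`, `Utf8Decoder`, and `decoder_for` are hypothetical names made up for this example; they are not the types used in the PR.

```rust
/// Hypothetical extension point: a decoder reads one character from the
/// front of `bytes` and reports how many bytes it consumed.
trait Decoder {
    /// Returns `Some((ch, len))` on success, or `None` if `bytes` is empty
    /// or does not start with a valid sequence for this encoding.
    fn decode_one(&self, bytes: &[u8]) -> Option<(char, usize)>;
}

struct AsciiDecoder;
struct Utf8Decoder;

impl Decoder for AsciiDecoder {
    fn decode_one(&self, bytes: &[u8]) -> Option<(char, usize)> {
        match bytes.first() {
            Some(&b) if b.is_ascii() => Some((b as char, 1)),
            _ => None,
        }
    }
}

impl Decoder for Utf8Decoder {
    fn decode_one(&self, bytes: &[u8]) -> Option<(char, usize)> {
        // A UTF-8 sequence is at most 4 bytes long; try the shortest prefix
        // first so that exactly one character is decoded.
        for len in 1..=bytes.len().min(4) {
            if let Ok(s) = std::str::from_utf8(&bytes[..len]) {
                return s.chars().next().map(|c| (c, len));
            }
        }
        None
    }
}

/// Runtime configuration: pick a decoder from a user-supplied name.
/// The accepted names are assumptions for this sketch.
fn decoder_for(name: &str) -> Option<Box<dyn Decoder>> {
    match name {
        "ascii" => Some(Box::new(AsciiDecoder)),
        "utf8" => Some(Box::new(Utf8Decoder)),
        _ => None, // another encoding (e.g. UTF-16) would add one arm here
    }
}
```

With this shape, supporting another encoding would mean one new `impl Decoder` plus one new match arm, which is one way to read "simple extension".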

0b11001111 commented 1 year ago

... and escaping is missing for control, whitespace, and other non-displayable characters.
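As a rough idea of what such escaping could look like, here is a minimal sketch that maps a decoded character to a single drawable glyph. The function name `display_char` and the substitute glyphs (`_` and `.`) are illustrative assumptions, not what heh uses, and the sketch only covers control and whitespace characters, not every non-displayable code point.

```rust
/// Maps a decoded character to something safe to draw in a single cell.
/// The substitute glyphs are illustrative choices only.
fn display_char(c: char) -> char {
    match c {
        ' ' => ' ',                    // a plain space is displayable as-is
        '\n' | '\r' | '\t' => '_',     // common whitespace control characters
        c if c.is_control() => '.',    // remaining control characters (category Cc)
        c if c.is_whitespace() => '_', // other whitespace, e.g. U+00A0 no-break space
        c => c,                        // everything else is passed through unchanged
    }
}
```

Format characters such as zero-width joiners fall through to the last arm here, so a real implementation would likely need more cases.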

0b11001111 commented 1 year ago

I made a few more changes and now it almost works as I wish. Also, it should be trivial to integrate other Unicode encodings.

A few quirks I observed:

Personally, I think solving these issues is far beyond the scope of this little feature and exceeds my Unicode knowledge anyway. Since all of this is optional for the user, I'd keep it as is.

Demo Time

heh demo.md [image]

heh --encoding utf8 demo.md [image]

ndd7xv commented 1 year ago

Hello! I'm glancing over your code right now and may make a couple of tweaks, but it looks promising (and very nice demo :D)! There is one thing I'm looking to change: if I were to use heh on a file containing the following:

Examples of emoji are 😂, 😃, 🧘🏻‍♂️, 🌍, 🌦️, 🍞, 🚗, 📞, 🎉, ❤️, 🍆, 🍑 and 🏁.

your branch, regardless of the encoding option, displays [image]

while the current version of heh displays [image]

You'll notice that some bytes aren't showing up (A7 98 F0 in the second row). I'm gonna look into fixing this but I think after that I'll squash everything and merge.
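One way to reason about this kind of bug is to require that the character view produce exactly one cell per input byte, whether or not the bytes decode. The sketch below is an assumption about how that invariant could be enforced, not the fix that was merged; `cells_per_byte` and both placeholder glyphs are hypothetical.

```rust
/// Returns one display cell per input byte (hypothetical layout rule):
/// a decoded character sits in the cell of its first byte, its remaining
/// bytes get a filler, and bytes that are not valid UTF-8 get a placeholder.
fn cells_per_byte(bytes: &[u8]) -> Vec<char> {
    const CONTINUATION: char = '•'; // filler for the 2nd..nth byte of a character
    const INVALID: char = '.'; // placeholder for a byte that does not decode

    let mut cells = Vec::with_capacity(bytes.len());
    let mut rest = bytes;
    while !rest.is_empty() {
        // Find the longest valid UTF-8 prefix of what is left.
        let valid_len = match std::str::from_utf8(rest) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        let valid = std::str::from_utf8(&rest[..valid_len])
            .expect("prefix up to valid_up_to is valid UTF-8");
        for c in valid.chars() {
            cells.push(c);
            cells.extend(std::iter::repeat(CONTINUATION).take(c.len_utf8() - 1));
        }
        rest = &rest[valid_len..];
        // Anything left starts with a byte that failed to decode:
        // give it its own cell and advance by a single byte.
        if let Some((_, tail)) = rest.split_first() {
            cells.push(INVALID);
            rest = tail;
        }
    }
    cells
}
```

Because invalid input advances by exactly one byte, stray bytes such as A7 98 F0 after a complete emoji each still get a cell instead of disappearing.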

ndd7xv commented 1 year ago

Sorry for leaving you hanging! I think this is great and I'll merge it now; I'll get around to creating issues for smaller things/concerns (e.g. better escape characters/colors as you mentioned) as I find the need to. I should probably get around to testing and all that other good stuff too..

Thank you again for your contribution, it genuinely means a lot to me :smile: I'm looking to publish another release by the end of the year, but let me know if you'd want me to try to do something earlier and I'll try and fit in what I want by then :slightly_smiling_face:

0b11001111 commented 1 year ago

Cool cool :) By the end of the year I'll have more spare time and may jump in again if I feel like doing some Rust hacking... Let's see :)