justchokingaround / jerry

watch anime with automatic anilist syncing and other cool stuff

Unescape Unicode characters #49

Closed JustArion closed 6 months ago

JustArion commented 7 months ago

I've noticed that certain show titles contain Unicode escape sequences that are never decoded. Most notably, from the screenshot below, "Frieren: Beyond Journey\u2019s End" and "The Misfit of Demon King Academy: History\u2019s Strong...".

[image: e886a718-9719-47bc-ba87-c42789878a4d_24-02-2024]

With a broad fix it looks fine. I haven't tested it with external menus such as rofi yet, but fzf works fine for the most part. It would most likely need additional testing to see if anything breaks. Here's what the fix looks like:

[image: 19385cd2-bf54-4d97-a93f-4006752a29b3_24-02-2024]

The change is basically replacing printf "%s" ... with printf "%b" ... on the affected lines, e.g. L438:

- choice=$(printf "%s" "$tmp_anime_list" | launcher "Choose anime: " "1")
+ choice=$(printf "%b" "$tmp_anime_list" | launcher "Choose anime: " "1")

Anime: L438, L462, L493, L513

Manga: L613, L635, L665, L683

justchokingaround commented 7 months ago

sadly \u escapes in %b aren't posix compliant, since they're a bash extension. iirc if u test that with dash it won't work
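
For anyone who wants to check this on their own shell: POSIX does define %b for printf, but \uXXXX is not one of the escapes it lists, so bash decodes it while dash (as far as I know) prints it back unchanged. A minimal probe, where supports_unicode_escapes and render_title are hypothetical names and not part of jerry:

```sh
#!/bin/sh
# Hypothetical probe (not part of jerry): returns 0 if this shell's printf
# decodes \uXXXX under %b, non-zero otherwise.
supports_unicode_escapes() {
    [ "$(printf '%b' '\u0041')" = 'A' ]
}

# Hypothetical wrapper: only use %b when the probe succeeds.
# Caveat: %b also expands other backslash escapes (\n, \t, ...) in the input.
render_title() {
    if supports_unicode_escapes; then
        printf '%b' "$1"
    else
        printf '%s' "$1"
    fi
}

render_title 'Frieren: Beyond Journey\u2019s End'
echo
```

What this prints depends on the shell and locale, which is exactly the portability problem: the same script gives different titles under bash and dash.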

JustArion commented 7 months ago

> sadly \u escapes in %b aren't posix compliant, since they're a bash extension. iirc if u test that with dash it won't work

Could a sed replacement for any \u[0-9a-fA-F]{4} work as an alternative? (I've barely ever used sed myself)

justchokingaround commented 7 months ago

i dont think it'd be feasible since it'd look smtg like this: https://github.com/justchokingaround/ln-cli/blob/main/html-decode.sed
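
For a sense of scale, a table-style sed decoder needs one rule per code point. The sketch below covers only a handful of punctuation marks. decode_common_escapes is a hypothetical name, and the choice of code points is an assumption, not something taken from jerry or from the linked html-decode.sed:

```sh
#!/bin/sh
# Hypothetical helper: replace a few hand-picked \uXXXX escapes with the
# characters they stand for. Any escape not listed is left untouched.
decode_common_escapes() {
    printf '%s\n' "$1" | sed \
        -e 's/\\u2018/‘/g' \
        -e 's/\\u2019/’/g' \
        -e 's/\\u201[cC]/“/g' \
        -e 's/\\u201[dD]/”/g' \
        -e 's/\\u2026/…/g'
}

decode_common_escapes 'Frieren: Beyond Journey\u2019s End'
# prints: Frieren: Beyond Journey’s End
```

This works in plain POSIX sh, but every character outside the table (accented letters, CJK, and so on) would need its own rule, which is why a full decoder quickly becomes unwieldy.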

JustArion commented 7 months ago

> i dont think it'd be feasible since it'd look smtg like this: https://github.com/justchokingaround/ln-cli/blob/main/html-decode.sed

That's unfortunate

> sadly \u escapes in %b aren't posix compliant, since they're a bash extension. iirc if u test that with dash it won't work

Yeah, I just tried it on a desktop environment and it didn't work.

Would you like to keep the issue open for the future, or close it with a wontfix tag?

justchokingaround commented 7 months ago

i prefer leaving it open, ill fix it eventually

justchokingaround commented 7 months ago

how does:

printf '%s' "$1" | sed -E 's|\\u.{4}||g'

sound @JustArion ?

it's not ideal, but ig it's enough of a workaround

JustArion commented 7 months ago

> how does:
>
> printf '%s' "$1" | sed -E 's|\\u.{4}||g'
>
> sound @JustArion ?
>
> it's not ideal, but ig it's enough of a workaround

It looks like a good workaround for the time being. I'd rather have the escape sequences stripped entirely than have some of them show up in titles.
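
To make the effect of the workaround concrete, here is a small sketch. strip_unicode_escapes is a hypothetical wrapper name. The second variant, which only matches four hex digits after \u, is an assumption about a slightly safer pattern and is not what was proposed above:

```sh
#!/bin/sh
# Workaround from this thread: drop "\u" plus the next four characters.
strip_unicode_escapes() {
    printf '%s' "$1" | sed -E 's|\\u.{4}||g'
}

# Assumed stricter variant: only drop "\u" followed by four hex digits,
# so text such as "C:\users" is not mangled.
strip_unicode_escapes_strict() {
    printf '%s' "$1" | sed -E 's|\\u[0-9A-Fa-f]{4}||g'
}

printf '%s\n' "$(strip_unicode_escapes 'Frieren: Beyond Journey\u2019s End')"
# prints: Frieren: Beyond Journeys End
```

The apostrophe is lost rather than decoded, so titles read slightly wrong ("Journeys"), but no raw escape sequences reach the menu.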

justchokingaround commented 6 months ago

9238e6e3e93305ac4db68d9dc897a5ef105d55f4