rrthomas / recode

Charset converter tool and library
GNU General Public License v3.0

Add URL encoding: https://en.wikipedia.org/wiki/URL_encoding #51

Open Albretch opened 1 year ago

Albretch commented 1 year ago

_AZ="激光, 這兩個字是甚麼意思"

_AZ=$(echo "${_AZ}" | recode html..utf-8)
echo "// \$_AZ: |${_AZ}|"
// $_AZ: |激光, 這兩個字是甚麼意思|

_AZ="t%C3%AAte-%C3%A0-t%C3%AAte"
...
// __ $_AZ: |t%C3%AAte-%C3%A0-t%C3%AAte|
it should be: "tête-à-tête"

How do you make recode give you UTF-8 regardless of the input string (which encoding should be easy to figure out based on the patterns of the input string)?

rrthomas commented 1 year ago

> How do you make recode give you UTF-8 regardless of the input string (which encoding should be easy to figure out based on the patterns of the input string)?

I don't see any HTML character entities. Your examples look like URL escaping, not HTML character entities.

> which encoding should be easy to figure out based on the patterns of the input string

Recode does not attempt to guess what encoding its input uses; it uses the encoding you tell it. You'd need another tool to guess encodings.

Albretch commented 1 year ago

OK, is there a way to make recode take "t%C3%AAte-%C3%A0-t%C3%AAte" as input and give "tête-à-tête" as output?

rrthomas commented 1 year ago

No, I don't think recode supports URL encoding. That would be a good thing to add.
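
Until something like that exists in recode, the percent-escapes can be decoded outside it. The following is only a minimal sketch in POSIX shell, not anything recode provides: `urldecode` is a hypothetical helper name, it assumes the escaped bytes are already UTF-8 (as in the example above), and it leaves `+` untouched rather than treating it as a space.

```shell
# Hypothetical helper, not part of recode: percent-decode a single argument.
# Assumes POSIX sh; the variables below are global because POSIX has no `local`.
urldecode() {
    rest=$1
    while [ -n "$rest" ]; do
        case $rest in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${rest#?}              # drop the leading '%'
                hex=${hex%"${hex#??}"}     # keep only the two hex digits
                # Convert hex to octal, then let printf emit that single byte.
                printf "\\$(printf '%03o' "0x$hex")"
                rest=${rest#???}           # consume '%XX'
                ;;
            *)
                head=${rest%"${rest#?}"}   # first character, copied through as-is
                printf '%s' "$head"
                rest=${rest#?}
                ;;
        esac
    done
    printf '\n'
}

_AZ="t%C3%AAte-%C3%A0-t%C3%AAte"
urldecode "${_AZ}"
```

A `%` that is not followed by two hex digits is copied through unchanged. Because the output here is plain UTF-8, it could then be piped into recode with an explicit source encoding if a different target charset is needed.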