OK, so the primary reason `utf8_encode` was deprecated in PHP 8.2 is that it assumed its input was encoded as ISO-8859-1 and blindly converted whatever it was fed from ISO-8859-1 to UTF-8.
This PR does exactly the same thing: it assumes the input is encoded as ISO-8859-1 and converts it to UTF-8. Since `iconv` is not deprecated, that cleans up the deprecation notices, but the underlying assumption is unchanged.
But what if the input is not ISO-8859-1?
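To make the concern concrete, here is a minimal sketch of what happens when the input is already UTF-8 and is converted from ISO-8859-1 anyway. The variable names are made up for illustration and are not taken from the PR; it assumes the `iconv` extension is available.

```php
<?php
// "é" already encoded as UTF-8 is the two bytes 0xC3 0xA9.
$already_utf8 = "\xC3\xA9";

// Treating those bytes as ISO-8859-1 re-encodes each byte on its own:
// 0xC3 ("Ã") becomes C3 83 and 0xA9 ("©") becomes C2 A9.
$converted = iconv('ISO-8859-1', 'UTF-8', $already_utf8);

echo bin2hex($already_utf8), "\n"; // c3a9
echo bin2hex($converted), "\n";    // c383c2a9

// The bytes differ, so the SHA-256 digests differ too.
var_dump(hash('sha256', $already_utf8) === hash('sha256', $converted)); // bool(false)
```

So for input that is already UTF-8 and contains non-ASCII characters, the conversion changes the bytes that get hashed.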
I don’t know what the right answer is. There is prior art for detecting the current encoding here: https://github.com/BYVoid/uchardet, but that’s not a PHP library, and guessing an encoding seems to be a hard problem with a lot of fiddly edge cases.
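A narrower check that does not require guessing is to test whether a string is already well-formed UTF-8. The sketch below uses `mb_check_encoding` from the mbstring extension (assumed to be installed); it is not full encoding detection, and the variable names are hypothetical.

```php
<?php
$utf8_bytes   = "\xC3\xA9"; // "é" encoded as UTF-8
$latin1_bytes = "\xE9";     // "é" encoded as ISO-8859-1

// Well-formed UTF-8 passes the check.
var_dump(mb_check_encoding($utf8_bytes, 'UTF-8'));   // bool(true)

// A lone 0xE9 byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1_bytes, 'UTF-8')); // bool(false)
```

This only says whether the bytes are valid UTF-8. Some ISO-8859-1 strings happen to be valid UTF-8 as well (the bytes 0xC3 0xA9 read as "Ã©" in ISO-8859-1), so it cannot tell you what the author intended.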
My use case for `nostr-php` already has all input encoded as UTF-8. I’m wondering whether that is true for far more users than not, in which case (even though NIP-01 specifies UTF-8) the better option in practice might be either to not touch the encoding at all (i.e. `$id = hash('sha256', $hash_content);`) or to make the input encoding configurable by the user and pass it as a parameter to `iconv`.
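A rough sketch of how the two options could be combined, assuming a hypothetical helper `compute_event_id` (not an existing `nostr-php` function) with an optional `$input_encoding` parameter: when no encoding is given, the content is hashed untouched; otherwise it is converted to UTF-8 first.

```php
<?php
// Hypothetical helper for illustration only; not part of nostr-php.
function compute_event_id(string $hash_content, ?string $input_encoding = null): string
{
    // Only convert when the caller names a non-UTF-8 source encoding.
    if ($input_encoding !== null && strcasecmp($input_encoding, 'UTF-8') !== 0) {
        // iconv() returns false (and raises a notice) on failure.
        $converted = @iconv($input_encoding, 'UTF-8', $hash_content);
        if ($converted === false) {
            throw new InvalidArgumentException(
                "Could not convert input from {$input_encoding} to UTF-8"
            );
        }
        $hash_content = $converted;
    }

    return hash('sha256', $hash_content);
}

// Default: the input is assumed to be UTF-8 already and is hashed as-is.
$id = compute_event_id("\xC3\xA9");

// Opt-in: the caller states that the input is ISO-8859-1.
$id_from_latin1 = compute_event_id("\xE9", 'ISO-8859-1');

var_dump($id === $id_from_latin1); // bool(true)
```

Defaulting to no conversion keeps the common UTF-8 case unchanged, while callers who know their input is in another encoding can say so explicitly.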