solemnwarning / rehex

Reverse Engineers' Hex Editor
https://rehex.solemnwarning.net/
GNU General Public License v2.0

No search in progress UTF8 (CASE INSENSITIVE) #200

Closed stabud closed 1 year ago

stabud commented 1 year ago

Hi!

I set the file to display in UTF-8 encoding, but the search only works when it is case sensitive. With case-insensitive search enabled, it does not find anything.

P.S. Anyway, great HEX editor!

solemnwarning commented 1 year ago

Haven't had a lot of time to look at this, but the text search function isn't currently encoding-aware - it will just search for text in whatever the default encoding of your locale is, and with whatever case-insensitivity rules are provided by the libc strncasecmp() function. Seems weird turning on case insensitivity stops it from matching though. What string are you trying to search for?

stabud commented 1 year ago

Current locale: echo $LANG -> ru_RU.UTF-8 RUSSIAN.

Here is an example where strncasecmp() does not work for UTF-8:

```c
#include <stdio.h>
#include <strings.h>  /* strncasecmp() */

int main(void)
{
    const char *str1 = "АБВГДЕЖЗ";
    const char *str2 = "абвгдежз";
    int result;

    /* Compares the first 6 bytes, i.e. 3 two-byte UTF-8 characters. */
    result = strncasecmp(str1, str2, 6);

    if (result == 0)
        printf("Strings compared equal.\n");
    else if (result < 0)
        printf("\"%s\" is less than \"%s\".\n", str1, str2);
    else
        printf("\"%s\" is greater than \"%s\".\n", str1, str2);

    return 0;
}

/**** The output should be similar to: ***

"АБВГДЕЖЗ" is less than "абвгдежз".

***/
```

Source code saved in UTF-8 without BOM.

These strings are the same apart from case, but strncasecmp() reports them as different. The text should probably be converted to UTF-32 (with iconv or custom conversion functions) and then compared with wcscasecmp().
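To illustrate the "decode first, then compare" idea, here is a minimal sketch using hand-written conversion. It is not what rehex does; the names utf8_decode(), fold_simple() and utf8_ncasecmp() are hypothetical helpers made up for this example. The case folding is deliberately tiny (ASCII plus the basic Cyrillic block) so that it does not depend on the process locale; a real implementation would more likely rely on towlower()/wcscasecmp() under a UTF-8 locale or on a Unicode library.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Simplified: overlong encodings and surrogates are not rejected. */
static size_t utf8_decode(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len;

    if (p[0] < 0x80) { *cp = p[0]; return 1; }

    if ((p[0] & 0xE0) == 0xC0)      { len = 2; *cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; *cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; *cp = p[0] & 0x07; }
    else return 0;

    for (size_t i = 1; i < len; i++) {
        /* A NUL terminator also fails this check, so we never read past it. */
        if ((p[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return len;
}

/* Assumption: a toy case fold covering only A-Z and U+0400..U+042F. */
static uint32_t fold_simple(uint32_t cp)
{
    if (cp >= 'A' && cp <= 'Z')       return cp + 0x20;
    if (cp >= 0x0410 && cp <= 0x042F) return cp + 0x20; /* А..Я -> а..я */
    if (cp >= 0x0400 && cp <= 0x040F) return cp + 0x50; /* Ѐ..Џ -> ѐ..џ (incl. Ё) */
    return cp;
}

/* Like strncasecmp(), but max_chars counts code points, not bytes.
 * Returns <0, 0 or >0. */
int utf8_ncasecmp(const char *a, const char *b, size_t max_chars)
{
    while (max_chars-- > 0) {
        uint32_t ca, cb;
        size_t na = utf8_decode(a, &ca);
        size_t nb = utf8_decode(b, &cb);

        /* Treat a malformed byte as U+FFFD and step over it. */
        if (na == 0) { ca = 0xFFFD; na = 1; }
        if (nb == 0) { cb = 0xFFFD; nb = 1; }

        ca = fold_simple(ca);
        cb = fold_simple(cb);

        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0)  return 0;   /* both strings ended */

        a += na;
        b += nb;
    }
    return 0;
}
```

With this sketch, utf8_ncasecmp("АБВГДЕЖЗ", "абвгдежз", 8) returns 0, because the comparison happens on folded code points rather than on raw bytes.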

solemnwarning commented 1 year ago

Fixed in 53a7d0b5f0219d6156e19ced39c486b18e7e683e.

stabud commented 1 year ago

Thank you!