ocaml / flexdll

a dlopen-like API for Windows
Other
98 stars 30 forks source link

reloc.ml: should not use String.lowercase_ascii #35

Closed nojb closed 7 years ago

nojb commented 7 years ago

While investigating an issue related to ocaml/ocaml#1200, I noticed that there are several uses of String.lowercase_ascii applied to filenames in the module reloc.ml. It would be preferable not to use this function since those filenames may be UTF-8-encoded in the future.

alainfrisch commented 7 years ago

There are some use where the result is used as a memoization key to avoid repeated lookups on the file system (which is case insensitive). Which problems do you see with lowercase_ascii being applied to utf8-encoded strings?

nojb commented 7 years ago

Indeed, it is true that the use here is probably safe, but in any case it won't correspond to the notion of "case-insensitive" of the file system. Luckily, it seems that one can use some Window APIs to access this precise notion:

https://blogs.msdn.microsoft.com/greggm/2005/09/21/comparing-file-names-in-native-code/

alainfrisch commented 7 years ago

I'm not sure we need to be so precise, at least when this is only used to avoid repeated calls to the file system. What we need is that two filenames being equal after normalization are assumed to be identical by the file system, not the other way around.

Moreover, I'd prefer to maintain the property that flexlink can be produced using only the OCaml compiler and stdlib (no Unix, no external stub).

nojb commented 7 years ago

You are right, it is perfectly fine and safe. I got confused and was thinking that lowercase_ascii was the Latin-1 lowercase which could potentially lead to problems when passed UTF-8 strings.

Closing!