openvenues / jpostal

Java/JNI bindings to libpostal for fast international street address parsing/normalization
MIT License

libpostal logs "invalid UTF-8" warning for a string having "\0" or "\u0000" and stays in waiting state #36

Open myasirkhan opened 4 years ago

myasirkhan commented 4 years ago

Using jpostal, if I call parseAddress like this:

AddressParser.getInstance().parseAddress("Rue du Médecin-Colonel Calbairac Toulouse France\u0000")

I see this warning logged repeatedly:

WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory

The thread then remains in a waiting state. This happens only when the address contains a \u0000 (or simply \0) character. The simplest workaround seems to be to avoid sending the \0 character, or to strip it before calling parseAddress.
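
A minimal sketch of that workaround, assuming it is acceptable to drop NUL characters from the input. The class name AddressSanitizer and the method stripNul are hypothetical helpers and not part of jpostal; only AddressParser.getInstance().parseAddress(...) comes from the example above, and it is shown in a comment so the snippet compiles without the library.

```java
// Hypothetical helper (not part of jpostal): removes U+0000 characters
// from an address string before it is handed to the native parser.
final class AddressSanitizer {

    private AddressSanitizer() {
        // static utility, no instances
    }

    // Returns the input with every NUL character removed.
    // A null input is returned unchanged.
    static String stripNul(String address) {
        if (address == null || address.indexOf('\0') < 0) {
            return address; // nothing to strip
        }
        return address.replace("\0", "");
    }

    // Intended use with jpostal (assumes jpostal is on the classpath):
    //   AddressParser.getInstance().parseAddress(AddressSanitizer.stripNul(raw));
}
```

This only sidesteps the problem on the caller's side; it does not change how jpostal or libpostal handle such input.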

wboult commented 4 years ago

@myasirkhan I hit this too. I've stolen the above example for some of the tests in a PR I just created.