Closed ignacioalles closed 5 months ago
Thank you for raising this issue. It's quite specific. I don't really understand your example with Juan García and Juan Ávila; can you be more explicit? I understand the code is removing important spaces, but I don't see in which cases it works fine (I suppose it works fine most of the time for artist names).
Thanks for your additional help.
I've made a branch with a sample (and updated test case) at: https://github.com/miqwit/dedex/compare/master...ignacioalles:dedex:utf8
I can make a pull request if you want, but I don't have a fix for it yet.
Whether the bug appears depends on whether the character that causes xml_parser to split the callbacks is preceded by whitespace. The examples in my previous message were meant to illustrate this: the letter A with accent (Á) is the first letter of the second word, so it follows a space, while the letter I with accent (í) is in the middle of García.
I'm closing the issue, and I can confirm that the released version 2.0.7 solves my case.
An invalid value is produced when a multibyte character is present in the XML file. Although it seems to be handled in: https://github.com/miqwit/dedex/blob/76e150b35adb7652eddcb95c261b9b920866c095/src/Controller/ErnParserController.php#L450-L454
there is still an issue, because the preceding value, to which the new value is concatenated, was trimmed here: https://github.com/miqwit/dedex/blob/76e150b35adb7652eddcb95c261b9b920866c095/src/Controller/ErnParserController.php#L606
which removes any whitespace that was between them.
The current code works fine if the special character is not preceded by whitespace (e.g. Juan García) but produces a wrong value if it is (e.g. Juan Ávila results in JuanÁvila).
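
To make the failure mode concrete, here is a minimal sketch in Python rather than the project's PHP. It does not reproduce the dedex code: the function names `buggy_join` and `fixed_join` are hypothetical, and the chunk boundaries are an assumption based on the behaviour described above (the parser starts a new callback at the accented character). It only models the difference between trimming the stored value after every callback and trimming once when the element is complete.

```python
def buggy_join(chunks):
    """Model of the reported behaviour: the stored value is trimmed
    after every callback, then the next chunk is concatenated to it."""
    value = ""
    for chunk in chunks:
        # Trimming here drops a trailing space that belongs between words.
        value = (value + chunk).strip()
    return value


def fixed_join(chunks):
    """One possible approach: keep chunks untouched and trim only once,
    when the whole element text has been collected."""
    return "".join(chunks).strip()


# Assumed split points: a new callback begins at the accented character.
garcia_chunks = ["Juan Garc", "ía"]   # accent in the middle of a word
avila_chunks = ["Juan ", "Ávila"]     # accent right after a space

print(buggy_join(garcia_chunks))  # Juan García
print(buggy_join(avila_chunks))   # JuanÁvila
print(fixed_join(avila_chunks))   # Juan Ávila
```

In this model the first name survives because the chunk before the accent ends in a letter, so trimming has nothing to remove; the second name loses its space because the chunk before the accent ends in whitespace.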