seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License
75 stars 39 forks

No whitespace between given and family names of authors/editors #75

Closed kchoong closed 4 years ago

kchoong commented 4 years ago

Hello @seboettg !

I had an issue with a few citation stylesheets, where the family and given names are not separated by a space or anything else. The ones I had an issue with were:

Springer LNCS Result:
1. DoeJ.: My Anonymous Heritage. (2001).
2. AndersonJ., BrownJ.: Two authors writing a book. (1998).
3. ColeS.J., MooreR.: Hydrological modelling using raingauge- and radar-based estimators of areal rainfall. Journal of Hydrology. 358, 159-181 (2008). https://doi.org/10.1016/j.jhydrol.2008.05.025.
4. BenzD., HothoA., JäschkeR., StummeG., HalleA., LimaA.G.S., SteenwegH., StefaniS., DietrichB.: Academic Publication Management with PUMA - collect, organize and share publications. In: Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. p. 417-420. Springer, Berlin/Heidelberg (2010).

APA Result:
AndersonJ., & BrownJ. (1998). Two authors writing a book.
BenzD., HothoA., JäschkeR., StummeG., HalleA., LimaA. G. S., SteenwegH., StefaniS., & DietrichB. (2010). Academic Publication Management with PUMA - collect, organize and share publications. Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries, 6273, 417-420. Berlin/Heidelberg: Springer.
ColeS. J., & MooreR. (2008). Hydrological modelling using raingauge- and radar-based estimators of areal rainfall. Journal of Hydrology, 358(3-4), 159-181. https://doi.org/10.1016/j.jhydrol.2008.05.025
DoeJ. (2001). My Anonymous Heritage.

Harvard Cite them right Result:
AndersonJ. and BrownJ. (1998) Two authors writing a book.
BenzD., HothoA., JäschkeR., StummeG., HalleA., LimaA. G. S., SteenwegH., StefaniS. and DietrichB. (2010) “Academic Publication Management with PUMA - collect, organize and share publications”, in Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries. Berlin/Heidelberg: Springer (Lecture Notes in Computer Science), p. 417-420.
ColeS. J. and MooreR. (2008) “Hydrological modelling using raingauge- and radar-based estimators of areal rainfall”, Journal of Hydrology, 358(3-4), pp. 159-181. doi: 10.1016/j.jhydrol.2008.05.025.
DoeJ. (2001) My Anonymous Heritage.

I am not sure if it's just an issue on my side, since I am getting similar results with other .csl files as well, but I didn't find a previous issue like this in the repository's issue section. The small change did resolve the issue for all the citation styles I tried to use, though.

seboettg commented 4 years ago

Hi @kchoong! Thank you for that PR. At first glance, the small bugfix looks very plausible. Unfortunately, one test has failed. However, this particular test may itself be faulty, meaning that it expects a wrong result. I'm going to check this soon and keep you informed. KR, Sebastian

kchoong commented 4 years ago

Hey, I added another check for whether the given names are Asian glyphs, similar to what you did with Latin and Cyrillic characters, before adding the space that makes the test fail. Kind regards, kchoong
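The kind of check described here could look roughly like the following. This is only a minimal sketch and not the code from the PR: the function names `isAsianName` and `joinName` and the choice of Unicode scripts (Han, Hiragana, Katakana, Hangul) are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from citeproc-php): true if the string consists
// only of characters from a few East Asian scripts. The script list is an
// assumption for this sketch.
function isAsianName(string $name): bool
{
    return preg_match('/^[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+$/u', $name) === 1;
}

// Hypothetical helper: join family and given name, inserting a space
// unless both parts are written in Asian glyphs.
function joinName(string $family, string $given): string
{
    if (isAsianName($family) && isAsianName($given)) {
        return $family . $given;
    }
    return $family . ' ' . $given;
}

echo joinName('Doe', 'J.'), "\n";   // Latin name: space inserted
echo joinName('山田', '太郎'), "\n"; // Han name: no space inserted
```

The idea is that the space is only skipped when both name parts are detected as Asian glyphs, so Latin and Cyrillic names still get the separator.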

seboettg commented 4 years ago

Hey @kchoong, I built a test case for your examples and wasn't able to reproduce the behavior you described. I had also wondered earlier why the condition "isLatinString" did not match in the case of "Doe, John". My guess is that the input data you used is not UTF-8 encoded, since the regular expression for Latin characters does not work on it.
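One way to test the encoding guess is to see how `preg_match` with the `/u` modifier reacts to the input. The sketch below is an illustration only; the helper name `describeUtf8Match` and the sample strings are assumptions, and the pattern is the Latin-character expression quoted later in this thread. With `/u`, PCRE rejects subjects that are not valid UTF-8: `preg_match` returns `false` and `preg_last_error()` reports `PREG_BAD_UTF8_ERROR`.

```php
<?php
// Hypothetical diagnostic helper (not part of citeproc-php): classify the
// outcome of matching $subject against a /u pattern.
function describeUtf8Match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // With the /u modifier, invalid UTF-8 input is reported as an error.
        return preg_last_error() === PREG_BAD_UTF8_ERROR ? 'invalid-utf8' : 'pcre-error';
    }
    return $result === 1 ? 'match' : 'no-match';
}

$pattern = '/^[\p{Latin}\s\p{P}]*$/u';

$utf8Name   = "J\u{00E4}schke, Robert"; // "ä" encoded as UTF-8
$latin1Name = "J\xE4schke, Robert";     // "ä" as a single ISO-8859-1 byte

echo describeUtf8Match($pattern, $utf8Name), "\n";
echo describeUtf8Match($pattern, $latin1Name), "\n";
echo describeUtf8Match($pattern, 'Doe, John'), "\n";
```

Note that a pure ASCII string such as "Doe, John" is always valid UTF-8, so an encoding problem would be expected to show up only for names containing non-ASCII characters.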

kchoong commented 4 years ago

Hey @seboettg, yes, you are right, that is really odd now that I think about it. For the examples I listed, I just edited the given index.php file in the examples directory, and I haven't touched the data.json at all. I will try to see if I can resolve it by looking into the formatting of my input.

kchoong commented 4 years ago

Hey @seboettg, it seems that my local PHP instance is having issues with the regular expression "/^[\p{Latin}\s\p{P}]*$/u" and the other similar ones: it only matches strings like "DoeJohn", while "Doe, John" won't match. I also tried to UTF-8 encode the input beforehand, but the results are the same. If I try it with any PHP fiddle or regex tester, it matches both correctly.

Would you have any idea how this issue is occurring?

These are my PHP PCRE settings: [screenshot: PHP 7.4.0 - phpinfo()]
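A small script along these lines can help compare a local installation with an online tester. It is a minimal sketch: the `probe` helper is hypothetical, and which settings are actually relevant is an assumption. It runs the pattern from this comment against both sample strings and prints the PCRE version and the `pcre.jit` setting alongside the results.

```php
<?php
// Hypothetical diagnostic helper: run preg_match and also capture the
// PCRE error code, so that failures are not mistaken for "no match".
function probe(string $pattern, string $subject): array
{
    $result = preg_match($pattern, $subject);
    return ['result' => $result, 'error' => preg_last_error()];
}

$pattern = '/^[\p{Latin}\s\p{P}]*$/u';
$samples = ['DoeJohn', 'Doe, John'];

// Environment details that are useful to include in a bug report.
echo 'PHP version:  ', PHP_VERSION, "\n";
echo 'PCRE version: ', PCRE_VERSION, "\n";
echo 'pcre.jit:     ', var_export(ini_get('pcre.jit'), true), "\n";

foreach ($samples as $sample) {
    $p = probe($pattern, $sample);
    printf(
        "%-12s result=%s error=%d\n",
        '"' . $sample . '"',
        var_export($p['result'], true),
        $p['error']
    );
}
```

On an installation that behaves like the online testers mentioned above, both samples should report `result=1` with `error=0` (`PREG_NO_ERROR`). A `result=0` or `result=false` for "Doe, John" would narrow the problem down to the local PCRE setup.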

seboettg commented 4 years ago

I have no idea, unfortunately... Could it be an issue caused by your operating system? Which OS do you use?