seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License

If an author name begins with a multibyte UTF-8 character, the name is not converted to an initial #146

Open jnugent opened 1 year ago

jnugent commented 1 year ago

Bug reports:

We first received a report of this behaviour in one of our hosted OJS installations. A Norwegian journal with authors whose first name is Åse reported that citations generated with this name did not abbreviate it to Å but kept the whole name instead. I have since reproduced this in other OJS installations. I also cloned the citeproc-php repository and changed one of the author names in the sample JSON to Åse; the problem occurs there as well when running the example/index.php script.

While investigating, I discovered that ./src/Util/StringHelper.php contains a method called initializeBySpaceOrHyphen that appears to split the first letter off a string. If a string like Åse is passed in, the first letter is split off correctly, but StringHelper::isLatinString then returns true (it probably should not?) and the subsequent call to ctype_upper fails, which causes the method to return the entire string. I was able to get this to work correctly by temporarily negating the StringHelper::isLatinString test in the if statement.
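
To illustrate the kind of check that would handle Åse, here is a minimal sketch of a multibyte-aware initial. It is not the library's code: the function name toInitial is hypothetical, the trailing period is an assumption about the desired output, and it assumes the input is valid UTF-8. It swaps the byte-oriented ctype_upper test for a PCRE Unicode property match.

```php
<?php
// Hypothetical helper for illustration only; not part of citeproc-php.
// ctype_upper() inspects single bytes, so it cannot recognise a multibyte
// letter such as "Å". A PCRE match in UTF-8 mode (/u) works on code points.
function toInitial(string $given): string
{
    // \p{Lu} matches one Unicode uppercase letter at the start of the name;
    // \p{M}* keeps any combining marks, so a decomposed "A" + ring also
    // stays together as one initial.
    if (preg_match('/^\p{Lu}\p{M}*/u', $given, $match) === 1) {
        return $match[0] . '.';
    }

    // No leading uppercase letter (or invalid UTF-8): return the name unchanged.
    return $given;
}

echo toInitial('Åse'), "\n";   // Å.
echo toInitial('John'), "\n";  // J.
```

This only covers a single given name; splitting on spaces or hyphens, as the method name initializeBySpaceOrHyphen suggests, would still need to happen around it.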

Used CSL stylesheet:

apa.csl

Used CSL metadata

[
    {
        "author": [
            {
                "family": "Anderson",
                "given": "John"
            },
            {
                "family": "Brown",
                "given": "Åse"
            }
        ],
        "id": "ITEM-2",
        "type": "book",
        "title": "Two authors writing a book"
    }
]

ronste commented 1 year ago

Hi,

here's another example where this bug kicks in: https://www.cgt-journal.org/index.php/cgt/article/view/11

RewindLife commented 1 year ago

Our journal has the same issue: authors' first names that start with characters with diacritics are not abbreviated. Can this issue be assigned a bug status? Any idea how quickly this can be fixed? Thank you in advance.
