seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License

Returns empty citation with some formats when citation contains certain cyrillic characters #49

Closed NateWr closed 6 years ago

NateWr commented 6 years ago

Bug reports:

We're getting an empty citation returned when passing a citation for a journal article titled Est maximus eu donec congue “Nešto Između” Srđana Karanović“a. It returns:

<div class="csl-bib-body">
  <div class="csl-entry">.</div>
</div>

Used CSL stylesheet:

Can reproduce this with chicago-author-date.csl and turabian-fullnote-bibliography.csl.

Used CSL metadata:

We pull the citation data from PHP, so here's the stdClass that I pass to CiteProc::render:

stdClass Object
(
    [type] => article-journal
    [id] => 30
    [title] => Est maximus eu donec congue \xe2\x80\x9cNe\xc5\xa1to Izme\xc4\x91u\xe2\x80\x9d Sr\xc4\x91ana Karanovi\xc4\x87\xe2\x80\x9ca
    [container-title] => Journal of Public Knowledge
    [container-title-short] => publicknowledge
    [volume] => 2
    [issue] => 3
    [section] => Articles
    [URL] => http://localhost/ojs/publicknowledge/article/view/30
    [accessed] => stdClass Object
        (
            [raw] => 2018-04-02
        )

    [author] => Array
        (
            [0] => stdClass Object
                (
                    [family] => Corino
                    [given] => Carlo
                )

            [1] => stdClass Object
                (
                    [family] => Contributor
                    [given] => Test
                )

            [2] => stdClass Object
                (
                    [family] => Another
                    [given] => Test
                )

        )

    [issued] => stdClass Object
        (
            [raw] => 2017-10-17 00:00:00
        )

    [DOI] => 10.1234/publicknowledge.v2i3.30
)
seboettg commented 6 years ago

Hey Nate, I'm currently busy. I will look into a fix soon.

NateWr commented 6 years ago

No problem, I'm the same. Thanks for looking into it. :+1:

seboettg commented 6 years ago

I'm not able to convert the array dump into JSON. Please pass that stdClass object to json_encode and post the result here. Thank you!

NateWr commented 6 years ago

Sure, here it is in JSON. It looks like the characters in the title may have been transformed a bit, but it may also be that my test data has changed slightly (I still get the same errors, though):

{
    "type": "article-journal",
    "id": "30",
    "title": "Est maximus eu donec congue \u201cNe\u0161to Izme\u0111u\u201d Sr\u0111ana Karanovi\u0107\u201ca",
    "container-title": "Journal of Public Knowledge",
    "container-title-short": "publicknowledge",
    "volume": "2",
    "issue": "3",
    "section": "Articles",
    "URL": "http:\/\/localhost\/ojs\/publicknowledge\/article\/view\/30",
    "accessed": {
        "raw": "2018-04-11"
    },
    "author": [{
        "family": "Corino",
        "given": "Carlo"
    }, {
        "family": "Contributor",
        "given": "Test"
    }, {
        "family": "Another",
        "given": "Test"
    }],
    "issued": {
        "raw": "2017-10-17 00:00:00"
    },
    "DOI": "10.1234\/publicknowledge.v2i3.30"
}
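
On the "transformed" characters: by default, json_encode writes non-ASCII characters as \uXXXX escapes and forward slashes as \/, so the data itself should be unchanged. The following is a minimal sketch of that behaviour, assuming a cut-down stand-in object (only three of the fields are kept, and the variable names are made up for illustration):

```php
<?php
// Hypothetical stand-in for the citation object; only a few fields are kept.
$item = new stdClass();
$item->type = 'article-journal';
$item->id = '30';
$item->title = "Est maximus eu donec congue \u{201C}Nešto Između\u{201D} Srđana Karanović\u{201C}a";

// Default json_encode escapes non-ASCII characters as \uXXXX.
$escaped = json_encode($item);

// These flags keep the characters (and slashes) readable in the output.
$readable = json_encode($item, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

// Both forms decode back to the same title.
$roundTrip = json_decode($escaped);

echo $escaped, PHP_EOL;
echo $readable, PHP_EOL;
```

Either form can be posted; json_decode restores the original UTF-8 string from both.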
seboettg commented 6 years ago

Okay, I've determined the problem.

The stylesheet specifies that the title should be formatted using text-case="title":

<text variable="title" quotes="true" text-case="title"/>

For uppercase strings, the first character of each word remains capitalized. All other letters are lowercased. For lower or mixed case strings, the first character of each lowercase word is capitalized. The case of words in mixed or uppercase stays the same. (Have a look at the CSL specification.)

To do this, I split the title into single words at spaces and hyphens. The mb_strtoupper function cannot handle a word like “Nešto, because the leading “ is not a valid letter that can be uppercased, so the complete string was destroyed. citeproc-php now checks whether the first character of a word is a valid UTF-8 letter that can be uppercased before running mb_strtoupper.
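
The check described above could look roughly like the following. This is only a sketch, not the actual code from the issue-49 branch: the function names ucfirstIfLetter and titleCaseSketch are hypothetical, the mbstring extension is assumed to be available, and the sketch ignores the other title-case rules (stop words, mixed-case words).

```php
<?php
// Hypothetical helper, not the library's actual code: uppercase the first
// character of a word only when that character is a Unicode letter.
function ucfirstIfLetter(string $word): string
{
    if ($word === '') {
        return $word;
    }
    $first = mb_substr($word, 0, 1, 'UTF-8');
    // \p{L} matches any Unicode letter; the u modifier treats input as UTF-8.
    if (preg_match('/^\p{L}$/u', $first) !== 1) {
        return $word; // leading quote, digit, space, etc.: leave untouched
    }
    return mb_strtoupper($first, 'UTF-8') . mb_substr($word, 1, null, 'UTF-8');
}

// Split on spaces and hyphens, keeping the delimiters so the title can be
// rejoined exactly as it was.
function titleCaseSketch(string $title): string
{
    $parts = preg_split('/([ -])/u', $title, -1, PREG_SPLIT_DELIM_CAPTURE);
    return implode('', array_map('ucfirstIfLetter', $parts));
}

echo titleCaseSketch("est maximus \u{201C}Nešto između\u{201D}"), PHP_EOL;
```

A word that starts with a quotation mark, such as “Nešto, passes through unchanged, while a word like između is capitalized to Između.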

Please test if the issue is solved by using the branch issue-49.

If everything is fine, I will make a bugfix release soon.

NateWr commented 6 years ago

Fix confirmed. Thanks for another quick fix! I'll look forward to the next release. :+1:

seboettg commented 6 years ago

Okay! 2.1.2 has been released.