seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License

string encodings do not seem to be detected correctly #102

Closed bseeger closed 2 years ago

bseeger commented 3 years ago

Hello -- Not sure if this is an issue or not, but I'm using this library in a module I'm creating for Drupal 8 and seeing string conversion errors.

Namely, if I ask for an MLA-formatted bibliography of the CSL metadata below, I get

 "citation-MLA": "<div class=\"csl-bib-body\">\n  <div class=\"csl-entry\">DoeJ. <i>y nonymous eritage</i>. 2001.</div>\n</div>",

Notice the title is clipped and I see iconv errors in the logs.

The best I can figure out is that the code for mb_ucfirst in StringHelper.php isn't considering UTF-8 at all and comes back saying my string is ISO-8859-1 encoded.

Drupal log:

Notice: iconv(): Wrong charset, conversion from `ISO-8859-1' to `UTF-8//IGNORE' is not allowed in Symfony\Polyfill\Mbstring\Mbstring::mb_convert_case() (line 285 of /var/www/drupal/vendor/symfony/polyfill-mbstring/Mbstring.php)

#0 /var/www/drupal/web/core/includes/bootstrap.inc(600): _drupal_error_handler_real(8, 'iconv(): Wrong ...', '/var/www/drupal...', 285, Array)
#1 [internal function]: _drupal_error_handler(8, 'iconv(): Wrong ...', '/var/www/drupal...', 285, Array)
#2 /var/www/drupal/vendor/symfony/polyfill-mbstring/Mbstring.php(285): iconv('ISO-8859-1', 'UTF-8//IGNORE', 'H')
#3 /var/www/drupal/vendor/symfony/polyfill-mbstring/Mbstring.php(590): Symfony\Polyfill\Mbstring\Mbstring::mb_convert_case('H', 0, 'ISO-8859-1')
#4 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Util/StringHelper.php(144): Symfony\Polyfill\Mbstring\Mbstring::mb_strtoupper('H', 'ISO-8859-1')
#5 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Util/StringHelper.php(109): Seboettg\CiteProc\Util\StringHelper::mb_ucfirst('Heritage')
#6 [internal function]: Seboettg\CiteProc\Util\StringHelper::Seboettg\CiteProc\Util\{closure}('Heritage', 2)
#7 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Util/StringHelper.php(110): array_walk(Array, Object(Closure))
#8 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Styles/TextCaseTrait.php(68): Seboettg\CiteProc\Util\StringHelper::capitalizeForTitle('My Anonymous He...')
#9 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Text.php(198): Seboettg\CiteProc\Rendering\Text->applyTextCase('My Anonymous He...', 'en')
#10 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Text.php(103): Seboettg\CiteProc\Rendering\Text->renderVariable(Object(stdClass), 'en')
#11 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Choose/ChooseIf.php(88): Seboettg\CiteProc\Rendering\Text->render(Object(stdClass), NULL)
#12 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Choose/Choose.php(90): Seboettg\CiteProc\Rendering\Choose\ChooseIf->render(Object(stdClass))
#13 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Style/Macro.php(86): Seboettg\CiteProc\Rendering\Choose\Choose->render(Object(stdClass), NULL)
#14 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Text.php(254): Seboettg\CiteProc\Style\Macro->render(Object(stdClass))
#15 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Text.php(111): Seboettg\CiteProc\Rendering\Text->renderMacro(Object(stdClass))
#16 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Group.php(104): Seboettg\CiteProc\Rendering\Text->render(Object(stdClass), 0)
#17 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Layout.php(126): Seboettg\CiteProc\Rendering\Group->render(Object(stdClass), 0)
#18 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Rendering/Layout.php(91): Seboettg\CiteProc\Rendering\Layout->renderSingle(Object(stdClass), 0)
#19 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/Style/Bibliography.php(70): Seboettg\CiteProc\Rendering\Layout->render(Object(Seboettg\CiteProc\Data\DataList), NULL)
#20 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/CiteProc.php(137): Seboettg\CiteProc\Style\Bibliography->render(Object(Seboettg\CiteProc\Data\DataList))
#21 /var/www/drupal/vendor/seboettg/citeproc-php/src/Seboettg/CiteProc/CiteProc.php(183): Seboettg\CiteProc\CiteProc->bibliography(Object(Seboettg\CiteProc\Data\DataList))
#22 /var/www/drupal/web/modules/contrib/idc_export/src/Service/CitationsService.php(30): Seboettg\CiteProc\CiteProc->render(Object(Seboettg\CiteProc\Data\DataList), 'bibliography')
#23 /var/www/drupal/web/modules/contrib/idc_export/src/Plugin/views/field/CitationMLA.php(60): Drupal\idc_export\Service\CitationsService->renderFromMetadata(Array, 'modern-language...', 'bibliography')

The code that Drupal is running is here: mb_string::mb_detect_encoding

And that won't consider UTF-8 unless it's handed in explicitly in the encoding list. If I put it in the list, things work well.

But maybe it's something else in my setup? Drupal 8.9.14, PHP 7.2.27
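
To illustrate the workaround described above (putting UTF-8 in the detection list), here is a minimal sketch. The function name ucfirst_utf8_aware is hypothetical and this is not the library's actual mb_ucfirst code; it only assumes that the mbstring functions (native or polyfilled) are available and that UTF-8 should be tried before ISO-8859-1.

```php
<?php
// Hypothetical helper, NOT the code from StringHelper.php: upper-cases the
// first character after detecting the encoding from a list that names UTF-8
// explicitly, so UTF-8 is tried before falling back to ISO-8859-1.
function ucfirst_utf8_aware(string $text): string
{
    if ($text === '') {
        return $text;
    }

    // Strict detection; UTF-8 comes first so valid UTF-8 (including plain
    // ASCII) is reported as UTF-8 rather than ISO-8859-1.
    $encoding = mb_detect_encoding($text, ['UTF-8', 'ISO-8859-1'], true);
    if ($encoding === false) {
        // Nothing matched: leave the string untouched instead of guessing.
        return $text;
    }

    $first = mb_substr($text, 0, 1, $encoding);
    $rest  = mb_substr($text, 1, null, $encoding);

    return mb_strtoupper($first, $encoding) . $rest;
}

echo ucfirst_utf8_aware('heritage'), "\n";
```

With UTF-8 in the list, the case conversion is requested for UTF-8 text, so the ISO-8859-1 to UTF-8 iconv call seen in the stack trace should not be needed for input like this.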

Used CSL stylesheet:

modern-language-association

Used CSL metadata

  [
          {
              "author": [
                  {
                      "family": "Doe",
                      "given": "James",
                      "suffix": "III"
                  }
              ],
              "id": "item-1",
              "issued": {
                  "date-parts": [
                      [
                          "2001"
                      ]
                  ]
              },
              "title": "My Anonymous Heritage",
              "type": "book"
          }
      ]

bseeger commented 3 years ago

Another data point -- it has to do with the style sheet chosen, because if I switch to ieee, the problem goes away. But I need MLA style, so I'm a little stuck.

jonasraoni commented 2 years ago

Hi @bseeger!

I'm having the same issue (I'm using a ready-made Docker container). It looks like it's related to the environment/installation. Try running the code below; if you get an error, this issue can probably be closed, as it's something on your end:

echo iconv('ISO-8859-1', 'UTF-8//IGNORE', 'test');
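
A slightly longer version of the same check might look like the sketch below. It captures the notice or warning instead of letting it go to the log, and prints which iconv implementation PHP was built against. The function name check_iconv_conversion is made up for this example; ICONV_IMPL and ICONV_VERSION are constants provided by PHP's iconv extension.

```php
<?php
// Hypothetical diagnostic helper: runs one iconv() conversion and captures
// any notice/warning it raises instead of sending it to the error log.
function check_iconv_conversion(string $from, string $to, string $sample): array
{
    $warning = null;
    set_error_handler(function (int $severity, string $message) use (&$warning): bool {
        $warning = $message;
        return true; // mark the error as handled
    });

    try {
        $result = iconv($from, $to, $sample);
    } finally {
        restore_error_handler();
    }

    return [
        'ok'      => $result !== false,
        'result'  => $result,
        'warning' => $warning,
    ];
}

// Same conversion that appears in the stack trace.
$report = check_iconv_conversion('ISO-8859-1', 'UTF-8//IGNORE', 'test');

printf("iconv implementation: %s %s\n", ICONV_IMPL, ICONV_VERSION);
printf("conversion ok: %s\n", $report['ok'] ? 'yes' : 'no');
if ($report['warning'] !== null) {
    printf("message: %s\n", $report['warning']);
}
```

If the conversion is reported as failing here, that would point at the PHP/iconv build in the container rather than at the library.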