seboettg / citeproc-php

Full-featured CSL 1.0.1 processor for PHP
MIT License
75 stars 39 forks source link

Page numbers or locators not working with "book" type and "society-of-biblical-literature-fullnote-bibliography" style. #82

Closed rudolfbyker closed 3 years ago

rudolfbyker commented 4 years ago

Please follow the general troubleshooting steps first:

Bug reports:

According to the guys over at https://github.com/citation-style-language , page numbers for books using the SBL style should work using the "locator" variable when rendering "citations" (called "notes" in the SBL Handbook). However, neither the page variable nor the locator variable has any effect. The authors of the stylesheet denies that the problem is with the stylesheet, and told me to ask here. Read https://github.com/citation-style-language/styles/issues/3795 for the back story.

Used CSL stylesheet:

society-of-biblical-literature-fullnote-bibliography.csl

Used CSL metadata

Saved as issue.json:

    [
        {
            "author": [
                {
                    "family": "Anderson", 
                    "given": "John"
                }, 
                {
                    "family": "Brown", 
                    "given": "John"
                }
            ], 
            "id": "ITEM-2", 
            "type": "book",
            "title": "Two authors writing a book",
            "locator": "123"
        }
    ]

Source code

Saved as issue.php:

<?php

include __DIR__ . "/../vendor/autoload.php";
use Seboettg\CiteProc\StyleSheet;
use Seboettg\CiteProc\CiteProc;

$style = StyleSheet::loadStyleSheet("society-of-biblical-literature-fullnote-bibliography");
$citeProc = new CiteProc($style, "en-US");
$data = json_decode(file_get_contents("issue.json"));
$citation = $citeProc->render($data, "citation");
$bibliography = $citeProc->render($data, "bibliography");
?>
<html>
<head>
    <title>CSL Test</title>
</head>
<body>
<h1>Citation</h1>
<pre><?php var_dump($citation); ?></pre>
<h1>Bibliography</h1>
<pre><?php var_dump($bibliography); ?></pre>
</body>
</html>

Output

Citation
/var/www/html/citeproc-php/example/issue.php:19:string 'John Anderson and John Brown, <i>Two Authors Writing a Book</i>, n.d.' (length=69)
Bibliography
/var/www/html/citeproc-php/example/issue.php:21:string '<div class="csl-bib-body">
  <div class="csl-entry">Anderson, John, and John Brown. <i>Two Authors Writing a Book</i>, n.d.</div>
</div>' (length=136)

More info

Changing locator to page changes nothing.

seboettg commented 4 years ago

Hi @rudolfbyker! I guess this feature isn't implemented yet :( I'm not familiar with "locators" in the context of CSL, since nobody required that feature up to now. Nevertheless, I'm very interested in a full-featured CSL implementation for citeproc-php and I will try to implement this feature. Please understand that the implementation may take some time, since I have to familiarize myself with this subject and should understand how other processors like citeproc-js implement this feature.

rudolfbyker commented 4 years ago

Hi @seboettg Thanks for considering my use case! I must admit that I don't really understand the style sheets when I try to read them. I read the spec at http://docs.citationstyles.org/en/stable/specification.html but it's very cryptic. Is there a course or tutorial somewhere? Or even some example code? Then maybe I could help implement some missing features. Since I need it for a project at work, we can spend some office hours on this.

Edit: I somehow skipped the "primer" before. After reading that, CSL makes sense to me. http://docs.citationstyles.org/en/stable/primer.html

rudolfbyker commented 4 years ago

Since I'm desperate to get this working, I first read the SBL style sheet very carefully, then started stepping through the code. The locator should be printed in the point-locators-subsequent macro:

<macro name="point-locators-subsequent">
    <group>
      <choose>
        <if locator="page" match="none">
          <choose>
            <if type="bill book graphic legal_case legislation motion_picture report song" match="any">
              <group delimiter=", ">
                <group delimiter=" ">
                  <text term="volume" form="short"/>
                  <number variable="volume" form="numeric"/>
                </group>
                <group delimiter=" ">
                  <label variable="locator" form="short"/>
                  <text variable="locator"/>
                </group>
              </group>
            </if>
          </choose>
        </if>
        <else-if variable="volume">
          <choose>
            <if type="book" variable="collection-title" match="all">
              <!--Filter SBL 6.2.21-->
              <text variable="locator"/>
            </if>
            <else-if type="chapter" variable="container-title" match="all">
              <!--Filter SBL 6.2.23-->
              <text variable="locator"/>
            </else-if>
            <else-if type="chapter" variable="collection-title" position="subsequent" match="all">
              <!--Filter SBL 6.2.22-->
              <text variable="locator"/>
            </else-if>
            <else-if type="article-journal">
              <!--Filter SBL 6.3.1-->
              <text variable="locator"/>
            </else-if>
            <else>
              <group delimiter=":">
                <number variable="volume" form="numeric"/>
                <text variable="locator"/>
              </group>
            </else>
          </choose>
        </else-if>
        <else>
          <text variable="locator"/>
        </else>
      </choose>
    </group>
  </macro>

I stepped through all the way to where this part is processed:

<group delimiter=" ">
  <label variable="locator" form="short"/>
  <text variable="locator"/>
</group>

It seems that <label variable="locator" form="short"/> comes back empty:

image

The locator variable is rendered correctly using <text variable="locator"/>. However, the empty label causes the entire group to disappear:

image

Next step: Figure out why the label is empty. I think I have to dig further into this line of code (Label.php:111)

$term = CiteProc::getContext()->getLocale()->filter('terms', $variable, $form);
seboettg commented 4 years ago

Thank you for digging into this matter! I will now also check how citeproc-js handles with locators and I'll keep you updated.

rudolfbyker commented 4 years ago

My pleasure! I would love to just give you a pull request, but I don't fully understand the problem, yet. Here's some more digging:

image

It's looking for a "locator" term in the locale, which does not exist. I looked through the locales provided by the citation-style-language repo, and none of them have a term named "locator". However, they each have a section of "short locator forms" and "long locator forms" and "symbol locator forms". One of them is called "page":

<term name="page" form="short">
  <single>p.</single>
  <multiple>pp.</multiple>
</term>

It seems that there are two distinct things called "locator" in CSL. One is a variable, and the other is the "locator type". The latter is used in the "if" at the beginning of the macro:

<macro name="point-locators-subsequent">
    <group>
      <choose>
        <if locator="page" match="none">

I'm assuming that means "run this part of the macro if the citation uses a locator type that is NOT a page". But it runs, and I'm using pages, so maybe that's a related issue.

seboettg commented 4 years ago

Ok, I got it... It seems to me, that the complete "locator" data aspect is neither a part of the standard CSL schema nor of the CSL specs . Since locators will be printed only in citation mode, you have to put locator relevant data into the citation items. In the chapter "Advanced usage of citeproc-php" of the README you can see how to use citation items in order to filter the whole dataset to process only particular items for citation rendering. citeproc-js uses this to define labels for locators and the content for the "locator" i.e. an page range. Have a look to this test case example: https://github.com/citation-style-language/test-suite/blob/master/processor-tests/humans/locator_SimpleLocators.txt Here you can see in the chapter citation-items how it could also work for citeproc-php.

seboettg commented 4 years ago

Of course, this still has to be developed.

seboettg commented 4 years ago

@rudolfbyker some answers for your questions:

It seems that <label variable="locator" form="short"/> comes back empty
The locator variable is rendered correctly using <text variable="locator"/>. However, the empty label causes the entire group to disappear:

<label variable="locator" form="short"/> looks for a label term for the variable "locator", this is usually part of the locales. Locales can be defined globally as part of the language specific locale xml or locally as part of the specific CSL stylesheet. It is empty since there is no locator entry in the locales.

Yes, the completely group disappears. This behaviour is defined in the CSL specs:

cs:group implicitly acts as a conditional: cs:group and its child elements are suppressed if a) at least one rendering element in cs:group calls a variable (either directly or via a macro), and b) all variables that are called are empty. This accommodates descriptive cs:text elements. 

<if locator="page" match="none">
I'm assuming that means "run this part of the macro if the citation uses a locator type that is NOT a 
page". But it runs, and I'm using pages, so maybe that's a related issue.

Exactly, match="none" is a negation.

rudolfbyker commented 4 years ago

I didn't quite understand everything you wrote...

... the complete "locator" data aspect is neither a part of the standard CSL schema nor of the CSL specs.

What do you mean, it's not part of the CSL specs? The locator term is listed here http://docs.citationstyles.org/en/stable/specification.html#locators and the locator variable is listed here http://docs.citationstyles.org/en/stable/specification.html#standard-variables . So both are part of the specs.

Since locators will be printed only in citation mode, you have to put locator relevant data into the citation items.

How is that different from what I did in my first post, in issue.json?

seboettg commented 4 years ago
I didn't quite understand everything you wrote...

    ... the complete "locator" data aspect is neither a part of the standard CSL schema nor of the CSL specs.

My fault, that was wrong. I meant that the data related to locators described in the CSL specification is not part of the data that is usually used to render a bibliography. This also makes sense, since a bibliography is "only" the set of all literature that is referenced in a scientific paper. If you refer to a book in one citation and specify a locator (e.g. chapter 3) and later refer to the same book in another citation but refer to a different section (e.g. chapter 5), then the book is still the same, only the references to the chapter change. Therefore, the data to be cited and the citations are different data.

This means that you need two inputs, your data for the bibliography, and the data for particular citation.

Assuming this two items are your data for your bibliography:

[
  {
    "author": [
      {
        "family": "Doe",
        "given": "James",
        "suffix": "III"
      }
    ],
    "id": "item-1",
    "issued": {
      "date-parts": [
        [
          "2001"
        ]
      ]
    },
    "title": "My Anonymous Heritage",
    "type": "book"
  },
  {
    "author": [
      {
        "family": "Anderson",
        "given": "John",
        "id": "anderson.j"
      },
      {
        "family": "Brown",
        "given": "John",
        "id": "brown.j"
      }
    ],
    "issued": {
      "date-parts": [
        [
          "1998"
        ]
      ]
    },
    "id": "item-2",
    "type": "book",
    "title": "Two authors writing a book"
  }
]

In order to render citations you have to refer to this items

<p>This a wise sentence <?php echo $citeProc->render($data, "citation", json_decode('[{"id":"item-1"}]')); ?>. <!-- refers to item-1 --></p>
<p>This is another wise sentence <?php echo $citeProc->render($data, "citation", json_decode('[{"id":"item-2"}]')); ?>. <!-- refers to item-2 --></p>

Now you can add locator to the first citation:

<p>This a wise sentence <?php echo $citeProc->render($data, "citation", json_decode('
[
        {
            "id": "item-1", 
            "label": "chapter", 
            "locator": "3"
        }
 ]
')); ?>. <!-- refers to item-1 chapter 3--></p>

<p>This is another wise sentence <?php echo $citeProc->render($data, "citation", json_decode('
[
        {
            "id": "item-1", 
            "label": "chapter", 
            "locator": "5"
        }
]
')); ?>. <!-- refers to item-1 chapter 5 --></p>

<p>This is yet another wise sentence 
<?php echo $citeProc->render($data, "citation", json_decode('[{"id":"item-2"}]')); ?>.
<!-- refers to item-2 without any locator --></p>
rudolfbyker commented 4 years ago

I get it now. Thanks! :D

seboettg commented 4 years ago

With my last two commits, you can now render locators as described above. Currently I extend this feature to make sure that it also works for more complicated examples.

seboettg commented 4 years ago

Hi @rudolfbyker! I think everything was developed in order to meet your requirements. Please checkout branch feature/locators-issue-82, run composer update, and perform a couple of tests as follows: Your data.json should look like this:

    [
        {
            "author": [
                {
                    "family": "Anderson", 
                    "given": "John"
                }, 
                {
                    "family": "Brown", 
                    "given": "John"
                }
            ], 
            "id": "ITEM-2", 
            "type": "book",
            "title": "Two authors writing a book",
        }
    ]

Render citations in the following way or similar:

<?php
$dataJson = file_get_contents("data.json");
$data = json_decode($dataJson);
$style = StyleSheet::loadStyleSheet('society-of-biblical-literature-fullnote-bibliography');
$citeProc = new CiteProc($style, "en-US");
?>
<h1>Lorem Ipsum</h2>
<?php 
$cite1 = $citeProc->render($data, "citation", json_decode('
[
   {
      "id": "ITEM-2",
      "label": "page",
      "locator": "12 - 17"
   }
]
');
?>
<p>Lorem ipsum dolor sit amet <?php echo $cite1 ?></p>
<?php 
$cite2 = $citeProc->render($data, "citation", json_decode('
[
   {
      "id": "ITEM-2",
      "label": "page",
      "locator": "12"
   }
]
');
?>
<p>At vero eos et accusam et justo duo dolores et ea rebum <?php echo $cite2 ?>.</p>

I hope everything works as expected.

rudolfbyker commented 4 years ago

Thanks for your work.

There are a few errors in your example:

After fixing those, the output still contains no page numbers:

Lorem ipsum dolor sit amet John Anderson and John Brown, Two Authors Writing a Book, , , 1998.

At vero eos et accusam et justo duo dolores et ea rebum John Anderson and John Brown, Two Authors Writing a Book, , , 1998..

I'm on the feature/locators-issue-82 branch on commit d78ed37aacca287610437470daad514d6d3f441e and composer update ran successfully.

By the way, was the extra PHP requirement of ext-intl intentional?

rudolfbyker commented 4 years ago

I jumped into the debugger again today. Here are some things I found:

<macro name="point-locators">
  <choose>
    <if variable="locator" match="none">
      <text macro="pages"/>
    </if>
    <else-if type="article-journal">
      ...
    </else-if>
    <else-if type="entry-dictionary entry-encyclopedia" match="any">
      ...
    </else-if>
    <else>
      <text macro="point-locators-subsequent" prefix=", "/>
    </else>
  </choose>
</macro>

I fiddled with Layout.php and managed to get the locator to display by patching it as follows:

--- src/Rendering/Layout.php    (revision d78ed37aacca287610437470daad514d6d3f441e)
+++ src/Rendering/Layout.php    (date 1588662569551)
@@ -106,7 +106,7 @@
                     return $this->renderGroupedCitations($data, $citationItems);
                 } else {
                     $data = $this->filterCitationItems($data, $citationItems);
-                    $ret = $this->renderCitations($data, $ret);
+                    $ret = $this->renderCitations($data, $ret, $citationItems);
                 }
             } else {
                 $ret = $this->renderCitations($data, $ret);
@@ -196,10 +196,11 @@
      * @param  $ret
      * @return string
      */
-    private function renderCitations($data, $ret)
+    private function renderCitations($data, $ret, $citationItems)
     {
         CiteProc::getContext()->getResults()->replace([]);
         foreach ($data as $citationNumber => $item) {
+            $item = (object) array_merge((array) $item, (array) $citationItems[$citationNumber]);
             $renderedItem = $this->renderSingle($item, $citationNumber);
             $renderedItem = CiteProcHelper::applyAdditionMarkupFunction($item, "csl-entry", $renderedItem);
             CiteProc::getContext()->getResults()->append($renderedItem);

Test code:

<?php

include __DIR__ . "/../vendor/autoload.php";
use Seboettg\CiteProc\StyleSheet;
use Seboettg\CiteProc\CiteProc;

$data = json_decode('
[
  {
    "author": [
      {
        "family": "Anderson",
        "given": "John"
      },
      {
        "family": "Brown",
        "given": "John"
      }
    ],
    "id": "ITEM-2",
    "type": "book",
    "title": "Two authors writing a book",
    "issued": {
      "date-parts": [
        [
          "1998"
        ]
      ]
    },
    "publisher": "Banner of Truth"
  }
]
');
$style = StyleSheet::loadStyleSheet('society-of-biblical-literature-fullnote-bibliography');
$citeProc = new CiteProc($style, "en-US");

?>

<h1>Lorem Ipsum</h1>
<?php
$citationItems = json_decode('
[
   {
      "id": "ITEM-2",
      "label": "page",
      "locator": "12 - 17"
   }
]
');
$cite1 = $citeProc->render($data, "citation", $citationItems);
?>
<p><?php echo $cite1 ?></p>

Output:

Lorem Ipsum

John Anderson and John Brown, Two Authors Writing a Book, , (Banner of Truth, 1998), 12-17.

I'm still puzzled about the double comma, though. Maybe that's a separate issue.

seboettg commented 4 years ago

Hi @rudolfbyker, Embarrassingly, I didn't test the given example of my last post. I haven't forgotten your issues. But have currently little time to take care of this.

Since you've been working on this, I want to let you know what are my findings. The Problem is not a bug, but missing feature. The Position Constraint Seboettg\CiteProc\Constraint\Position isn't implemented yet, which cause this behavior. If you use another style without Position-Conditions, locators should work as expected. You can also easily solve your problem quick and dirty, by changing the return value of the validate method to "true" in the Position class. Subsequently your locators will be shown using stylesheet "society-of-biblical-literature-fullnote-bibliography".

But that solves not the actual issue. My approach up to now is to implement the position constraint. Please have a look in my latest commit: https://github.com/seboettg/citeproc-php/commit/9d87a1a8af1a5261c9a8633936ca1af9b08316e9 It would be great if you could keep working for that approach.

seboettg commented 4 years ago

I'm sorry for answering bit by bit. Okay, now the answers to your questions and comments:

Layout.php:83: Incompatible method declaration. The Rendering interface should be updated.

Yes, thats right. Feel free to make PR. Otherwise I'm going to fix this with next refactoring changes.

Layout.php:109: The citationItems are not passed to renderCitations. Where does renderCitations get the label and locator data, if this is not passed?

Yes, citationItems are placed in the static context of the class CiteProc. In class Locator:35 the method getCitationItemsById is called to get it depending on the current citation data object.

Therefore, I guess, the conclusion of your third comment is probably invalid, if I understood you correctly. Please set the following Breakpoints on Seboettg\CiteProc\Constraint\Locator:35, Seboettg\CiteProc\Rendering\Label:108 and Seboettg\CiteProc\Rendering\Text:171.

rudolfbyker commented 4 years ago

Oh, I see what you mean with static context now. Not sure why it works that way, but I'm sure you have a valid reason.

In any case, using my example code from my previous comment, none of the three breakpoints you mentioned are ever reached.

I see that Text::renderLocator is only called in Text.php:100. That line is never reached, because $this->toRenderTypeValue === "locator" is never TRUE.


Back to my train of thought from my previous comment: <if variable="locator" match="none"> is still matching, which should not happen, obviously. To debug that, I ...

I'm assuming you expected it to call Locator::matchForVariable instead, from Locator.php:32?

seboettg commented 4 years ago

Ok, now I get your point. Please have a look at my commit 5c62ecf, especially the new test case https://github.com/seboettg/citeproc-php/commit/5c62ecff3cdf16405f05e601b5b119538dee42d1#diff-55f3c483b367c2021ae8ef0286539e18

I think these changes should solve the issues with conditions like this <if variable="locator" match="none">.

rudolfbyker commented 4 years ago

Working now! Same test code (from this comment) on my side and using commit 259e0fe33424d0215998a58ac41e5230604122f7 (latest on branch feature/locators-issue-82).

Output:

John Anderson and John Brown, Two Authors Writing a Book, , (Banner of Truth, 1998), 12-17.

Still puzzled about that extra comma, though... If I simplify the CSL sheet by manually removing a few things that would be skipped by "choose", we get this for citation style (I also changed the delimiters to # and * so we can see which groups are responsible for the spaces and commas):

<citation et-al-min="4" et-al-use-first="1" disambiguate-add-names="true">
    <layout suffix="." delimiter="; ">
      <group delimiter="# ">
        <text macro="contributors-note"/>
        <text macro="title-note"/>
        <text macro="description-note"/>
        <text macro="secondary-contributors-note"/>
        <text macro="container-title-note"/>
        <text macro="container-contributors-note"/>
        <choose>
          <if variable="volume collection-title" match="all">
            <!--If Volume # and collection-title -->
            <group delimiter="*">
              <!--Added group to prevent comma between vol # of...and collection-title-->
              <text macro="locators-note"/>
              <text macro="collection-title-note"/>
            </group>
          </if>
          <else>
            <text macro="locators-note"/>
            <text macro="collection-title-note"/>
          </else>
        </choose>
        <text macro="collection-editor-note"/>
      </group>
      <text macro="issue-note"/>
      <text macro="locators-newspaper" prefix=", "/>
      <text macro="point-locators"/>
      <text macro="access-note" prefix=", "/>
    </layout>
  </citation>

Then we still get John Anderson and John Brown# Two Authors Writing a Book# # (Banner of Truth, 1998), 12-17.. But when I replace the last choose with the contents of else, like this ...

<citation et-al-min="4" et-al-use-first="1" disambiguate-add-names="true">
    <layout suffix="." delimiter="; ">
      <group delimiter="# ">
        <text macro="contributors-note"/>
        <text macro="title-note"/>
        <text macro="description-note"/>
        <text macro="secondary-contributors-note"/>
        <text macro="container-title-note"/>
        <text macro="container-contributors-note"/>
        <text macro="locators-note"/>
        <text macro="collection-title-note"/>
        <text macro="collection-editor-note"/>
      </group>
      <text macro="issue-note"/>
      <text macro="locators-newspaper" prefix=", "/>
      <text macro="point-locators"/>
      <text macro="access-note" prefix=", "/>
    </layout>
  </citation>

... then the problem disappears: John Anderson and John Brown# Two Authors Writing a Book (Banner of Truth, 1998), 12-17.

As far as I know, wrapping something in <choose> should not affect the output. I'm not sure if that's a separate issue or not.

seboettg commented 4 years ago
As far as I know, wrapping something in <choose> should not affect the output. I'm not sure if that's a separate issue or not.

I suppose that the missing group tag causes this behavior not the missing choose.

Please try the following:

<citation et-al-min="4" et-al-use-first="1" disambiguate-add-names="true">
    <layout suffix="." delimiter="; ">
      <group delimiter="# ">
        <text macro="contributors-note"/>
        <text macro="title-note"/>
        <text macro="description-note"/>
        <text macro="secondary-contributors-note"/>
        <text macro="container-title-note"/>
        <text macro="container-contributors-note"/>
        <group delimiter="*">
          <text macro="locators-note"/>
          <text macro="collection-title-note"/>
        </group>
        <text macro="collection-editor-note"/>
      </group>
      <text macro="issue-note"/>
      <text macro="locators-newspaper" prefix=", "/>
      <text macro="point-locators"/>
      <text macro="access-note" prefix=", "/>
    </layout>
  </citation>

Using group is not only for grouping, it may lead to implication that may have effects of the rendered result.

From CSL specs:

cs:group implicitly acts as a conditional: cs:group and its child elements are suppressed if a) at least one rendering element in cs:group calls a variable (either directly or via a macro), and b) all variables that are called are empty. 
rudolfbyker commented 4 years ago

Your suggestion also works. But it was just an example. I'm not actually at liberty to change the CSL file, since it is part of the repo at https://github.com/citation-style-language .

However, I found what looks like a smoking gun in ChooseIf.php:79 (using the original style sheet)

Selection_357

foreach ($this->children as $child) will iterate over the children, which are <text macro="locators-note"/> and <text macro="collection-title-note"/>. Both come back empty, as they should, so $ret is an array of two empty strings. The glue is given by the surrounding group, and is "," (or "#" in my modified version). Imploding those gives us the unwanted extra delimiter.

I suspect we are missing a call to array_filter there...

return implode($glue, array_filter($ret));
seboettg commented 4 years ago

array_filter works without callback function and checks implicitly for empty values?

seboettg commented 4 years ago

@rudolfbyker okay, please have a look on this again.

rudolfbyker commented 4 years ago

array_filter works without callback function and checks implicitly for empty values?

Exactly. It removes everything that is "falsy" from the array. See https://www.php.net/manual/en/function.array-filter.php .

Everything is working well for my use case now! (As far as I can see … :smile: )

John Anderson and John Brown, Two Authors Writing a Book (Banner of Truth, 1998), 12-17.
seboettg commented 3 years ago

Solved with v2.2.1.