renanbr / bibtex-parser

BibTex Parser provides an API to read .bib files programmatically.
MIT License
40 stars 17 forks source link

Collision between key `type` (php array) and field `type` (bibtex/biblatex) #65

Closed ninfito closed 4 years ago

ninfito commented 5 years ago

Consider the next test.php

use RenanBr\BibTexParser\Listener;
use RenanBr\BibTexParser\Parser;

require 'vendor/autoload.php';

$bibtex = <<<BIBTEX
@TechReport{Plotkin_1995,
    author = {Serge Plotkin},
    title = {Research in Graph Algorithms and Combinatorial Optimization},
    year = 1995,
    month = mar,
    type = {resreport},
    doi = {10.21236/ada292630},
    url = {https://doi.org/10.21236%2Fada292630},
    publisher = {Defense Technical Information Center}
}
BIBTEX;

$parser = new Parser();          // Create a Parser
$listener = new Listener();      // Create and configure a Listener
$parser->addListener($listener); // Attach the Listener to the Parser
$parser->parseString($bibtex);   // or parseFile('/path/to/file.bib')
$entries = $listener->export();  // Get processed data from the Listener

print_r($entries);

Output from > php test.php | nl

     1  Array
     2  (
     3      [0] => Array
     4          (
     5              [type] => resreport
     6              [citation-key] => Plotkin_1995
     7              [author] => Serge Plotkin
     8              [title] => Research in Graph Algorithms and Combinatorial Optimization
     9              [year] => 1995
    10              [month] => mar
    11              [doi] => 10.21236/ada292630
    12              [url] => https://doi.org/10.21236%2Fada292630
    13              [publisher] => Defense Technical Information Center
    14              [_original] => @techreport{Plotkin_1995,
    15          author    = {Serge Plotkin},
    16          title     = {Research in Graph Algorithms and Combinatorial Optimization},
    17          year      = 1995,
    18      month     = mar,
    19      type      = {resreport},
    20      doi       = {10.21236/ada292630},
    21          url       = {https://doi.org/10.21236%2Fada292630},
    22          publisher = {Defense Technical Information Center}
    23  }
    24          )

    25  )

But line 5 in the output should be

     5              [type] => TechReport

or better

     5              [_type] => TechReport
...
     11             [type] => resreport

Clearly, this is because of the collision between key type (php array) and bib-field type, a legal field in bibtex § 2.2 and biblatex § 4.9.2.13.

Maybe the patches...

# Parser.php
    19    const TYPE = '_type';

and

# Listener.php
    66                $this->entries[] = [Parser::TYPE => $text];
...
    84                    if ('string' === $entry[Parser::TYPE] && array_key_exists($text, $entry)) {

...work, but I'm not sure if this is enough.

renanbr commented 5 years ago

Thanks for reporting and get deep into this library :)

Actually this is an expected behaviour:

Note: This library considers "type" and "citation-key" as tags. This behavior can be changed implementing your own Listener (more info at the end of this document).

From https://github.com/renanbr/bibtex-parser#vocabulary

There is also a test: testTypeOverriding() using type-overriding.bib file.

Note: I'm NOT rejecting in advance a PR that adds an extra field _type to the results, which could be done changing the Listener implementation around these lines.

ninfito commented 5 years ago

Thank you for your reply @renanbr.

Yes, consider this issue as a suggestion for future versions. I think that using type (or any other legal bibtex/biblatex field) as a reserved key for the php-array in Listener::export() is not the best choice.

In a real-world environment you are forced to implement your own Listener in order to prevent this obvious collision, otherwise, a bib-file line as

    type   =    {Qualificação de doutorado}

will delete the BibTeX entry (@misc, @phpthesis, etc) or replace it with the invalid entry @Qualificação de doutorado, maybe this is not an expected behaviour.

renanbr commented 5 years ago

What if we add allowTypeOverriding(false) or allowTagOverriding(true) to the Listener?

renanbr commented 5 years ago

What about adding some methods to the Listener?

class Listener
{
    // ...
    public function setTypeTagName($tagName) {}
    public function setCitationKeyTagName($tagName) {}
    public function setOriginalEntryTagName($tagName) {}
}

ping @ninfito

renanbr commented 4 years ago

To avoid BC break I think we could store the type (this one define like @misc) in both _type and type key. If parser finds a tag type this will not override the _type one.

@TechReport{
    type = {resreport},
    ...
}

@misc{
    ...
}

will generate something like

[
    '_type' => 'TechReport',
    'type' => 'resreport',
    // ...
];

[
    '_type' => 'misc',
    'type' => 'misc',
    // ...
];
renanbr commented 4 years ago

@ninfito, I've tried to fix it with #84 using an insight from @raphael-st

If you have time to review it would be awesome.

Planning to ship this with some new processors.