kitodo / kitodo-presentation

Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
https://kitodo.github.io/kitodo-presentation/
GNU General Public License v3.0

Duplicate records when importing the same xml file a second time #383

Closed BFallert closed 5 years ago

BFallert commented 5 years ago

I get duplicate records when importing the same xml file a second time.

In tx_dlf_documents I get 3 records: title, year and issue. In the Solr core I see only 2 records (see below).

I import the record with

sudo -u www-data vendor/bin/typo3cms dlf:kitodo:index --doc "https://digi.bib.uni-mannheim.de/periodika/fileadmin/data/DeutReunP_856399094_19360528/DeutReunP_856399094_19360528.xml" --pid=6 --solr=1

cli_dispatch.phpsh was declared deprecated, so I used vendor/bin/typo3cms instead.

Solr Query Result

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "rows":"100",
      "wt":"json",
      "_":"1568205391184"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"31LOG_0000",
        "uid":31,
        "page":1,
        "thumbnail":"",
        "partof":0,
        "root":0,
        "sid":"LOG_0000",
        "toplevel":true,
        "type":"newspaper",
        "title":"Deutscher Reichsanzeiger und Preußischer Staatsanzeiger",
        "record_id":"856399094",
        "purl":"http://digi.bib.uni-mannheim.de/urn/urn:nbn:de:bsz:180-digper-21105",
        "location":"https://digi.bib.uni-mannheim.de/periodika/fileadmin/data/DeutReunP_856399094_19360528/DeutReunP_856399094_19360528_anchor.xml",
        "urn":"urn:nbn:de:bsz:180-digper-21105",
        "collection":["ubmaprztgh"],
        "title_tsi":["Deutscher Reichsanzeiger und Preußischer Staatsanzeiger"],
        "title_sorting":"Deutscher Reichsanzeiger und Preußischer Staatsanzeiger",
        "type_usu":["newspaper"],
        "volume":"",
        "fulltext":"",
        "timestamp":"2019-09-11T14:22:02.443Z"},
      {
        "id":"31LOG_0001",
        "uid":31,
        "partof":0,
        "root":0,
        "sid":"LOG_0001",
        "toplevel":false,
        "type":"year",
        "title":"1936",
        "location":"https://digi.bib.uni-mannheim.de/periodika/fileadmin/data/DeutReunP_856399094_19360528/DeutReunP_856399094_19360528_anchor.xml",
        "collection":["ubmaprztgh"],
        "title_tsi":["1936"],
        "title_sorting":"1936",
        "type_usu":["year"],
        "thumbnail":"",
        "purl":"",
        "urn":"",
        "volume":"",
        "record_id":"",
        "fulltext":"",
        "page":0,
        "timestamp":"2019-09-11T14:22:02.457Z"}]
  }}
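
A response like the one above can be checked for duplicates by counting how often each `id` occurs and by grouping the records by `uid`. The following is only a minimal Python sketch and not part of Kitodo.Presentation: the helper name `summarize_solr_docs` is hypothetical, and the sample data is an abbreviated copy of the response shown above.

```python
import json
from collections import Counter, defaultdict

# Abbreviated copy of the Solr response above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 2, "start": 0, "docs": [
    {"id": "31LOG_0000", "uid": 31, "sid": "LOG_0000", "toplevel": true, "type": "newspaper"},
    {"id": "31LOG_0001", "uid": 31, "sid": "LOG_0001", "toplevel": false, "type": "year"}
  ]}
}
"""


def summarize_solr_docs(response_text):
    """Hypothetical helper: return (duplicate_ids, sids_by_uid) for a Solr JSON response."""
    docs = json.loads(response_text)["response"]["docs"]

    # An "id" that occurs more than once would indicate a duplicate Solr record.
    id_counts = Counter(doc["id"] for doc in docs)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # Group the structure ids ("sid") by "uid" to see which records share a uid.
    sids_by_uid = defaultdict(list)
    for doc in docs:
        sids_by_uid[doc["uid"]].append(doc["sid"])

    return duplicate_ids, dict(sids_by_uid)


duplicate_ids, sids_by_uid = summarize_solr_docs(SAMPLE_RESPONSE)
print("duplicate ids:", duplicate_ids)
print("records per uid:", sids_by_uid)
```

For the response above this reports no duplicate `id` values and a single `uid` (31) with two records, which matches the observation that Solr holds fewer entries than tx_dlf_documents.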

Configuration:

Debian Buster
TYPO3 8.7.27
MariaDB 10.3.17-MariaDB-0+deb10u1
PHP 7.3.4-2
Kitodo.Presentation git master

TYPO3 8 and Kitodo.Presentation were installed using Composer.

composer.json

{
    "repositories": {
        "0": {
            "type": "composer",
            "url": "https://composer.typo3.org/"
        },
        "1": {
            "type": "path",
            "url": "packages/*"
        },
        "kitodo": {
            "type": "vcs",
            "url": "https://github.com/kitodo/kitodo-presentation.git"
        }
    },
    "name": "typo3/cms-base-distribution",
    "description": "TYPO3 CMS Base Distribution",
    "license": "GPL-2.0-or-later",
    "require": {
        "helhum/typo3-console": "^4.9.3 || ^5.2",
        "typo3/cms-about": "^8.7.10",
        "typo3/cms-belog": "^8.7.10",
        "typo3/cms-beuser": "^8.7.10",
        "typo3/cms-context-help": "^8.7.10",
        "typo3/cms-documentation": "^8.7.10",
        "typo3/cms-felogin": "^8.7.10",
        "typo3/cms-fluid-styled-content": "^8.7.10",
        "typo3/cms-form": "^8.7.10",
        "typo3/cms-func": "^8.7.10",
        "typo3/cms-impexp": "^8.7.10",
        "typo3/cms-info": "^8.7.10",
        "typo3/cms-info-pagetsconfig": "^8.7.10",
        "typo3/cms-rte-ckeditor": "^8.7.10",
        "typo3/cms-setup": "^8.7.10",
        "typo3/cms-sys-note": "^8.7.10",
        "typo3/cms-t3editor": "^8.7.10",
        "typo3/cms-tstemplate": "^8.7.10",
        "typo3/cms-viewpage": "^8.7.10",
        "typo3/cms-wizard-crpages": "^8.7.10",
        "typo3/cms-wizard-sortpages": "^8.7.10",
        "devlog/devlog": "^3.0",
        "dmitryd/typo3-realurl": "^2.3",
        "ubmannheim/ubma_digi_ra_package": "@dev"
    },
    "scripts": {
        "typo3-cms-scripts": [
            "typo3cms install:fixfolderstructure",
            "typo3cms install:generatepackagestates"
        ],
        "post-autoload-dump": [
            "@typo3-cms-scripts"
        ]
    },
    "extra": {
        "typo3/cms": {
            "web-dir": "public"
        },
        "helhum/typo3-console": {
            "comment": "This option is not needed any more for helhum/typo3-console 5.x",
            "install-extension-dummy": false
        }
    },
    "require-dev": {
        "kitodo/presentation": "dev-master"
    }
}

albig commented 5 years ago

Thank you for the detailed report.

Before you continue to debug the problem, please wait for the merge of #382. Without it, master won't index documents correctly. Your Solr core should contain many more entries (each page is one record and you have 3 documents).
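
The mismatch described here can be made explicit by comparing the number of distinct `uid` values in the Solr records with the number of documents expected in tx_dlf_documents. This is a rough Python sketch under two assumptions: that the Solr field `uid` corresponds to the uid of a tx_dlf_documents row, and that the record list has already been extracted from a Solr response. The function name `missing_document_count` is hypothetical.

```python
def missing_document_count(docs, expected_documents):
    """Hypothetical helper: how many expected documents have no Solr record at all."""
    # Assumption: each distinct "uid" in Solr stands for one tx_dlf_documents row.
    indexed_uids = {doc["uid"] for doc in docs}
    return max(expected_documents - len(indexed_uids), 0)


# The two records reported in this issue both carry uid 31.
reported_docs = [
    {"uid": 31, "sid": "LOG_0000"},
    {"uid": 31, "sid": "LOG_0001"},
]

# Three documents are expected: title, year and issue.
missing = missing_document_count(reported_docs, expected_documents=3)
print("documents without any Solr record:", missing)
```

With the data from this report, only one of the three expected documents appears in Solr, so two are missing from the index.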

BFallert commented 5 years ago

The problem is fixed! I now have more entries in Solr. In tx_dlf_documents I get 3 records (title, year and issue), and I get no duplicate entries when I index the same document a second ... time.