hbz / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Update hbz specific elements #7 #8

Closed: TobiasNx closed this 2 months ago.

TobiasNx commented 2 months ago

Related to #7. Based on the latest basedump test (http://quaoar4.hbz-nrw.de:8000/hbz/), I adjusted the element definitions to match the latest config XML, which defines the hbz-specific MARC elements in Alma: https://service-wiki.hbz-nrw.de/pages/viewpage.action?pageId=698777686

I did not systematically add every element from the XML to QA catalogue, only those that were showing more than 10,000 errors.
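The selection rule above (only elements with more than 10,000 errors) amounts to a simple threshold filter. The sketch below is hypothetical: it assumes the per-element error counts are already available as a map (for example, collected from the basedump results), and the class and method names are illustrative, not part of QA catalogue.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not part of QA catalogue): given error counts per
 * element, keeps only the elements whose count exceeds a threshold,
 * e.g. the 10,000-error cut-off mentioned above.
 */
class ErrorThresholdFilter {

    /** Returns element names with more than {@code threshold} errors, most frequent first. */
    static List<String> elementsAbove(Map<String, Long> errorCounts, long threshold) {
        return errorCounts.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                // Highest error count first; ties broken alphabetically for a stable order.
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Sorting by count puts the elements with the most errors at the top, which is where missing definitions matter most.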

@maipet and @Phu2, could you have a look and check whether I am missing something and whether it is okay?

TobiasNx commented 2 months ago

Ah, I forgot to update the tests.

TobiasNx commented 2 months ago

@Phu2, do you know why `mvn clean install` fails with `Failed tests: completeness_pica_groupBy_file(de.gwdg.metadataqa.marc.cli.CompletenessTest)`? I did not change anything concerning PICA.

Phu2 commented 2 months ago

I have no clue either. Looking into the code, I think the checks at https://github.com/hbz/qa-catalogue/blob/main/src/test/java/de/gwdg/metadataqa/marc/cli/CompletenessTest.java#L305 against the output file completeness.params.json seem to be the ones failing. Here is our output file from the basedump analysis:

```
@quaoar4:~/qa-catalogue$ cat output/hbz/completeness.params.json | jq
{
  "args": [
    "/opt/qa-catalogue/input/hbz/marc/baseline.xml.gz"
  ],
  "marcVersion": "HBZ",
  "marcFormat": "XML",
  "dataSource": "FILE",
  "limit": -1,
  "offset": -1,
  "id": null,
  "defaultRecordType": "BOOKS",
  "alephseq": false,
  "marcxml": true,
  "lineSeparated": false,
  "trimId": false,
  "outputDir": "/opt/qa-catalogue/output/hbz/",
  "recordIgnorator": {
    "conditions": null,
    "empty": true
  },
  "recordFilter": {
    "conditions": null,
    "empty": true
  },
  "ignorableFields": {
    "fields": null,
    "empty": true
  },
  "stream": null,
  "defaultEncoding": null,
  "alephseqLineType": null,
  "picaIdField": "003@$0",
  "picaSubfieldSeparator": "$",
  "picaSchemaFile": null,
  "picaRecordTypeField": "002@$0",
  "schemaType": "MARC21",
  "groupBy": null,
  "groupListFile": null,
  "solrForScoresUrl": null,
  "format": "COMMA_SEPARATED",
  "advanced": false,
  "onlyPackages": false,
  "replacementInControlFields": "#",
  "marc21": true,
  "unimarc": false,
  "pica": false,
  "mqaf.version": "0.9.4",
  "qa-catalogue.version": "0.8.0-SNAPSHOT",
  "numberOfprocessedRecords": 27801034,
  "duration": "01:27:58"
}
```
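When a test compares values against completeness.params.json, it can help to pull the top-level scalar fields out of two such files (for instance, the one written during the test run and the one above) and compare them side by side. For example, the file above has `"pica": false` and `"groupBy": null`, which are presumably among the values a PICA groupBy check looks at. The following is a minimal sketch using only the JDK, with no JSON library; `ParamsPeek` and `scalarFields` are hypothetical names, and it assumes the pretty-printed, one-pair-per-line layout that jq produces. It does not reproduce what `CompletenessTest` actually checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of QA catalogue): extracts the top-level
 * scalar fields of a pretty-printed completeness.params.json without a JSON
 * library. Assumes one "key": value pair per line, as in jq's output, and
 * no braces or brackets inside string values.
 */
class ParamsPeek {

    // A line such as:  "groupBy": null,   "pica": false,   "marcVersion": "HBZ",
    private static final Pattern FIELD =
            Pattern.compile("\\s*\"([^\"]+)\"\\s*:\\s*(\"[^\"]*\"|[^,\\s{\\[]+)\\s*,?\\s*");

    static Map<String, String> scalarFields(String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        int depth = 0; // current nesting level of {...} and [...]
        for (String line : json.split("\\R")) {
            Matcher m = FIELD.matcher(line);
            // Only pairs directly inside the outermost object count as top-level.
            if (depth == 1 && m.matches()) {
                String value = m.group(2);
                if (value.startsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // drop the quotes
                }
                fields.put(m.group(1), value);
            }
            // Update the nesting level after inspecting the line.
            for (char c : line.toCharArray()) {
                if (c == '{' || c == '[') depth++;
                else if (c == '}' || c == ']') depth--;
            }
        }
        return fields;
    }

    static Map<String, String> scalarFields(Path file) throws IOException {
        return scalarFields(Files.readString(file));
    }
}
```

For example, `ParamsPeek.scalarFields(Path.of("output/hbz/completeness.params.json"))` on the file above would map `groupBy` to `null` and `pica` to `false`. Nested objects such as `recordFilter` are skipped on purpose, so their `conditions` and `empty` entries do not overwrite each other.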

This test doesn't make any sense to me.