hbz / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Some catalogue specific elements do not work properly #14

Open TobiasNx opened 2 months ago

TobiasNx commented 2 months ago

Some of the changes in GKT and other hbz-specific elements with letters do not validate properly, even though they are configured:

[Screenshot]

https://github.com/hbz/qa-catalogue/blob/275f9cf5717e61d0be3de0efacbcc05ea970a7a0/src/main/java/de/gwdg/metadataqa/marc/definition/tags/hbztags/TagGKT.java#L55-L64

TobiasNx commented 2 months ago

The same happens with core-specific enhancements: https://github.com/hbz/qa-catalogue/issues/13

pkiraly commented 2 months ago

Did you specify the local version with --marcVersion? Could you send me the parameters you applied for the validation?

TobiasNx commented 2 months ago

https://github.com/hbz/qa-catalogue/blob/275f9cf5717e61d0be3de0efacbcc05ea970a7a0/catalogues/hbz.sh#L14

pkiraly commented 2 months ago

Thanks! It seems OK. The next step would be to write a unit test. Could you add a file with about 1-2 MARCXML records to src/test/resources/marc in your fork? I would then write the unit tests, which would help to detect the errors.

TobiasNx commented 1 month ago

@pkiraly we added three MARCXML files to the folder marcxml. I also used distinct files and named each one so that the file name hints at the issue it demonstrates.

https://github.com/hbz/qa-catalogue/commit/0358af46c3b532fdc33ef1e2f2e247eeb55d181e

pkiraly commented 1 month ago

@TobiasNx Thanks! I wrote a unit test against these 3 files, but I was not able to reproduce the error. I found another error though: the ignorableRecords parameter throws an error when the parameters are saved into a JSON file. This might prevent the validation from completing successfully. Could you check the validation.log if you find a Java exception close to the end of the file? I just fixed this issue in the main repository, see https://github.com/pkiraly/qa-catalogue/issues/525.

I cannot push the test against HBZ files because I do not have the necessary permission, so I put the code here. Please add it to src/test/java/de/gwdg/metadataqa/marc/cli/ValidatorCliTest.java:

// add this line to the import section
import java.util.stream.Collectors;

// put it after the last test method
  @Test
  public void validate_whenHbz() throws Exception {
    clearOutput(outputDir, outputFiles);

    ValidatorCli processor = new ValidatorCli(new String[]{
      "--schemaType", "MARC21",
      "--marcVersion", "HBZ",
      "--marcxml",
      "--outputDir", outputDir,
      "--fixAlma",
      "--ignorableRecords", "DEL$a=Y",
      "--ignorableFields", "964,940,941,942,944,945,946,947,948,949,950,951,952,955,956,957,958,959,966,967,970,971,972,973,974,975,976,977,978,978,979",
      "--details",
      "--trimId",
      "--summary",
      TestUtils.getPath("marcxml/990082522550206441_missing_validation_custom_subfield_9_core_710.xml"),
      TestUtils.getPath("marcxml/990171082050206441_missing_validation_custom_ind2_9_core_246.xml"),
      TestUtils.getPath("marcxml/991000922029706482_missing_subfield_validation_t_in_customfield_GKT.xml"),
    });

    RecordIterator iterator = new RecordIterator(processor);
    iterator.setProcessWithErrors(true);
    iterator.start();

    List<String> lines = getFileLines("issue-summary.csv");
    assertEquals(3, lines.size());
    List<String> undefinedFields = lines.stream()
      .filter(line -> line.contains("undefined field"))
      .collect(Collectors.toList());
    assertEquals(0, undefinedFields.size());
    // Pattern pattern = Pattern.compile("^\\d+,952,\\d+,\\d+,undefined field");
    // assertTrue(pattern.matcher(undefinedFields.get(0)).find());
  }
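
The check at the end of the test can also be tried in isolation. The following is only a minimal sketch, assuming that lines in issue-summary.csv follow the shape suggested by the commented-out pattern above (number, path, number, number, message). The class name UndefinedFieldFilter and the sample lines are hypothetical and not part of qa-catalogue.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UndefinedFieldFilter {
  // Assumed line shape, derived from the commented-out pattern in the test:
  // <number>,<path>,<number>,<number>,undefined field
  private static final Pattern UNDEFINED_FIELD =
    Pattern.compile("^\\d+,[^,]+,\\d+,\\d+,undefined field");

  /** Returns only those lines that report an undefined field. */
  static List<String> undefinedFields(List<String> lines) {
    return lines.stream()
      .filter(line -> UNDEFINED_FIELD.matcher(line).find())
      .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical sample lines, not real output of the validator.
    List<String> sample = List.of(
      "1,952,3,9,undefined field",
      "2,GKT,3,9,undefined field",
      "3,710$9,5,13,undefined subfield"
    );
    // Print the lines that would make the assertion in the test fail.
    undefinedFields(sample).forEach(System.out::println);
  }
}
```

With a local definition that is picked up correctly, no line for a tag such as GKT should survive this filter, which is what assertEquals(0, undefinedFields.size()) expresses in the test.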

TobiasNx commented 1 month ago

@Phu2 will take care of it.

Phu2 commented 1 month ago

"Could you check the validation.log if you find a Java exception close to the end of the file?"

No exceptions found. These are the last 20 lines from processing our whole base dump, which contains more than 27 million records:

qa-catalogue@quaoar4:~/qa-catalogue$ tail -n 20 logs/hbz/validate.log 
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998379706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998419706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998449706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991003042719706480) invalid category for 007: '|'
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator processFile
INFO: Finished processing file. Processed 27,824,613 records.
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: printCounter
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: Saving summary
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: all printing is DONE
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.QACli saveParameters
INFO: Saving configuration to /opt/qa-catalogue/output/hbz/validation.params.json.
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator start
INFO: Bye! It took: 01:40:42
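
The manual check with tail can be approximated in code. This is a rough sketch, not part of qa-catalogue: the class LogTailCheck is hypothetical, and the heuristic (a line containing "Exception", or starting with "at " or "Caused by:") is an assumption about what a Java stack trace looks like in this log.

```java
import java.util.List;
import java.util.stream.Collectors;

public class LogTailCheck {
  /** Returns the lines among the last `tail` lines that look like part of a Java stack trace. */
  static List<String> exceptionLines(List<String> lines, int tail) {
    int from = Math.max(0, lines.size() - tail);
    return lines.subList(from, lines.size()).stream()
      .filter(LogTailCheck::looksLikeException)
      .collect(Collectors.toList());
  }

  // Heuristic only: exception class names, stack frames, and "Caused by:" lines.
  private static boolean looksLikeException(String line) {
    String trimmed = line.trim();
    return trimmed.contains("Exception")
      || trimmed.startsWith("at ")
      || trimmed.startsWith("Caused by:");
  }

  public static void main(String[] args) {
    // A few lines copied from the log excerpt above.
    List<String> lines = List.of(
      "SEVERE: #991003042719706480) invalid category for 007: '|'",
      "INFO: Finished processing file. Processed 27,824,613 records.",
      "INFO: Saving summary",
      "INFO: Bye! It took: 01:40:42"
    );
    System.out.println(exceptionLines(lines, 20).isEmpty()
      ? "No exception found in the tail"
      : "Possible exception in the tail");
  }
}
```

For a log as large as the one from a full dump, reading only the end of the file (as tail does) is preferable to loading all lines into memory.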

Phu2 commented 1 month ago

"I cannot push the test against HBZ files because I do not have the necessary permission"

Now you have write access (pending invitation).

Phu2 commented 1 month ago

@pkiraly Updated ValidatorCliTest.java as suggested, see new branch 14-test-validation.

TobiasNx commented 1 month ago

Running mvn clean install results in an error: https://gist.githubusercontent.com/TobiasNx/2d648828520523f7a7460dd555d57688/raw/2272d0c8c98fa6335519eb88c9d093e1adcbab90/qaError

pkiraly commented 1 month ago

This is what https://github.com/pkiraly/qa-catalogue/issues/525 fixes. Now I have write permission, so I will fix it today in this branch.

pkiraly commented 1 month ago

@TobiasNx I pushed the changes. You can try it again.

Phu2 commented 1 month ago

Thanks, @pkiraly! mvn test runs fine without any errors. @TobiasNx please have a look when you are back from vacation.

TobiasNx commented 1 month ago

See: https://github.com/hbz/qa-catalogue/pull/17

mvn clean install seems to work locally now, and the new PR seems to work too.