TobiasNx opened this issue 2 months ago
The same happens with core-specific enhancements: https://github.com/hbz/qa-catalogue/issues/13
Did you specify the local version with --marcVersion? Could you send me the parameters you used for the validation?
Thanks! It looks OK. The next step would be to write a unit test. Could you add a file with 1-2 MARCXML records to src/test/resources/marc in your fork? I would write the unit tests, which would help detect the errors.
@pkiraly We added three MARCXML files to the marcxml folder. Each file is separate, and its name hints at the issue it demonstrates.
https://github.com/hbz/qa-catalogue/commit/0358af46c3b532fdc33ef1e2f2e247eeb55d181e
@TobiasNx Thanks! I wrote a unit test against these 3 files, but I was not able to reproduce the error. I found another error, though: the ignorableRecords parameter throws an error when the parameters are saved to a JSON file. This might prevent the validation from completing successfully. Could you check whether validation.log contains a Java exception near the end of the file?
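Checking only the end of a large log is enough to answer that question. The following is a minimal sketch, not part of QA catalogue: the class LogTailCheck and its methods are hypothetical, and the default path logs/hbz/validate.log is an assumption borrowed from the tail command used in this thread. A line counts as suspicious if it contains "Exception", starts a stack-trace frame (a tab followed by "at "), or starts with "Caused by:".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of QA catalogue: show the end of a log and
// flag lines that look like a Java exception or a stack-trace frame.
public class LogTailCheck {

  // Returns the last n lines; the file is streamed once and only n lines are kept in memory.
  public static List<String> tail(Path log, int n) throws IOException {
    ArrayDeque<String> window = new ArrayDeque<>();
    if (n <= 0) {
      return new ArrayList<>();
    }
    try (Stream<String> lines = Files.lines(log, StandardCharsets.UTF_8)) {
      lines.forEach(line -> {
        if (window.size() == n) {
          window.removeFirst();
        }
        window.addLast(line);
      });
    }
    return new ArrayList<>(window);
  }

  // Keeps lines that name an exception, start a stack-trace frame, or report a cause.
  public static List<String> findExceptionLines(List<String> lines) {
    List<String> hits = new ArrayList<>();
    for (String line : lines) {
      if (line.contains("Exception") || line.startsWith("\tat ") || line.startsWith("Caused by:")) {
        hits.add(line);
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // The default path is an assumption; pass the real log path as the first argument.
    Path log = Paths.get(args.length > 0 ? args[0] : "logs/hbz/validate.log");
    List<String> last = tail(log, 20);
    last.forEach(System.out::println);
    List<String> hits = findExceptionLines(last);
    System.out.println(hits.isEmpty()
        ? "No exception-like lines in the last 20 lines."
        : "Possible exception lines: " + hits);
  }
}
```

An empty result only means that no exception text appears in the inspected tail; an exception raised earlier in the run would need a scan of the whole file.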
I just fixed this issue in the main repository, see https://github.com/pkiraly/qa-catalogue/issues/525.
I cannot push the test against the HBZ files because I do not have the necessary permission, so I am posting the code here. Please add it to src/test/java/de/gwdg/metadataqa/marc/cli/ValidatorCliTest.java:
// add this line to the import section
import java.util.stream.Collectors;

// put it after the last test method
@Test
public void validate_whenHbz() throws Exception {
  clearOutput(outputDir, outputFiles);
  ValidatorCli processor = new ValidatorCli(new String[]{
    "--schemaType", "MARC21",
    "--marcVersion", "HBZ",
    "--marcxml",
    "--outputDir", outputDir,
    "--fixAlma",
    "--ignorableRecords", "DEL$a=Y",
    "--ignorableFields", "964,940,941,942,944,945,946,947,948,949,950,951,952,955,956,957,958,959,966,967,970,971,972,973,974,975,976,977,978,979",
    "--details",
    "--trimId",
    "--summary",
    TestUtils.getPath("marcxml/990082522550206441_missing_validation_custom_subfield_9_core_710.xml"),
    TestUtils.getPath("marcxml/990171082050206441_missing_validation_custom_ind2_9_core_246.xml"),
    TestUtils.getPath("marcxml/991000922029706482_missing_subfield_validation_t_in_customfield_GKT.xml"),
  });
  RecordIterator iterator = new RecordIterator(processor);
  iterator.setProcessWithErrors(true);
  iterator.start();

  // issue-summary.csv is expected to have exactly 3 lines, none of which reports an "undefined field"
  List<String> lines = getFileLines("issue-summary.csv");
  assertEquals(3, lines.size());
  List<String> undefinedFields = lines.stream()
    .filter(line -> line.contains("undefined field"))
    .collect(Collectors.toList());
  assertEquals(0, undefinedFields.size());
  // Pattern pattern = Pattern.compile("^\\d+,952,\\d+,\\d+,undefined field");
  // assertTrue(pattern.matcher(undefinedFields.get(0)).find());
}
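The test relies on two options with a compact value syntax. As we read the values used in it (an assumption, not taken from the QA catalogue documentation), --ignorableRecords "DEL$a=Y" names a field tag, a subfield code and a value, and --ignorableFields is a comma-separated list of tags. The hypothetical class OptionValueSketch only illustrates that reading; it is not QA catalogue's own option parser.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not QA catalogue's parser. Assumed syntax:
//   --ignorableRecords  TAG$CODE=VALUE   e.g. DEL$a=Y
//   --ignorableFields   TAG,TAG,...      e.g. 964,940,941
public class OptionValueSketch {

  // One record-skipping condition: field tag, single-character subfield code, value.
  public static final class Condition {
    public final String tag;
    public final String code;
    public final String value;

    Condition(String tag, String code, String value) {
      this.tag = tag;
      this.code = code;
      this.value = value;
    }
  }

  public static Condition parseCondition(String spec) {
    int dollar = spec.indexOf('$');
    int equals = spec.indexOf('=', dollar + 1);
    // Require a non-empty tag and exactly one character between '$' and '='.
    if (dollar <= 0 || equals != dollar + 2) {
      throw new IllegalArgumentException("Expected TAG$CODE=VALUE, got: " + spec);
    }
    return new Condition(
        spec.substring(0, dollar),
        spec.substring(dollar + 1, equals),
        spec.substring(equals + 1));
  }

  // Splits a comma-separated tag list, trims blanks and drops duplicates, keeping order.
  public static Set<String> parseTags(String spec) {
    return Arrays.stream(spec.split(","))
        .map(String::trim)
        .filter(tag -> !tag.isEmpty())
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    Condition c = parseCondition("DEL$a=Y");
    System.out.println(c.tag + " $" + c.code + " = " + c.value); // prints "DEL $a = Y"
    System.out.println(parseTags("964,940,941")); // prints [964, 940, 941]
  }
}
```

Using a LinkedHashSet for the tag list keeps the user's order while making a repeated tag harmless.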
@Phu2 will take care of it.
Could you check whether validation.log contains a Java exception near the end of the file?
No exceptions found. These are the last 20 lines of the log from processing our whole base dump, which contains more than 27 million records:
qa-catalogue@quaoar4:~/qa-catalogue$ tail -n 20 logs/hbz/validate.log
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998379706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998419706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991013998449706467) invalid category for 007: '|'
Sep 23, 2024 3:27:51 PM de.gwdg.metadataqa.marc.dao.Control007 processContent
SEVERE: #991003042719706480) invalid category for 007: '|'
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator processFile
INFO: Finished processing file. Processed 27,824,613 records.
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: printCounter
Sep 23, 2024 3:27:52 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: Saving summary
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.ValidatorCli afterIteration
INFO: all printing is DONE
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.QACli saveParameters
INFO: Saving configuration to /opt/qa-catalogue/output/hbz/validation.params.json.
Sep 23, 2024 3:32:46 PM de.gwdg.metadataqa.marc.cli.utils.RecordIterator start
INFO: Bye! It took: 01:40:42
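The tail above contains only record-level SEVERE messages about field 007 and the INFO messages of a normal shutdown. To see which kinds of record-level messages dominate across the whole run, they can be grouped by message text with the record ID removed. This is a minimal sketch assuming the "SEVERE: #<record id>) <message>" line shape visible above; SevereSummary is a hypothetical name and the default log path is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: counts record-level SEVERE messages by their text.
public class SevereSummary {
  // Assumed line shape, taken from the log tail: "SEVERE: #<record id>) <message>"
  private static final Pattern RECORD_MESSAGE = Pattern.compile("^SEVERE: #[^)]+\\) (.+)$");

  public static Map<String, Integer> countMessages(Stream<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    lines.forEach(line -> {
      Matcher m = RECORD_MESSAGE.matcher(line);
      if (m.matches()) {
        // group(1) is the message with the record ID stripped off
        counts.merge(m.group(1), 1, Integer::sum);
      }
    });
    return counts;
  }

  public static void main(String[] args) throws IOException {
    Path log = Paths.get(args.length > 0 ? args[0] : "logs/hbz/validate.log");
    try (Stream<String> lines = Files.lines(log)) {
      countMessages(lines).forEach((message, n) -> System.out.println(n + "\t" + message));
    }
  }
}
```

Messages that embed record-specific values (such as the offending 007 character) are counted separately per value, which is usually what one wants when looking for the most frequent problem.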
I cannot push the test against the HBZ files because I do not have the necessary permission
Now you have write access (the invitation is pending).
@pkiraly I updated ValidatorCliTest.java as suggested; see the new branch 14-test-validation.
Running mvn clean install results in an error: https://gist.githubusercontent.com/TobiasNx/2d648828520523f7a7460dd555d57688/raw/2272d0c8c98fa6335519eb88c9d093e1adcbab90/qaError
This is the error that https://github.com/pkiraly/qa-catalogue/issues/525 fixes. Now that I have write permission, I will fix it in this branch today.
@TobiasNx I pushed the changes. You can try it again.
Thanks, @pkiraly! mvn test runs fine without any errors.
@TobiasNx please have a look when you are back from vacation.
See: https://github.com/hbz/qa-catalogue/pull/17
Running mvn clean install locally seems to work now, and the new PR seems to work too.
Some of the changes in GKT and other hbz-specific elements with letters are still not validated properly, even though they are configured:
https://github.com/hbz/qa-catalogue/blob/275f9cf5717e61d0be3de0efacbcc05ea970a7a0/src/main/java/de/gwdg/metadataqa/marc/definition/tags/hbztags/TagGKT.java#L55-L64