hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Make lobid-alma data valid against JSON schema #1340

Open acka47 opened 2 years ago

acka47 commented 2 years ago

https://gist.githubusercontent.com/TobiasNx/007a32d61457dc57e353c5f1cd97a5e0/raw/4e9f525f114c0ab06279d28b2c70854cc5c6cee8/validationError.txt

This is a list of the errors in the test data.

TobiasNx commented 2 years ago

I spotted two errors when validating the test data after running the updated script (#1344):

alma/(DE-605)TT050421649.json failed test
[
  {
    instancePath: '/describedBy/resultOf/endTime',
    schemaPath: 'describedBy.json/properties/resultOf/properties/endTime/pattern',
    keyword: 'pattern',
    params: {
      pattern: '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})'
    },
    message: 'must match pattern "(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})"'
  }
]
alma/(OCoLC)945571548.json failed test
[
  {
    instancePath: '/contribution/0',
    schemaPath: 'contribution.json/items/required',
    keyword: 'required',
    params: { missingProperty: 'role' },
    message: "must have required property 'role'"
  }
]

The first error occurs in every record; the second occurs in only this one record so far.

TobiasNx commented 2 years ago

@dr0i /describedBy/resultOf/endTime is only created correctly later in the process (when indexing?). In the transformation process itself, only "dummi" is added as the value, so validation breaks. Is there any way we could still validate these?

dr0i commented 2 years ago

That's a "feature": it keeps the test files comparable regardless of the date they were created. It might be worth using a dummy value that matches the pattern, e.g. 0000-00-00T00:00:00. Or you could expand the validator to allow "dummy".
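
A minimal sketch of why the proposed dummy value would pass while the current one fails. It re-checks the `endTime` pattern from the error message in Python rather than in the actual validator, so it only approximates the real behaviour; the helper name `matches_end_time` is made up for illustration.

```python
import re

# Pattern copied from the validation error for /describedBy/resultOf/endTime.
END_TIME_PATTERN = r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"


def matches_end_time(value):
    """Return True if the value satisfies the endTime pattern.

    JSON Schema patterns are not implicitly anchored, so re.search is the
    closest equivalent; re.ASCII restricts \\d to 0-9.
    """
    return re.search(END_TIME_PATTERN, value, flags=re.ASCII) is not None


for candidate in ("dummi", "0000-00-00T00:00:00"):
    print(candidate, matches_end_time(candidate))
```

The pattern only constrains the shape of the string, not whether it is a real date, which is why an all-zero timestamp is enough to keep the test files stable.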

TobiasNx commented 2 years ago

0000-00-00T00:00:00

+1 for that

dr0i commented 2 years ago

Should be fine now. Closing.

TobiasNx commented 2 years ago

I ran this again with the updates from #1344:

$ bash ./validateJsonTestFiles.sh  
Testing version: draft
strict mode: "items" is 1-tuple, but minItems or maxItems/additionalItems are not specified or different at path "type.json"
alma/(CKB)5280000000199164.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT000161712.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT000312236.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT003176544.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT004285445.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT005207972.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT006855611.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT012734833.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT012734884.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT015011399.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT015671602.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT016433929.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT016709661.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017015300.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017398609.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017411546.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT017664407.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019075404.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019246898.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT019631849.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020202475.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020391499.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)HT020936481.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)TT003907920.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(DE-605)TT050421649.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(OCoLC)945571548.json failed test
[
  {
    instancePath: '/contribution/0',
    schemaPath: 'contribution.json/items/required',
    keyword: 'required',
    params: { missingProperty: 'role' },
    message: "must have required property 'role'"
  }
]
Test FAILED

There are still errors.
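
Since the log repeats the same error for many files, a tally per path and keyword can make it easier to read. The following is a rough sketch that assumes the output format shown above (one `instancePath:` line followed by a `keyword:` line per error); the function name `summarize` is hypothetical and not part of validateJsonTestFiles.sh.

```python
import re
from collections import Counter

# Each error object in the log has an instancePath line and a keyword line.
PATH_RE = re.compile(r"instancePath: '([^']*)'")
KEYWORD_RE = re.compile(r"keyword: '([^']*)'")


def summarize(log_text):
    """Count errors in the validator output per (instancePath, keyword)."""
    counts = Counter()
    path = None
    for line in log_text.splitlines():
        path_match = PATH_RE.search(line)
        if path_match:
            path = path_match.group(1)
            continue
        keyword_match = KEYWORD_RE.search(line)
        if keyword_match and path is not None:
            counts[(path, keyword_match.group(1))] += 1
            path = None
    return counts


sample = """alma/(DE-605)TT050421649.json failed test
[
  {
    instancePath: '/hasItem/0/type/0',
    schemaPath: 'hasItem.json/items/properties/type/items/const',
    keyword: 'const',
    params: { allowedValue: 'Item' },
    message: 'must be equal to constant'
  }
]
alma/(OCoLC)945571548.json failed test
[
  {
    instancePath: '/contribution/0',
    schemaPath: 'contribution.json/items/required',
    keyword: 'required',
    params: { missingProperty: 'role' },
    message: "must have required property 'role'"
  }
]
"""

for (path, keyword), count in summarize(sample).items():
    print(count, path, keyword)
```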

TobiasNx commented 2 years ago

The item error is due to #1177

TobiasNx commented 2 years ago

List of validation errors: https://gist.githubusercontent.com/TobiasNx/007a32d61457dc57e353c5f1cd97a5e0/raw/4e9f525f114c0ab06279d28b2c70854cc5c6cee8/validationError.txt

TobiasNx commented 2 years ago

List of things that do not validate:

These are the errors that appear while transforming via morph.

TobiasNx commented 1 year ago

At the moment there are three schema problems left:

  1. HT017664407 has no type besides BibliographicResource, but the schema requires at least 2 types. It was a Periodical in the old transformation but cannot unambiguously be identified as "Periodical"; we should have another look.

  2. Subjects that are neither "Concepts" nor "ComplexSubject" are typed as "Keyword". This is invalid; how should we proceed with that?

  3. hasItem is currently created from the specific publishing profile elements in a record (MNG, HOL, ITM, POR, etc.). The object itself is typed as Item plus the MARC element name. We need to remodel this, see #1177 and https://github.com/hbz/lobid-resources/issues/1373

TobiasNx commented 1 year ago

Duplicate #1429

TobiasNx commented 4 weeks ago

Paths that still have invalid data after fixing describedBy (#2025):

/hasItem/*/type/*
/publication/*/publishedBy
/spatial/*/focus/geo/lat
/spatial/*/focus/geo/lon
/spatial/*/source/id
/subject/*
/subject/*/componentList/*
/subject/*/source
/subject/*/source/id
/subject/*/type/*
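
For reference, wildcard paths like the ones above can be derived from the validator's concrete `instancePath` values by replacing numeric array indices with `*`. This is only a sketch of that normalisation; the function name `generalize_path` is made up, and it assumes that purely numeric segments are always array indices.

```python
import re

# A path segment consisting only of digits is treated as an array index.
INDEX_SEGMENT_RE = re.compile(r"/\d+(?=/|$)")


def generalize_path(instance_path):
    """Turn '/hasItem/0/type/0' into '/hasItem/*/type/*'."""
    return INDEX_SEGMENT_RE.sub("/*", instance_path)


print(generalize_path("/hasItem/0/type/0"))
print(generalize_path("/describedBy/resultOf/endTime"))
```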

Spatial source also needs to allow rpb spatial. lat and lon cannot be numbers since MF only produces strings; I'm not sure how to handle this.
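
One possible way to deal with the string coordinates, sketched here purely as an assumption, would be a post-processing step that converts lat and lon to numbers before validation. The nesting follows the paths /spatial/*/focus/geo/lat and /spatial/*/focus/geo/lon listed above; the function name `coerce_geo` and the sample values are hypothetical, and relaxing the schema to accept strings would be an equally valid alternative.

```python
def coerce_geo(record):
    """Convert string lat/lon values under spatial/*/focus/geo to floats, in place."""
    for place in record.get("spatial", []):
        if not isinstance(place, dict):
            continue
        focus = place.get("focus")
        geo = focus.get("geo") if isinstance(focus, dict) else None
        if not isinstance(geo, dict):
            continue
        for key in ("lat", "lon"):
            value = geo.get(key)
            if isinstance(value, str):
                try:
                    geo[key] = float(value)
                except ValueError:
                    # Leave unparsable values untouched so the validator still reports them.
                    pass
    return record


# Made-up sample record for illustration only.
record = {"spatial": [{"focus": {"geo": {"lat": "50.93", "lon": "6.95"}}}]}
print(coerce_geo(record))
```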

publishedBy seems to be due to a faulty mapping: https://github.com/hbz/lobid-resources/issues/2011#issuecomment-2149685432

TobiasNx commented 3 weeks ago

Concerning the missing subject labels for notations, we should decide between three options: drop label as mandatory for skosConcepts; introduce a third type of subject, notations, which only needs the notation; or use the notation as a fallback label if no label is provided.
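
To make the fallback option concrete, here is a small sketch of what it could look like. The field names `label` and `notation` are taken from the wording above, but the exact shape of a subject object and the function name `with_fallback_label` are assumptions for illustration, not the actual transformation.

```python
def with_fallback_label(subject):
    """Return a copy of the subject that uses its notation as label if no label exists."""
    if not subject.get("label") and subject.get("notation"):
        return {**subject, "label": subject["notation"]}
    return subject


# Made-up sample subjects for illustration only.
print(with_fallback_label({"notation": "123.4"}))
print(with_fallback_label({"notation": "123.4", "label": "Some label"}))
```

With this approach label could stay mandatory in the schema, at the cost of labels that are not human-readable for notation-only subjects.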