sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Update JSON schema to validate that URLs are valid #974

Open jacobthill opened 2 years ago

jacobthill commented 2 years ago

We need a test to ensure that the data harvested from a data provider makes it all the way through our ETL pipeline and into the web application. The data is harvested, transformed in traject, and loaded into the DLME web application. We have a test in Airflow to check that the number of records harvested matches the number of records in the Intermediate Representation (IR) after transform. However, there are still cases where traject does not throw an error but Spotlight rejects something about a record in the IR. Sometimes this results in an error that surfaces when attempting to load the records into Spotlight, but sometimes no error is surfaced and some of the records just don't load into Spotlight. In these cases, the only indication that something went wrong is that the record count in Spotlight doesn't match the record count in the IR. We need a way to compare the record count in the IR to the record count in Spotlight. This might be a browser test, or a Solr query, or maybe both.
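A rough sketch of what the count comparison could look like, in Python. It assumes the IR is newline-delimited JSON (one record per line) and that the Spotlight count comes from a Solr select query with `rows=0`, whose JSON response reports the total under `response.numFound`. The function names are hypothetical and not part of dlme-transform or the Airflow DAGs; fetching the Solr response is left out so the sketch stays free of network calls.

```python
import json


def count_ir_records(lines):
    """Count IR records, assuming one JSON object per non-blank line (NDJSON)."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if isinstance(record, dict):
            count += 1
    return count


def solr_num_found(solr_response):
    """Read the total hit count from an already-parsed Solr JSON response."""
    return solr_response["response"]["numFound"]


def compare_counts(ir_count, solr_count):
    """Summarize the difference between the IR and Solr record counts."""
    missing = ir_count - solr_count
    return {
        "ir": ir_count,
        "solr": solr_count,
        "missing": missing,
        "ok": missing == 0,
    }
```

A task could run this after the load step and fail whenever `ok` is false, which would surface the silent drops described above.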

thatbudakguy commented 1 year ago

One reliable way to reproduce this is to try to index a record that has a non-URL string value for agg_preview.
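For catching that case before load, here is a minimal sketch of a URL check in Python, using only the standard library. It assumes, for illustration, that the field value is a plain string; if the real IR nests the URL inside an object, the lookup would need adjusting. `is_valid_url` and `records_with_bad_url` are hypothetical names, and this is not the JSON schema change itself.

```python
from urllib.parse import urlparse


def is_valid_url(value):
    """Return True only for strings with an http(s) scheme and a host."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def records_with_bad_url(records, field="agg_preview"):
    """List the ids of records whose field is present but not a valid URL.

    Assumes each record is a dict with an "id" key and a string-valued field.
    """
    return [
        record.get("id")
        for record in records
        if field in record and not is_valid_url(record[field])
    ]
```

Run against the IR before indexing, this would flag the kind of record that reproduces the problem.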