sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Undertake technical analysis to improve ETL pipeline #680

Closed: jacobthill closed this issue 2 years ago

jacobthill commented 3 years ago

As the DLME management team, we need recommendations for improving the existing ETL pipeline so that transforming records requires less manual labor. This work must be time-boxed (say 60 hours maximum, possibly less). The developer(s) doing this work will need to meet regularly with the DLME data manager to consult on challenges, viable solutions, etc. It has been confirmed that we cannot engineer a new ETL system from the ground up; instead, we need to find ways to improve the current system and reduce the manual labor required to operate it. This analysis should produce a report with recommendations for improvement and time estimates (these can be relative, e.g. on a Fibonacci scale). The DLME product team will weigh the recommendations against budget limitations to decide which of them will be implemented in the following work cycle.

One area the analysis should cover is the current instrumentation for debugging data transformation errors. In past cases, the DLME data manager has tried to add data and the application has displayed a success message indicating that the data was added to the queue, but the records never loaded. The first responder has not always been able to gather meaningful information from the available logs. Can this situation be improved? DLME may be transferred to QNL, whose staff would then need to make sense of the available logs, so we must not move the application until the necessary debugging tools are in place.

An outline of work required to automate the ETL pipeline is here.

Beyond the automation work outlined in the above document, we need to identify debugging and/or reporting tools for troubleshooting errors, etc.
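One possible shape for such a reporting tool, offered as a minimal Python sketch under stated assumptions rather than as dlme-transform's actual code: instead of only confirming that a job was queued, the transform run records every per-record failure with its record ID and error type, then emits a machine-readable summary that the data manager (or a future QNL team) could read without digging through raw logs. The names `run_transform`, `TransformReport`, `to_intermediate`, and the `id`/`title` record fields are all hypothetical.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

# Hypothetical logger name; not part of dlme-transform.
logger = logging.getLogger("transform_report")


@dataclass
class TransformReport:
    """Per-run summary: how many records succeeded and why the rest failed."""
    source: str
    succeeded: int = 0
    failures: List[dict] = field(default_factory=list)

    @property
    def total(self) -> int:
        return self.succeeded + len(self.failures)

    def to_json(self) -> str:
        # Machine-readable summary that could be stored next to the run's output.
        return json.dumps(
            {
                "source": self.source,
                "total": self.total,
                "succeeded": self.succeeded,
                "failed": len(self.failures),
                "failures": self.failures,
            },
            indent=2,
        )


def run_transform(
    source: str,
    records: Iterable[dict],
    transform: Callable[[dict], dict],
) -> Tuple[List[dict], TransformReport]:
    """Apply `transform` to every record, collecting failures instead of stopping."""
    report = TransformReport(source=source)
    outputs: List[dict] = []
    for record in records:
        record_id = record.get("id", "<missing id>")
        try:
            outputs.append(transform(record))
            report.succeeded += 1
        except Exception as exc:  # keep going; one bad record should not hide the rest
            report.failures.append(
                {"record_id": record_id, "error": type(exc).__name__, "message": str(exc)}
            )
            logger.warning("%s: record %s failed: %s", source, record_id, exc)
    return outputs, report


def to_intermediate(record: dict) -> dict:
    # Hypothetical transform that requires a "title" field.
    return {"id": record["id"], "title": record["title"]}


if __name__ == "__main__":
    sample = [{"id": "a1", "title": "Map of Cairo"}, {"id": "a2"}]
    outputs, report = run_transform("example-collection", sample, to_intermediate)
    print(report.to_json())  # reports 1 succeeded, 1 failed (record a2, KeyError)
```

If a summary like this were saved with each run, a job that was queued successfully but loaded nothing could surface as `"succeeded": 0` with a list of reasons, rather than failing silently behind a success message.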

aaron-collier commented 2 years ago

Closing, as we're now establishing Airflow for ETL.