thedatahub / Datahub-Factory

Datahub::Factory - Transport metadata between Collection Management Systems and the Datahub

Identify & flesh out discrete functionality of the conveyor belt #3

Closed netsensei closed 5 years ago

netsensei commented 7 years ago

With this pull request in the works, we can now flesh out / isolate different functions in separate command classes.

Question: what should this tool do anyway? Answer: it's a "glue tool", which means it brings different Catmandu modules together so you don't have to write boilerplate bash or Perl to set up a conveyor belt.

However, there are still discrete business requirements to be met. I can identify the following variations of input / output, which fall into 2 categories:

Push data to a datahub instance.

Export data to a local flat file
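To make the two categories concrete, here is a rough sketch of a conveyor belt with a swappable output end. It is written in Python purely for illustration and is not how Datahub::Factory or Catmandu are implemented. Every name in it (`run_pipeline`, `strip_title`, `PushExporter`, `FileExporter`, `SAMPLE_RECORDS`) is hypothetical, and the push exporter only collects records in memory instead of talking to a real Datahub instance.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def run_pipeline(importer: Iterable[Record],
                 fixer: Callable[[Record], Record],
                 exporter: Callable[[Record], None]) -> int:
    """Pull each record from the importer, fix it, hand it to the exporter."""
    count = 0
    for record in importer:
        exporter(fixer(record))
        count += 1
    return count


def strip_title(record: Record) -> Record:
    """Hypothetical fix: trim stray whitespace from the title field."""
    return {**record, "title": record["title"].strip()}


class PushExporter:
    """Category 1 stand-in. A real exporter would send each record to a
    Datahub instance; this one only collects the records in memory."""

    def __init__(self) -> None:
        self.sent: List[Record] = []

    def __call__(self, record: Record) -> None:
        self.sent.append(record)


class FileExporter:
    """Category 2: append each record as one JSON line to a local flat file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def __call__(self, record: Record) -> None:
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")


# Hypothetical records standing in for a Collection Management System export.
SAMPLE_RECORDS = [
    {"id": "1", "title": "  First work "},
    {"id": "2", "title": "Second work"},
]
```

Calling `run_pipeline(SAMPLE_RECORDS, strip_title, PushExporter())` or passing a `FileExporter(Path("out.jsonl"))` instead runs the same belt with a different output. Only the exporter changes between the two categories.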

We should be wary, though, of "reinventing the wheel" here, as Catmandu already does a lot of the heavy lifting out of the box.

netsensei commented 5 years ago

This statement still holds true:

We should be wary, though, of "reinventing the wheel" here, as Catmandu already does a lot of the heavy lifting out of the box.

It's extremely hard to build a generic tool. Let's ruthlessly guard the scope here. The tool should work within the context of the Flemish Art Context. So, only add things that can't be done with vanilla Catmandu.

The main goal of the tool is to quickly set up and maintain robust ETL pipelines within an existing infrastructure. Only add new importer & exporter modules when new applications are added to the infrastructure.

Use open formats and protocols as much as you can. Treat OAI-PMH as a first-class citizen, and avoid implementing the custom APIs of individual systems in separate modules as much as possible.
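As an illustration of why an open protocol keeps the importer side small, here is a minimal sketch of the two pieces any OAI-PMH harvester needs: building a `ListRecords` request and reading the `resumptionToken` from a response page. It is written in Python for illustration only and is not the project's code. The base URL, the identifiers, the token value and the function names are made up, and the sample page is trimmed to the elements the sketch reads. A real harvester would fetch pages over HTTP and keep requesting until no token is returned.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# XML namespace defined by OAI-PMH 2.0.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    When a resumption token is given it is sent as the only argument
    besides the verb, as the protocol requires for follow-up pages.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on a page and the next token, if any."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text or ""
        for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty or missing resumptionToken element means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response page, reduced to the elements read above.
SAMPLE_PAGE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""
```

Because every OAI-PMH endpoint answers this same request shape, one importer can serve many source systems. That is the argument for preferring it over one module per custom API.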