openactive-archive / conformance-services

Harvests and normalises OpenActive Opportunity feeds to a common representation
MIT License

Configuration of normalising data pipelines #37

Open odscjames opened 4 years ago

odscjames commented 4 years ago

Data Enhancement: Which enhancement options are available in the pipeline for Stage 2 processing needs to be configurable by the user.

This app, like the last one, uses a system of pipelines that perform certain actions on the data.

Is the work here to enable configuration options so it's easy to turn certain pipes on and off?

Are there other use cases you're looking to meet here?

robredpath commented 4 years ago

Add parametrisation to normalisation / enhancement pipeline

robredpath commented 4 years ago

@rhiaro could you give a quick outline of what a particular pipe might encompass? Is it, for example, carrying out a particular normalisation, or "the geo stuff"?

Do we know if there are any dependencies between pipes that would mean that particular ones can't be disabled without other ones being useless/pointless/unreliable?

rhiaro commented 4 years ago

There will be pipes for "the geo stuff", "the activity tag stuff" and "the organisation stuff", aka the enhancement pipes.

There will be pipes for particular normalisations, but in several cases these are functionally the same, so are merged into one pipe.

There are object types that will need to pass through more than one pipe to be completed, e.g. an EventSeries with subEvents that are Events. Instead of duplicating the Event normalisation in an EventSeries pipe, we pass it through the EventSeries pipe to slurp the necessary data out of the parent object, then it goes through the normal Event pipe for the rest (or vice versa). At least this is how some worked last time, but it could be rearchitected if pipe dependencies are going to be a problem.
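To make the chaining concrete, here is a rough sketch of the EventSeries-then-Event flow described above. It is written in Python for illustration only and is not the project's code: the function names, the field names picked out of the feed item, and the shape of the normalised output are all assumptions.

```python
# Hypothetical sketch: names and output shape are assumptions, not the real pipes.

def event_series_pipe(item):
    """Copy the parent's fields onto each subEvent so the Event pipe
    does not need to know about EventSeries at all."""
    if item.get("type") != "EventSeries":
        return [item]  # not our concern, pass through unchanged
    inherited = {k: v for k, v in item.items() if k not in ("type", "subEvent")}
    # Values set on the subEvent itself win over inherited ones.
    return [{**inherited, **sub} for sub in item.get("subEvent", [])]


def event_pipe(item):
    """Normalise a single Event (assumed, simplified output shape)."""
    if item.get("type") != "Event":
        return None
    return {
        "kind": "Event",
        "name": item.get("name"),
        "start": item.get("startDate"),
    }


def run(raw_item):
    """Chain the two pipes: EventSeries first, then the normal Event pipe."""
    normalised = []
    for candidate in event_series_pipe(raw_item):
        result = event_pipe(candidate)
        if result is not None:
            normalised.append(result)
    return normalised
```

The dependency is one-way here: the Event pipe works on its own, but the output of the EventSeries pipe is only useful if the Event pipe runs afterwards.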

Today I've been thinking about breaking it up a bit so there are pipes for things that are common between all/several pipes, eg. dealing with the presence of invalid fields. However, these could potentially be reorganised as methods on the parent Pipe that all the other pipes can call on instead. It would be helpful to know exactly what sorts of things will need turning on and off to architect this better.
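As a sketch of the "methods on the parent Pipe" option, something like the following could work. This is Python for illustration only; the class names, the `allowed_fields` attribute and the `invalidFields` output key are all made up for the example and are not taken from the codebase.

```python
# Hypothetical sketch: class, attribute and key names are assumptions.

class Pipe:
    """Parent pipe holding behaviour shared by all/several pipes."""

    allowed_fields = set()  # each subclass declares the fields it accepts

    def strip_invalid_fields(self, item):
        """Split an item into its recognised fields and a sorted list
        of the field names that were dropped."""
        kept = {k: v for k, v in item.items() if k in self.allowed_fields}
        dropped = sorted(set(item) - self.allowed_fields)
        return kept, dropped

    def run(self, item):
        raise NotImplementedError


class EventPipe(Pipe):
    allowed_fields = {"type", "name", "startDate"}

    def run(self, item):
        # Shared handling lives on the parent; no separate "invalid fields" pipe.
        kept, dropped = self.strip_invalid_fields(item)
        kept["invalidFields"] = dropped
        return kept
```

With this layout the common behaviour is not a pipe at all, so there is nothing extra to switch on or off and no new dependency between pipes.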

rhiaro commented 4 years ago

The requirement is the ability to turn the enhancement pipes off at runtime; normalisation pipes don't need this. Dependencies between normalisation pipes should be noted, in case someone alters the code to disable some, but disabling them is not something we need to explicitly support at the CLI.
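A minimal sketch of what that could look like at the CLI, again in Python for illustration only. The `--disable-enhancement` flag and the pipe names are hypothetical and do not exist in the project; the point is just that only enhancement pipes are selectable, while normalisation pipes always run.

```python
# Hypothetical sketch: the flag name and pipe names are assumptions.
import argparse

# Always run, in this order. Dependency noted in code rather than at the CLI:
# "event_series" output is only complete once "event" has run after it.
NORMALISATION_PIPES = ["event_series", "event"]

# Can be switched off individually at runtime.
ENHANCEMENT_PIPES = ["geo", "activity", "organisation"]


def build_parser():
    parser = argparse.ArgumentParser(description="Run the normalisation pipeline")
    parser.add_argument(
        "--disable-enhancement",
        action="append",
        default=[],
        choices=ENHANCEMENT_PIPES,
        metavar="PIPE",
        help="enhancement pipe to skip; may be given more than once",
    )
    return parser


def select_pipes(argv):
    """Return the ordered list of pipe names to run for the given arguments."""
    args = build_parser().parse_args(argv)
    disabled = set(args.disable_enhancement)
    return NORMALISATION_PIPES + [p for p in ENHANCEMENT_PIPES if p not in disabled]
```

For example, `select_pipes(["--disable-enhancement", "geo"])` would leave both normalisation pipes in place and drop only the geo enhancement.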