kalisio / krawler

A minimalist (geospatial) ETL
https://kalisio.github.io/krawler/
MIT License

Refactor using only hooks #4

Closed: claustres closed this issue 6 years ago

claustres commented 6 years ago

Today we have a mix of hooks, stores and task types that is not easy to understand. I'm not sure it is practically possible, but we could probably unify everything behind hooks, which would also make krawler more easily extensible, e.g.:

We don't need the stores service anymore: hooks will allocate the required objects on the fly and can pass them from the job to its tasks using taskTemplate.

The tasks service will only instantiate "empty" tasks that trigger the pipeline execution.
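To make the proposal concrete, here is a minimal sketch of a hooks-only pipeline. It assumes jobs and tasks are plain objects and that hooks are synchronous functions run in order; `runJob`, `createMemoryStore`, `writeResult` and the `store` property are hypothetical names, not krawler's actual API. A job hook allocates a store on the fly and puts it in the `taskTemplate`, and each task is created "empty" from that template before its own hooks run.

```typescript
// Hypothetical sketch: jobs and tasks are plain objects, and all behaviour
// (allocating stores, writing results) lives in hook functions.
type Item = { [key: string]: unknown };
type Hook = (item: Item) => void;

// A job lists task ids plus a template merged into every task.
type Job = Item & { tasks: string[]; taskTemplate: Item };

// Run hooks in order on the same object, like a hook chain.
function runHooks(hooks: Hook[], item: Item): void {
  hooks.forEach((hook) => hook(item));
}

function runJob(job: Job, jobHooks: Hook[], taskHooks: Hook[]): Item[] {
  // Job hooks may allocate shared objects and put them in the task template.
  runHooks(jobHooks, job);
  return job.tasks.map((id) => {
    // Tasks start "empty": only the template properties plus an id.
    const task: Item = { ...job.taskTemplate, id };
    runHooks(taskHooks, task);
    return task;
  });
}

// Example job hook: create an in-memory store on the fly.
const createMemoryStore: Hook = (job) => {
  (job.taskTemplate as Item).store = {};
};

// Example task hook: write a result into the store inherited from the template.
const writeResult: Hook = (task) => {
  const store = task.store as Item;
  store[task.id as string] = `data for ${task.id as string}`;
};

const job: Job = { tasks: ["a", "b"], taskTemplate: {} };
const tasks = runJob(job, [createMemoryStore], [writeResult]);
console.log(JSON.stringify((job.taskTemplate as Item).store)); // {"a":"data for a","b":"data for b"}
```

Because the template is shallow-copied into each task, every task receives a reference to the same store object rather than its own copy, which is what lets the job hand a store down to its tasks.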

claustres commented 6 years ago

We need a way to register new hooks in order to extend krawler, similar to what is done today for store/task types with generators.
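Registration could mirror the existing generators: an extension registers a hook generator under a name, and a job then refers to hooks by name with options. The sketch below assumes a simple in-memory registry; `registerHook`, `buildHooks` and the `setProperty` hook are hypothetical and do not reflect krawler's documented registration API.

```typescript
type Item = { [key: string]: unknown };
type Hook = (item: Item) => void;
// A generator builds a hook from the options written in a job definition.
type HookGenerator = (options: Item) => Hook;

// Hypothetical global registry, keyed by hook name.
const hookRegistry: { [name: string]: HookGenerator } = {};

function registerHook(name: string, generator: HookGenerator): void {
  if (hookRegistry[name]) throw new Error(`Hook "${name}" is already registered`);
  hookRegistry[name] = generator;
}

// Turn a declarative list of { name, options } entries into hook functions.
function buildHooks(specs: Array<{ name: string; options?: Item }>): Hook[] {
  return specs.map((spec) => {
    const generator = hookRegistry[spec.name];
    if (!generator) throw new Error(`Unknown hook "${spec.name}"`);
    return generator(spec.options || {});
  });
}

// An extension registers its hook once; jobs can then refer to it by name.
registerHook("setProperty", (options) => (item) => {
  item[options.property as string] = options.value;
});

const hooks = buildHooks([{ name: "setProperty", options: { property: "format", value: "csv" } }]);
const task: Item = { id: "t1" };
hooks.forEach((hook) => hook(task));
console.log(task.format); // csv
```

Keeping generators (rather than ready-made hook functions) in the registry lets the same hook be configured differently by each job definition.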

If the store service no longer holds the data, then the store should be set on the hook object itself rather than just referenced by ID. Depending on which stores a task requires, for instance, we should target a different property path on the object: the path might differ when the hook is used on the job (e.g. the taskTemplate) rather than on a task (e.g. the object itself).
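One way to target a different property path depending on where the hook runs is to compute the path from the context and then set the store there. This is a sketch under the assumption that a job exposes stores through `taskTemplate` while a task holds them directly; `attachStore`, `storePath`, `setByPath` and the store names are hypothetical.

```typescript
type Item = { [key: string]: unknown };

// Set a value at a dotted path, creating intermediate objects as needed.
function setByPath(target: Item, path: string, value: unknown): void {
  const keys = path.split(".");
  let current: Item = target;
  for (let i = 0; i < keys.length - 1; i++) {
    const next = current[keys[i]];
    if (typeof next !== "object" || next === null) current[keys[i]] = {};
    current = current[keys[i]] as Item;
  }
  current[keys[keys.length - 1]] = value;
}

// Hypothetical rule: on a job, the store goes into the task template so every
// task generated from it inherits the store; on a task, it goes on the task.
function storePath(context: "job" | "task", storeName: string): string {
  return context === "job" ? `taskTemplate.${storeName}` : storeName;
}

function attachStore(item: Item, context: "job" | "task", storeName: string, store: Item): void {
  setByPath(item, storePath(context, storeName), store);
}

const outputStore: Item = { kind: "fs", root: "./output" };
const job: Item = { id: "job" };
attachStore(job, "job", "outputStore", outputStore);

const task: Item = { id: "task-1" };
attachStore(task, "task", "inputStore", { kind: "memory" });

console.log(JSON.stringify(job)); // {"id":"job","taskTemplate":{"outputStore":{"kind":"fs","root":"./output"}}}
```

The store object itself is attached, not an ID, so whatever consumes the job or task can use it without going through a store service.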

claustres commented 6 years ago

Custom hook registration is now possible, see https://kalisio.gitbooks.io/krawler/docs/EXTENDING.html.

Hooks to create/destroy stores are also available, see https://kalisio.gitbooks.io/krawler/docs/HOOKS.html#store-source.

claustres commented 6 years ago

New hooks have been added to copy data between stores, task templates, etc., relying heavily on options templating.
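As an illustration of options templating, the sketch below resolves placeholders in a hook's options against the current task before copying an entry from one store to another, so a single job definition copies a different key for each task. The `${name}` placeholder syntax, the `copyHook` function and the store layout (plain key/value objects) are assumptions made for this sketch, not krawler's actual templating syntax or hooks.

```typescript
type Item = { [key: string]: unknown };
// Stores are modelled as plain key/value objects for this sketch.
type Store = { [key: string]: string };

// Replace ${name} placeholders with properties of the context object.
// The ${...} syntax is an assumption made for this sketch.
function templateString(template: string, context: Item): string {
  return template.replace(/\$\{(\w+)\}/g, (_match: string, name: string) => {
    const value = context[name];
    return value === undefined ? "" : String(value);
  });
}

// Every option is a template resolved against the task being processed.
interface CopyOptions {
  from: string; // source store name
  to: string; // target store name
  key: string; // key to read in the source store
  targetKey: string; // key to write in the target store
}

// Hypothetical generator for a hook that copies one entry between stores.
function copyHook(options: CopyOptions, stores: { [name: string]: Store }): (task: Item) => void {
  return (task) => {
    const source = stores[templateString(options.from, task)];
    const target = stores[templateString(options.to, task)];
    if (!source || !target) throw new Error("Unknown store");
    const key = templateString(options.key, task);
    if (!(key in source)) throw new Error(`Missing key "${key}" in source store`);
    target[templateString(options.targetKey, task)] = source[key];
  };
}

const stores: { [name: string]: Store } = {
  memory: { "temperature-2018-01-01.json": "[12, 14]", "temperature-2018-01-02.json": "[9, 11]" },
  archive: {},
};

const copy = copyHook(
  { from: "memory", to: "archive", key: "temperature-${date}.json", targetKey: "daily/${date}.json" },
  stores
);

// Two tasks generated from the same template differ only by their date.
[{ id: "t1", date: "2018-01-01" }, { id: "t2", date: "2018-01-02" }].forEach((task) => copy(task));
console.log(JSON.stringify(stores.archive)); // {"daily/2018-01-01.json":"[12, 14]","daily/2018-01-02.json":"[9, 11]"}
```

The option strings are ordinary double-quoted strings, so the `${date}` placeholders are resolved by `templateString` at hook time rather than by the language when the options are written.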

The store service is still required when running krawler as an API, to avoid recreating stores on each job request, which can take a long time.
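The argument for keeping the store service amounts to caching long-lived stores in the API process. A minimal sketch, assuming stores are identified by an ID and that creating one is expensive; `StoreCache` and `handleJobRequest` are hypothetical names.

```typescript
type Store = { id: string; createdAt: number };

// Hypothetical long-lived registry owned by the API process: stores are
// created once and reused by every job request that needs them.
class StoreCache {
  private stores: { [id: string]: Store } = {};
  private readonly create: (id: string) => Store;

  constructor(create: (id: string) => Store) {
    this.create = create;
  }

  // Return the cached store, creating it only on first use.
  get(id: string): Store {
    if (!this.stores[id]) this.stores[id] = this.create(id);
    return this.stores[id];
  }

  // Drop a store, e.g. when a "destroy store" step runs.
  remove(id: string): void {
    delete this.stores[id];
  }
}

// Counts calls to the (supposedly slow) store factory.
let expensiveSetups = 0;
const cache = new StoreCache((id) => {
  expensiveSetups++;
  return { id, createdAt: Date.now() };
});

// Each job request looks the store up instead of rebuilding it.
function handleJobRequest(storeId: string): Store {
  return cache.get(storeId);
}

const first = handleJobRequest("fs-output");
const second = handleJobRequest("fs-output");
console.log(first === second, expensiveSetups); // true 1
```

When krawler runs a single job from the command line there is nothing to reuse, which is why hooks that create stores on the fly are enough in that mode.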

I think we can now close this issue, as we now have a lot of flexibility.