The workflow by which crawler sources are defined and included in the ingest/harvest process is only loosely defined. The storage mechanism also needs a formal definition (see #281).
To-Do:

- [ ] Define and document the crawler data maintenance workflow
  - How are sources stored and structured (JSON, CSV, SQL, etc.)?
  - How are new sources added?
  - How is an existing data source modified?
  - What is the update cycle?
- [ ] Create standardized Python functions/objects/methods for use across the whole project
  - pydantic and/or ORM models to define the schema and validation rules (see the sketch below)
  - Validation of sources: define the requirements a source must meet to be ingestible
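As a starting point for discussion, here is a minimal sketch of what a pydantic-based source schema and validation step could look like. All field names (`source_id`, `name`, `url`, `format`, `active`) are placeholders, not decisions; the real schema should fall out of the workflow definition above and the storage decision in #281.

```python
from typing import Optional

from pydantic import BaseModel, HttpUrl, ValidationError


class CrawlerSource(BaseModel):
    """Hypothetical schema for a single crawler source entry.

    Field names are illustrative only; the actual schema and validation
    rules still need to be defined as part of this issue.
    """

    source_id: str              # unique identifier for the source
    name: str                   # human-readable name
    url: HttpUrl                # root URL to crawl/harvest
    format: str                 # e.g. "json", "csv", "sql"
    active: bool = True         # whether the source is included in the harvest
    notes: Optional[str] = None


def validate_source(record: dict) -> CrawlerSource:
    """Validate a raw source record (e.g. parsed from JSON/CSV) before ingest."""
    return CrawlerSource(**record)


if __name__ == "__main__":
    try:
        src = validate_source(
            {
                "source_id": "example-001",
                "name": "Example Source",
                "url": "https://example.org/data",
                "format": "json",
            }
        )
        print(src)
    except ValidationError as exc:
        # A source that fails validation would not be ingestible.
        print(exc)
```

A model like this could double as the single definition of "ingestible": anything that fails validation is rejected at ingest time, and the same class can be reused by the crawler, the harvest scripts, and any maintenance tooling.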