Assuming the web-scraping component just stores raw, unprocessed data in a "RawData" table, we will need a separate component that runs afterwards to process this data and load it into the "clean" database.
Starts up, does its job, then shuts down when done.
For clean coding, keep configuration separate from code (e.g. don't hard-code important info that can change, like connection strings, file paths, etc., directly in the code).
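As a rough illustration of keeping configuration out of the code, the sketch below reads settings from environment variables. The variable names (PROCESSOR_DB_PATH, PROCESSOR_RETENTION_DAYS), the Settings class and the default values are all hypothetical and only there for the example; a config file would work just as well.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Values that can change between environments (names are hypothetical)."""
    db_path: str
    retention_days: int


def load_settings(env=None) -> Settings:
    """Build Settings from environment variables, falling back to example defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Assumed variable names and defaults, not taken from the requirements.
        db_path=env.get("PROCESSOR_DB_PATH", "processor.db"),
        retention_days=int(env.get("PROCESSOR_RETENTION_DAYS", "30")),
    )
```

The rest of the component would then take a Settings object as input instead of containing literal paths or connection strings.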
Data processing:
Delete any rows from the "RawData" table that have a status of "Done" and are more than a set number of days old
Query rows from the "RawData" table with a status of "Pending"
Loop through the data in each "Pending" row to identify & filter out duplicates (i.e. don't load the same data multiple times)
Save "clean"/processed data to database
After each raw data row is processed, update its status to "Done" and set "ProcessedTime" to the current time
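The data processing steps above could be sketched roughly as follows, using Python's built-in sqlite3 module. Apart from "RawData", "ProcessedTime" and the "Pending"/"Done" statuses, everything here is an assumption made for the example: the column names Id, Payload and Status, the CleanData table, one item per line in the payload, measuring row age from ProcessedTime, and relying on a UNIQUE constraint to drop duplicates.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema for the sketch; only the names "RawData" and
# "ProcessedTime" and the two status values come from the requirements.
SCHEMA = """
CREATE TABLE IF NOT EXISTS RawData (
    Id INTEGER PRIMARY KEY,
    Payload TEXT NOT NULL,
    Status TEXT NOT NULL DEFAULT 'Pending',
    ProcessedTime TEXT
);
CREATE TABLE IF NOT EXISTS CleanData (
    Value TEXT NOT NULL UNIQUE
);
"""


def utc_now_text(offset_days: int = 0) -> str:
    """Current UTC time (minus offset_days) as an ISO 8601 string."""
    moment = datetime.now(timezone.utc) - timedelta(days=offset_days)
    return moment.isoformat(timespec="seconds")


def process_pending(conn: sqlite3.Connection, retention_days: int) -> int:
    """Run one processing pass; return the number of raw rows marked Done."""
    # Delete "Done" rows older than the retention period.
    # Assumption: age is measured from ProcessedTime.
    conn.execute(
        "DELETE FROM RawData WHERE Status = 'Done' AND ProcessedTime < ?",
        (utc_now_text(retention_days),),
    )
    conn.commit()

    # Query the rows that still need processing.
    pending = conn.execute(
        "SELECT Id, Payload FROM RawData WHERE Status = 'Pending' ORDER BY Id"
    ).fetchall()

    processed = 0
    for row_id, payload in pending:
        # Assumption: each non-empty line of the payload is one item.
        for line in payload.splitlines():
            item = line.strip()
            if item:
                # The UNIQUE constraint plus INSERT OR IGNORE skips duplicates.
                conn.execute(
                    "INSERT OR IGNORE INTO CleanData (Value) VALUES (?)", (item,)
                )
        # Mark the raw row as processed in the same transaction as its items.
        conn.execute(
            "UPDATE RawData SET Status = 'Done', ProcessedTime = ? WHERE Id = ?",
            (utc_now_text(), row_id),
        )
        conn.commit()
        processed += 1
    return processed
```

Committing once per raw row means a crash partway through leaves the remaining rows as "Pending", so the next run picks them up without reloading rows that were already finished.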
Requirements: