Open shaye059 opened 3 years ago
Now that I'm thinking about it, I'm wondering how this will work, since each new article that's read in from the News API needs to be cross-referenced against the database to make sure it's not already there... Maybe hashing the article title, date, and author and using the result as a unique ID would be a good solution. That way the IDs can be indexed and looked up really quickly.
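A rough sketch of what that hash-based ID could look like, just to make the idea concrete. The function name `article_id`, the choice of SHA-256, and the normalization (trim plus lowercase) are all assumptions on my part, not anything decided yet, and I'm assuming the date arrives as a string:

```python
import hashlib
from typing import Optional


def article_id(title: str, published_at: str, author: Optional[str]) -> str:
    """Hypothetical helper: derive a stable ID from title, date, and author."""
    # Normalize so whitespace or case differences don't yield a new ID.
    # A missing author (None) is treated as an empty string.
    parts = [(p or "").strip().lower() for p in (title, published_at, author)]
    # Join with a separator unlikely to appear in the fields, so that
    # ("ab", "c") and ("a", "bc") do not collapse to the same key.
    key = "\x1f".join(parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The same article always maps to the same 64-character hex string, so checking for duplicates becomes a single lookup on an indexed column. One caveat: if a publisher edits the title or date after publication, the article will get a new ID and look like a new article.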
I'll have to give this some more thought.
This is less urgent since the number of articles is extremely small, but in order to set up the ETL pipeline it'll probably need a database to write the data into.
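For the database side, here is a minimal sketch using SQLite from the Python standard library. SQLite itself, the `articles` table layout, and the helper names `init_db` and `insert_if_new` are all assumptions for illustration; no database has been chosen yet. The point is only that making the unique ID the primary key lets the database do the duplicate check:

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the unique article ID is the primary key,
    # so it is indexed automatically.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " id TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " published_at TEXT,"
        " author TEXT)"
    )


def insert_if_new(conn, article_id, title, published_at, author) -> bool:
    # INSERT OR IGNORE skips the row when the primary key already exists.
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (id, title, published_at, author)"
        " VALUES (?, ?, ?, ?)",
        (article_id, title, published_at, author),
    )
    conn.commit()
    # rowcount is 1 when a row was written, 0 when it was a duplicate.
    return cur.rowcount == 1
```

With this shape, the load step of the pipeline can call `insert_if_new` for every article it reads and never has to query for existing rows first.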