Closed glciampaglia closed 6 years ago
It looks like these tables are touched in many different places. We should break this task down into several smaller tasks. @shaochengcheng, please list here all the parts that need to be changed, so that we can split the work between the two of us.
The work is done. The update is running on the server; let's wait and see if there are any problems.
The space has been freed on the server, so closing.
Currently Hoaxy extracts all the hyperlinks in each tweet collected from the Twitter stream and puts them in the `url` table. Hoaxy parses each raw URL so collected and stores the full HTML of each raw URL in the `url` table, along with its canonical URL. Since several raw URLs can point to the same article, this creates a lot of duplicate content and is not an efficient use of space.

To overcome this, we will alter two tables. We will remove the html column from the `url` table and add it to the `article` table, which is the one with the canonical URL of the article.

PRIORITY: 2

Steps:
[x] pre-update. SQL script to add new columns to table `article`: html, status_code
[x] update of hoaxy-backends, mainly affected modules:
[x] post-update. Python script to migrate old tables: `url` and `article`
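
The pre-update and post-update steps above can be sketched roughly as follows. This is only an illustration, not the actual migration script: it uses an in-memory SQLite database so it can run standalone, and the column names `id`, `raw`, `article_id`, and `canonical_url` are assumptions made for the example (only `html` and `status_code` come from the issue). The issue does not say how `status_code` gets populated, so the sketch leaves it unset.

```python
import sqlite3


def pre_update(conn):
    """Step 1: add the new columns to the article table."""
    conn.execute("ALTER TABLE article ADD COLUMN html TEXT")
    conn.execute("ALTER TABLE article ADD COLUMN status_code INTEGER")


def post_update(conn):
    """Step 3: move the HTML from url to article, then drop url.html."""
    # Keep one HTML copy per article, taken from the first url row
    # (lowest id) that points to it and actually has HTML stored.
    conn.execute(
        """
        UPDATE article
        SET html = (
            SELECT u.html FROM url AS u
            WHERE u.article_id = article.id AND u.html IS NOT NULL
            ORDER BY u.id LIMIT 1
        )
        WHERE html IS NULL
        """
    )
    # SQLite versions before 3.35 have no DROP COLUMN, so the table is
    # rebuilt without html. On PostgreSQL a plain
    # "ALTER TABLE url DROP COLUMN html" would be enough.
    # Note: CREATE TABLE ... AS SELECT does not carry over constraints.
    conn.execute("CREATE TABLE url_new AS SELECT id, raw, article_id FROM url")
    conn.execute("DROP TABLE url")
    conn.execute("ALTER TABLE url_new RENAME TO url")
    conn.commit()


def demo():
    """Build a toy database (hypothetical schema) and run both steps."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (id INTEGER PRIMARY KEY, canonical_url TEXT);
        CREATE TABLE url (
            id INTEGER PRIMARY KEY,
            raw TEXT,
            html TEXT,
            article_id INTEGER REFERENCES article(id)
        );
        INSERT INTO article VALUES (1, 'http://example.com/story');
        INSERT INTO url VALUES
            (1, 'http://example.com/story?utm=a', '<html>story</html>', 1),
            (2, 'http://example.com/story?utm=b', '<html>story</html>', 1);
        """
    )
    pre_update(conn)
    post_update(conn)
    return conn


if __name__ == "__main__":
    conn = demo()
    print(conn.execute("SELECT id, html, status_code FROM article").fetchall())
```

In the toy data, two raw URLs share one article, so the HTML is stored twice before the migration and once after it. The real tables are much larger, so a production script would probably need to work in batches; that is left out here.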
I (@shaochengcheng) am working on the second step now. I would prefer to handle it on my own, because there are many small details to take care of.