Open kelson42 opened 4 years ago
@audiodude One of the first steps might be to integrate its source into the same repository and add it to the docker-compose.yml?
We'd like to start this task in earnest.
Data ingestion of views, links, and lang links can happen on a monthly cadence to keep parity with download.openzim.org. During this process, we should produce tsv "dump" files in the same format as the current wp1_selection_tools, and preferably upload them to that website. As a side effect, we can also write the data to the page_scores table described in the design doc, since we will already have the calculated values.
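As a rough illustration of the "dump" step described above, here is a minimal sketch of writing rows to a zipped TSV file. The function name `write_tsv_zip`, the example columns, and the member name are hypothetical; the actual column layout would have to match whatever wp1_selection_tools currently produces, which is not spelled out in this thread.

```python
import csv
import io
import zipfile


def write_tsv_zip(rows, zip_path, member_name):
    """Write an iterable of row tuples as one tab-separated file inside a zip archive.

    Hypothetical helper: the column layout is an assumption, not the real
    wp1_selection_tools format.
    """
    buffer = io.StringIO()
    # Tab-delimited output with Unix line endings.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # writestr encodes str data as UTF-8.
        archive.writestr(member_name, buffer.getvalue())


if __name__ == "__main__":
    # Assumed (page title, view count) rows, for illustration only.
    write_tsv_zip([("Example_page", 1234)], "pageviews.tsv.zip", "pageviews.tsv")
```

The same rows could then be reused for the page_scores write, since the values are already computed at that point.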
@audiodude I see nothing I disagree with in your comment. Let me try to phrase it differently to ensure we are aligned:
all.tsv.zip
custom/*tsv
tops/tsv
all.tsv.zip
langlinks.tsv.zip
pagelinks.tsv.zip
pages.tsv.zip
pageviews.tsv.zip
projects.zip
redirects.tsv.zip
Available at https://github.com/openzim/wp1_selection_tools. Using the Python framework.