openzim / wp1

Wikipedia 1.0 engine & selection tools
https://wp1.openzim.org
GNU General Public License v2.0

Add WikiProject Builder model, only in Python not exposed to frontend yet #729

Closed audiodude closed 6 months ago

audiodude commented 6 months ago

In WP1, we have the concept of a Builder, which is a sort of module that knows how to "build" selections from various data sources and parameters. For example, the Simple builder takes a list=['Article 1', 'Article 2', ...] parameter and does some validation before echoing back the list as a selection. The SPARQL Builder sends a SPARQL query to the Wikidata API and formats the response into a selection.

Note that in the codebase, "builder" is also the name given to a user's specific instance of a Builder, with the parameters instantiated and saved in the database.

In this PR, we introduce the WikiProject Builder, to create a selection based on the articles that are part of one or more WikiProjects. WP1 already has a mapping of WikiProjects -> articles, based on the work of the WP1.0 Bot. The WikiProject Builder takes two parameters:

add - A list of WikiProjects whose articles should be included. The result is the union of the articles in these projects, with duplicates removed.

subtract - A list of WikiProjects whose articles should be excluded, again computed as the deduplicated union of the articles in these projects.

The end result is the set difference between the articles generated by add and those generated by subtract. So the full formula is:

For A, B in ADD; C, D in SUBTRACT:
(A ∪ B) - (C ∪ D)
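As a minimal sketch of the formula above (not the code in this PR), the computation can be written with plain Python sets. The function name `select_articles` and the in-memory `mapping` dictionary are assumptions for illustration; in WP1 the WikiProject -> articles mapping lives in the database.

```python
from typing import Dict, Iterable, Set


def select_articles(
    project_articles: Dict[str, Set[str]],
    add: Iterable[str],
    subtract: Iterable[str],
) -> Set[str]:
    """Return (union of `add` projects) minus (union of `subtract` projects)."""
    # set().union(*sets) also handles an empty list of projects.
    added = set().union(*(project_articles.get(name, set()) for name in add))
    removed = set().union(*(project_articles.get(name, set()) for name in subtract))
    return added - removed


# Hypothetical in-memory stand-in for the WikiProject -> articles mapping.
mapping = {
    "A": {"Article 1", "Article 2"},
    "B": {"Article 2", "Article 3"},
    "C": {"Article 3"},
    "D": {"Article 4"},
}
selection = select_articles(mapping, add=["A", "B"], subtract=["C", "D"])
print(sorted(selection))  # ['Article 1', 'Article 2']
```

Using sets makes the "unique set of articles" requirement automatic: an article that appears in both A and B is counted once.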

Additionally, the validation step confirms that every WikiProject specified in either parameter actually exists in the database.
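The existence check could look roughly like the following sketch. The helper name `find_missing_projects` and the `known_projects` set are hypothetical; the real validation would look the names up through the database connection rather than in an in-memory set.

```python
from typing import Iterable, List, Set


def find_missing_projects(
    known_projects: Set[str],
    add: Iterable[str],
    subtract: Iterable[str],
) -> List[str]:
    """Return the requested WikiProject names that are not known, in first-seen order."""
    requested = list(add) + list(subtract)
    # dict.fromkeys removes duplicates while preserving order.
    return [name for name in dict.fromkeys(requested) if name not in known_projects]


missing = find_missing_projects({"A", "B"}, add=["A", "X"], subtract=["B", "X", "Y"])
print(missing)  # ['X', 'Y']
```

Validation passes only when the returned list is empty.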

Finally, we add a new parameter that AbstractBuilder sends to its subclasses, wp10db=, which is the connection to the application database. This is possible because the build and validate methods share a contract: they accept their parameters as **params keyword arguments, so additional keyword arguments can be added freely and are ignored by Builder implementations that do not expect them.
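To illustrate the keyword-argument contract, here is a small self-contained sketch. The class names, the `run_build` method, and the return values are assumptions for illustration only; they are not the actual classes in the WP1 codebase.

```python
class AbstractBuilderSketch:
    """Hypothetical base class: always forwards wp10db= to the subclass."""

    def run_build(self, wp10db, **params):
        return self.build(wp10db=wp10db, **params)

    def build(self, **params):
        raise NotImplementedError


class SimpleSketch(AbstractBuilderSketch):
    def build(self, **params):
        # Does not need the database; wp10db sits unused inside params.
        return list(params["list"])


class WikiProjectSketch(AbstractBuilderSketch):
    def build(self, wp10db=None, add=(), subtract=(), **params):
        # Names wp10db explicitly because it needs the connection.
        return {"db": wp10db, "add": list(add), "subtract": list(subtract)}


fake_db = object()  # stand-in for a real database connection
print(SimpleSketch().run_build(fake_db, list=["Article 1", "Article 2"]))
result = WikiProjectSketch().run_build(fake_db, add=["A"], subtract=["C"])
print(result["db"] is fake_db)
```

Because every `build` accepts `**params`, the base class can start passing a new keyword argument without touching the existing Builder implementations.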

As part of this system, the b_params field of the builders table in the database is a JSON encoded string that represents the params dictionary. So in the case of the SimpleBuilder mentioned above:

b_params == '{"list": ["Article 1", "Article 2"]}'

This system makes it easy to store arbitrary parameters that a Builder might need, and pass them to the Builder implementation at runtime without any knowledge of a "schema".
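A rough sketch of that round trip, using only the standard `json` module: the stored string is decoded into a dictionary and unpacked as keyword arguments. The standalone `build` function here is a hypothetical stand-in for a Builder's method.

```python
import json

# The value as it might be stored in the b_params column.
b_params = '{"list": ["Article 1", "Article 2"]}'

# Decode the JSON string back into the params dictionary.
params = json.loads(b_params)


def build(**params):
    """Hypothetical stand-in for SimpleBuilder.build: echo back the list."""
    return params["list"]


# Extra keyword arguments such as wp10db are accepted and simply unused here.
selection = build(wp10db=None, **params)
print(selection)  # ['Article 1', 'Article 2']
```

No schema is consulted at any point: whatever keys the JSON object holds become the keyword arguments of the call.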

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.95%. Comparing base (425e803) to head (1c85146).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #729      +/-   ##
==========================================
+ Coverage   90.86%   90.95%   +0.09%
==========================================
  Files          63       64       +1
  Lines        3348     3394      +46
==========================================
+ Hits         3042     3087      +45
- Misses        306      307       +1
```
