internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Aggregate want-to-read counts from works onto authors #9359

Closed: RayBB closed this issue 2 months ago

RayBB commented 4 months ago

Problem

Perhaps @cdrini can edit this issue with some more details about how to do this and related tickets.

We want author-level want-to-read counts for some Wikidata-related work (they are also nice to have in general). Later, we will use these counts as a proxy for an author's popularity, for example when showing things like the most popular authors by country.

Proposal & Constraints

Related files

We currently populate each author's Solr record by running an additional Solr search for every author:

https://github.com/internetarchive/openlibrary/blob/82884bbff0ba17ea308f99c73c2a23c47d3db4d8/openlibrary/solr/updater/author.py#L15-L30

We want to update this Solr query to also aggregate the want-to-read and ratings data. We will likely need to switch to the Solr JSON Facet API, which makes aggregations such as sums easy. We'll basically want:

```python
# Solr JSON Facet API: sum each work-level count across all of the author's works
('json.facet', {
    "ratings_count_1": "sum(ratings_count_1)",
    "ratings_count_2": "sum(ratings_count_2)",
    "ratings_count_3": "sum(ratings_count_3)",
    "ratings_count_4": "sum(ratings_count_4)",
    "ratings_count_5": "sum(ratings_count_5)",
    "readinglog_count": "sum(readinglog_count)",
    "want_to_read_count": "sum(want_to_read_count)",
    "currently_reading_count": "sum(currently_reading_count)",
    "already_read_count": "sum(already_read_count)",
})
```
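The facet block above could be sent as part of a Solr JSON Request API body. Below is a minimal sketch of that request plus parsing of the summed counts from the response. It assumes works are matched with `author_key:<id>` and filtered with `type:work` (check this against the linked author.py), and the helper names `build_author_works_query` and `parse_facet_sums` are hypothetical. The HTTP call is omitted to keep the example self-contained; the body would be POSTed to the collection's `/query` endpoint.

```python
from typing import Any

# Work-level count fields to sum across all of an author's works.
SUM_FIELDS = [
    *(f"ratings_count_{i}" for i in range(1, 6)),
    "readinglog_count",
    "want_to_read_count",
    "currently_reading_count",
    "already_read_count",
]


def build_author_works_query(author_id: str) -> dict[str, Any]:
    """Build a JSON Request API body for one author's works (hypothetical helper)."""
    return {
        # Assumption: works store author ids in author_key and have type:work.
        "query": f"author_key:{author_id}",
        "filter": ["type:work"],
        "limit": 0,  # only the aggregations are needed, not the documents
        "facet": {field: f"sum({field})" for field in SUM_FIELDS},
    }


def parse_facet_sums(response: dict[str, Any]) -> dict[str, int]:
    """Extract the summed counts from a JSON Facet response."""
    facets = response.get("facets", {})
    # Sums come back as floats and may be absent (e.g. when no works match),
    # so default missing values to 0 and cast to int.
    return {field: int(facets.get(field, 0)) for field in SUM_FIELDS}
```

If the existing query also fetches work titles or other facets, the `facet` entries can be added alongside those parameters instead of setting `limit` to 0.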

Then, using the results, compute ratings_average, ratings_sortable and ratings_count by passing the 1..5 counts to work_ratings_summary_from_counts.
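To illustrate what the three derived fields mean, here is a hypothetical stand-in for `work_ratings_summary_from_counts` (the actual change should call the real helper). `ratings_average` is the count-weighted mean star value and `ratings_count` is the total number of ratings. For `ratings_sortable`, the sketch assumes a Bayesian lower-bound score in the style of Evan Miller's "Ranking Items With Star Ratings" (one pseudo-rating per star level, z = 1.65); the real helper's formula may differ.

```python
from math import sqrt


def ratings_summary_sketch(rating_counts: list[int], z: float = 1.65) -> dict[str, float]:
    """Hypothetical stand-in for work_ratings_summary_from_counts.

    rating_counts[i] is the number of (i + 1)-star ratings, for i in 0..4.
    """
    total = sum(rating_counts)
    average = (
        sum(stars * n for stars, n in enumerate(rating_counts, start=1)) / total
        if total
        else 0.0
    )
    # Assumed sortable score: add one pseudo-rating per star level, then take
    # the mean minus z standard errors, so items with few ratings score lower.
    denom = total + len(rating_counts)
    mean = sum(s * (n + 1) for s, n in enumerate(rating_counts, start=1)) / denom
    mean_sq = sum(s * s * (n + 1) for s, n in enumerate(rating_counts, start=1)) / denom
    sortable = mean - z * sqrt((mean_sq - mean * mean) / (denom + 1))
    return {
        "ratings_average": average,
        "ratings_sortable": sortable,
        "ratings_count": total,
    }
```

For an author, the five inputs would be the summed ratings_count_1 to ratings_count_5 values from the facet response, so a 5-star rating on any of the author's works counts toward the author's average.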

Then override the build method of AuthorSolrBuilder to look like that of WorkSolrBuilder: https://github.com/internetarchive/openlibrary/blob/72321288ea790a3ace9e36f1c05b68c93f7eec43/openlibrary/solr/updater/work.py#L274-L275
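A simplified sketch of that shape, assuming the linked `WorkSolrBuilder.build` calls the parent `build()` and then merges optional field groups into the document with `|=`. `BaseDocBuilder`, `AuthorDocBuilderSketch`, `build_ratings` and `build_reading_log` are hypothetical stand-ins, not the real Open Library classes, and `work_sums` stands for the summed counts from the facet query. A real version would pass the rating counts through `work_ratings_summary_from_counts` instead of only totalling them.

```python
from typing import Any, Optional

READING_LOG_FIELDS = (
    "readinglog_count",
    "want_to_read_count",
    "currently_reading_count",
    "already_read_count",
)


class BaseDocBuilder:
    """Hypothetical base builder: produces the core Solr document fields."""

    def __init__(self, record: dict[str, Any]):
        self._record = record

    def build(self) -> dict[str, Any]:
        return {"key": self._record["key"]}


class AuthorDocBuilderSketch(BaseDocBuilder):
    """Hypothetical author builder that adds counts aggregated from works."""

    def __init__(self, record: dict[str, Any], work_sums: dict[str, int]):
        super().__init__(record)
        self._work_sums = work_sums  # summed counts from the works facet query

    def build_ratings(self) -> Optional[dict[str, int]]:
        counts = [self._work_sums.get(f"ratings_count_{i}", 0) for i in range(1, 6)]
        if not any(counts):
            return None  # no ratings on any work: leave the fields out
        fields = {f"ratings_count_{i}": n for i, n in enumerate(counts, start=1)}
        fields["ratings_count"] = sum(counts)
        return fields

    def build_reading_log(self) -> Optional[dict[str, int]]:
        counts = {f: self._work_sums.get(f, 0) for f in READING_LOG_FIELDS}
        return counts if any(counts.values()) else None

    def build(self) -> dict[str, Any]:
        doc = super().build()
        # `or {}` turns a None (nothing to add) into a no-op merge.
        doc |= self.build_ratings() or {}
        doc |= self.build_reading_log() or {}
        return doc
```

Returning None from a field-group method keeps those fields out of the document entirely when an author's works have no ratings or reading-log entries.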

Stakeholders

@cdrini

Note: Before making a new branch or updating an existing one, please ensure your branch is up to date.

benbdeitch commented 3 months ago

Hello, could you assign me to this task? It looks like a lot of fun!

benbdeitch commented 3 months ago

Sorry about the radio silence on this issue. I think I've got a working version of it now, I just need to properly test it.

benbdeitch commented 2 months ago

So, I've gotten it to work as far as I can tell; however, the tests seem to be failing for an unrelated reason: https://pastebin.com/qeKd5U1v

As far as I can tell, this has nothing to do with the changes I made to the code and more to do with the peculiarities of my local setup.

cdrini commented 2 months ago

Awesome, nice! Open a draft PR and we can check it out later this week!

cdrini commented 2 months ago

Hmm, that error might be related to https://github.com/internetarchive/openlibrary/pull/9443. Do you still get that error on master? If so, please create a new issue for it :+1: