Create data pipeline from weekly threads on Reddit's r/books

internetarchive / openlibrary

One webpage for every book ever published!

https://openlibrary.org

GNU Affero General Public License v3.0


Create data pipeline from weekly threads on Reddit's r/books #1169

Status: Open (opened by cdrini 6 years ago)

cdrini commented 6 years ago

Creating a carousel that gets its data from Reddit might be a way to provide fresh, varied content on the front page. Weekly threads like "What Books Did You Start or Finish Reading This Week?" contain book titles that are relatively easy to parse out.
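A minimal sketch of the first step: collecting the comment text from one of these threads. It assumes the JSON shape Reddit has historically returned when `.json` is appended to a thread URL (a two-element list of Listings, the post first and the comment tree second). It works on already-downloaded JSON, so no API access is involved. The function names are hypothetical.

```python
from typing import Any, Iterator


def iter_comment_bodies(listing: dict[str, Any]) -> Iterator[str]:
    """Yield the Markdown body of every comment in a Reddit Listing, depth-first."""
    for child in listing.get("data", {}).get("children", []):
        # "t1" marks a comment; skip "more" stubs and anything else.
        if child.get("kind") != "t1":
            continue
        data = child.get("data", {})
        body = data.get("body")
        if body:
            yield body
        # Reddit uses an empty string instead of a Listing when there are no replies.
        replies = data.get("replies")
        if isinstance(replies, dict):
            yield from iter_comment_bodies(replies)


def thread_comment_bodies(thread_json: list) -> list[str]:
    # Assumed shape: [post_listing, comments_listing].
    return list(iter_comment_bodies(thread_json[1]))
```

Each body can then be handed to whatever title parser is used; replies are included because people often answer with their own reads.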

xayhewalo commented 4 years ago

I'm not sure how we'd programmatically pull books from Reddit, but I think the idea is cool. Assigning jdlrobson (not tagging, per request), as this is front-end related.

cdrini commented 4 years ago

I'll take this one; this is so much my cup of tea. (The main plan is to find the bold "Title, by Author" text, then search Solr for title:"title" author:"author" and collect the results. It doesn't have to be perfect.)
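A minimal sketch of that plan. The comment formats it accepts (`**Title**, by Author`, `**Title** by Author` and `**Title, by Author**`) are assumptions about how people write in these threads, and `extract_books`, `solr_query` and `search_url` are hypothetical names. The query follows the `title:"title" author:"author"` shape described above; sending it as the `q` parameter of `https://openlibrary.org/search.json` is likewise an assumption.

```python
import re
from urllib.parse import urlencode

# Heuristic patterns; as noted, this doesn't have to be perfect.
BOLD = re.compile(r"\*\*(.+?)\*\*")
# "Title, by Author" entirely inside the bold span.
BY_IN_BOLD = re.compile(r",\s+by\s+", re.IGNORECASE)
# "by Author" (optionally preceded by a comma) right after the bold span.
BY_AFTER_BOLD = re.compile(r",?\s+by\s+(?P<author>[^\n(;*]+)", re.IGNORECASE)


def _clean_author(raw: str) -> str:
    # Drop trailing commentary such as " - loved it" and stray punctuation.
    return raw.split(" - ")[0].strip(" .,!?")


def extract_books(comment: str) -> list[tuple[str, str]]:
    """Return (title, author) pairs found in one Markdown comment."""
    books = []
    for m in BOLD.finditer(comment):
        inner = m.group(1).strip()
        parts = BY_IN_BOLD.split(inner, maxsplit=1)
        if len(parts) == 2:
            title, author = parts
        else:
            after = BY_AFTER_BOLD.match(comment, m.end())
            if after is None:
                continue  # bold text not followed by "by Author"
            title, author = inner, after.group("author")
        books.append((title.strip(), _clean_author(author)))
    return books


def _phrase(value: str) -> str:
    # Wrap in double quotes, escaping backslashes and embedded quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def solr_query(title: str, author: str) -> str:
    return f"title:{_phrase(title)} author:{_phrase(author)}"


def search_url(title: str, author: str) -> str:
    return "https://openlibrary.org/search.json?" + urlencode(
        {"q": solr_query(title, author)}
    )
```

Quoting each field as a phrase keeps multi-word titles and author names together, at the cost of missing near-spellings; a looser fallback query could be tried when the phrase search returns nothing.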

jdlrobson commented 4 years ago

I'd strongly advise against more carousels until we have the data requested in #2160.

cdrini commented 4 years ago

This is something I'm pretty excited about implementing, even if its final form factor isn't a carousel. That said, I highly doubt I'll have time to implement it any time soon :P

xayhewalo commented 4 years ago

@cdrini I've changed the title of this issue to better reflect the end goal.

cdrini commented 4 years ago

Nice! That's exactly what I had in mind :)

RayBB commented 7 months ago

I'm guessing Mek added the "Can it be closed?" label because the Reddit API is no longer public and we probably don't want to pay for it. Please reopen if you feel otherwise.

RayBB commented 7 months ago

Actually, from what I was reading, if our volume is really low we may still be able to make these requests for free. However, their new developer offering is still behind a waitlist.
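Keeping the volume "really low" can be enforced on our side regardless of what Reddit ends up allowing. A minimal sketch of a sliding-window throttle: `max_calls` and `period` are placeholders to be set from whatever limit Reddit's current terms actually permit (not stated in this thread), and the class name is hypothetical. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: deque = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest call in the window expires, freeing one slot.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

For example, `SlidingWindowLimiter(max_calls=10, period=60.0)` with `acquire()` called before each fetch would cap the job at ten requests per minute; a weekly thread needs only a handful.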