seanbreckenridge / my_feed

a personal media (movies, tv episodes, video games, albums, listens) feed using HPI
https://sean.fish/feed
MIT License
9 stars 1 forks source link

A personal feed/website using HPI

Live at https://sean.fish/feed/

This uses:

Data Sources:

If not mentioned its likely a module in HPI

I periodically index all my data in the background:

Extracting my_feed.sources.listens.history...
Extracting my_feed.sources.listens.history: 5388 items (took 0.14 seconds)
Extracting my_feed.sources.games.steam...
Extracting my_feed.sources.games.steam: 285 items (took 0.01 seconds)
Extracting my_feed.sources.games.osrs...
Extracting my_feed.sources.games.osrs: 924 items (took 0.03 seconds)
Extracting my_feed.sources.games.game_center...
Extracting my_feed.sources.games.game_center: 141 items (took 0.02 seconds)
Extracting my_feed.sources.games.grouvee...
Extracting my_feed.sources.games.grouvee: 243 items (took 0.15 seconds)
Extracting my_feed.sources.games.chess...
Extracting my_feed.sources.games.chess: 681 items (took 2.98 seconds)
Extracting my_feed.sources.trakt.history...
Extracting my_feed.sources.trakt.history: 15327 items (took 11.51 seconds)
Extracting my_feed.sources.mpv.history...
Extracting my_feed.sources.mpv.history: 13807 items (took 13.67 seconds)
Extracting my_feed.sources.nextalbums.history...
Extracting my_feed.sources.nextalbums.history: 1938 items (took 2.36 seconds)
Extracting my_feed.sources.mal.history...
Extracting my_feed.sources.mal.history: 20865 items (took 3.58 seconds)
Total: 59599 items
Writing to 'backend/data/1644267551.json'

... which then gets synced up and combined into the sqlite database on the backend; all handled by index

That has a front-end so I can view/filter/sort stuff and view the data as an infinite scrollable list

Served with nginx in prod, like:

location /feed/ {
  proxy_pass http://127.0.0.1:4500/feed;
}

location /feed/_next/ {
  # required since the above proxy pass doesn't end with '/'
  proxy_pass http://127.0.0.1:4500/feed/_next/;
}

location /feed_api/ {
  proxy_pass http://127.0.0.1:5100/;
}

Install/Config:

For the python library:

git clone https://github.com/seanbreckenridge/my_feed
pip install -e ./my_feed

... installs my_feed (or python3 -m my_feed)

This uses the HPI config structure (which you'd probably already have setup if you're using this)

To install dependencies for the servers, check the frontend and backend directories.

So, in ~/.config/my/my/config/feed.py, create a top-level sources function, which returns each function:

from typing import Iterator, Callable, TYPE_CHECKING

if TYPE_CHECKING:
    from my_feed.sources.model import FeedItem

def sources() -> Iterator[Callable[[], Iterator["FeedItem"]]]:
    # yields functions, when which called yield FeedItem
    from my_feed.sources import games

    yield games.steam
    yield games.osrs
    yield games.game_center
    yield games.grouvee
    yield games.chess

    from my_feed.sources import (
        trakt,
        listens,
        nextalbums,
        mal,
        mpv,
        facebook_spotify_listens,
    )

    yield trakt.history
    yield listens.history
    yield nextalbums.history
    yield mal.history
    yield mpv.history
    yield facebook_spotify_listens.history

The index script in this repo:

To blur images, my_feed index accepts a -B flag, which lets you match against the id, title, or image_url with an fnmatch or a regex. Those are placed in a file, one per line, for example:

id:*up_2009_*
title:*up_2009_*
image_url:*up_2009_*
id_regex:.*up_2009_.*
title_regex:.*up_2009_.*
image_url_regex:.*up_2009_.*

my_feed has a couple options that have developed over time, to let me ignore specific IDs (if I know they're already in the database), ignore sources which take a while to process (only do those once a week or so):

Usage: my_feed index [OPTIONS] [OUTPUT]

Options:
  --echo / --no-echo           Print feed items as they're computed
  -i, --include-sources TEXT   A comma delimited list of substrings of sources
                               to include. e.g. 'mpv,trakt,listens'
  -e, --exclude-sources TEXT   A comma delimited list of substrings of sources
                               to exclude. e.g. 'mpv,trakt,listens'
  -E, --exclude-id-file PATH   A json file containing a list of IDs to
                               exclude, from the /data/ids endpoint. reduces
                               amount of data to sync to the server
  -C, --write-count-to PATH    Write the number of items to this file
  -B, --blur-images-file PATH  A file containing a list of image URLs to blur,
                               one per line
  --help                       Show this message and exit.

feed_check

feed_check updates some of my data which is updated more often (music (both mpv and listenbrainz), tv shows (trakt), chess, albums), by comparing the IDs of the latest items in the remote database to the corresponding live data sources.

This is pretty personal as it relies on my anacron-like bgproc tool to handle updating data periodically.

So all of these follow some pattern like (e.g. for chess)

feed_check runs once every 15 minutes, so my data is never more than 15 minutes out of date.

Example output:

[I 230921 15:44:15 feed_check:213] Checking 'check_albums'
[I 230921 15:44:18 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=album
[I 230921 15:44:18 feed_check:213] Checking 'check_trakt'
[D 230921 15:44:18 export:32] Requesting 'https://api-v2launch.trakt.tv/users/purplepinapples/history?limit=100&page=1'...
[D 230921 15:44:20 export:46] First item: {'id': 9230963378, 'watched_at': '2023-09-21T08:03:23.000Z', 'action': 'watch', 'type': 'episode', 'episode': {'season': 1, 'number': 1, 'title': 'ROMANCE DAWN', 'ids': {'trakt': 5437335, 'tvdb': 8651297, 'imdb': 'tt11748904', 'tmdb': 2454621, 'tvrage': None}}, 'show': {'title': 'ONE PIECE', 'year': 2023, 'ids': {'trakt': 184618, 'slug': 'one-piece-2023', 'tvdb': 392276, 'imdb': 'tt11737520', 'tmdb': 111110, 'tvrage': None}}}
[I 230921 15:44:20 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=trakt_history_movie,trakt_history_episode
[I 230921 15:44:21 feed_check:213] Checking 'check_chess'
[I 230921 15:44:21 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=10&ftype=chess
Requesting https://api.chess.com/pub/player/seanbreckenridge/games/archives
Requesting https://api.chess.com/pub/player/seanbreckenridge/games/2023/09
[I 230921 15:44:22 feed_check:213] Checking 'check_mpv'
[I 230921 15:44:23 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[I 230921 15:44:23 feed_check:213] Checking 'check_listens'
[I 230921 15:44:23 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=500&ftype=listen
[D 230921 15:44:25 export:62] Requesting https://api.listenbrainz.org/1/user/seanbreckenridge/listens?count=100
[D 230921 15:44:25 export:84] Have 100, now searching for listens before 2023-09-11 04:39:08...
[I 230921 15:44:25 feed_check:213] Checking 'check_mal'
[I 230921 15:44:25 feed_check:42] Requesting https://sean.fish/feed_api/data/?offset=0&order_by=when&sort=desc&limit=50&ftype=anime,anime_episode
Expired: mpv.history
removed '/home/sean/.local/share/evry/data/my-feed-index-bg'
2023-09-21T15-44-35:bg-feed-index:running my_feed index...
Indexing...

This also has the upside of updating my local data whenever there are any changes to the data sources, which means any scripts using the corresponding HPI modules also stay up to date.