smierz opened this issue 6 months ago
That sounds simple enough to add as an extra field and make part of the backfill task :)
I'm imagining 2 scenarios here:
We'll be doing scheduled tasks (daily for everything at first, but we can set finer-grained schedules per job if need be), so we can do both. Since the DB of papers will probably grow pretty quickly and the proportion of retracted papers will be low, we probably want to schedule that check less often. We can also schedule tasks on demand, so we might want to schedule that kind of task, with some kind of debounce, whenever a feed is fetched. That way we keep refreshing the papers we are actually presenting on feeds and make sure their state is correct.
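A minimal sketch of the "debounced refresh on feed fetch" idea, assuming a hypothetical `schedule` callback that enqueues a retraction refresh for one feed. It is a leading-edge debounce (sometimes called throttling): the first fetch schedules a refresh, and later fetches of the same feed are ignored until `interval` seconds have passed. `RefreshDebouncer` and `on_feed_fetched` are illustrative names, not part of the project.

```python
import time
from typing import Callable, Dict


class RefreshDebouncer:
    """Hypothetical helper: schedule a retraction refresh for a feed at most
    once per `interval` seconds, however often the feed is fetched."""

    def __init__(
        self,
        interval: float,
        schedule: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval
        self.schedule = schedule      # assumed: enqueues the refresh task
        self.clock = clock            # injectable so it can be tested
        self._last: Dict[str, float] = {}

    def on_feed_fetched(self, feed_id: str) -> bool:
        """Return True if a refresh was scheduled, False if skipped."""
        now = self.clock()
        last = self._last.get(feed_id)
        if last is not None and now - last < self.interval:
            return False  # refreshed recently; skip this fetch
        self._last[feed_id] = now
        self.schedule(feed_id)
        return True
```

The state here is an in-memory dict, so it only works within one process; with several workers the "last scheduled" timestamp would need to live in the shared DB or task queue instead.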
Edit: I'm taking a brief detour to make some more general models for ActivityPub, so we can make feeds better on ActivityPub and do some of the fun social things with interactive feed generation. I will return to this though, since, well, I'm doing it for this project: models: https://github.com/p2p-ld/linkml-activitypub db: https://github.com/p2p-ld/pydantigraph api: https://github.com/p2p-ld/fastapi-activitypub
I was looking through the OpenAlex docs and saw that works have an "is_retracted" flag. This could be a scheduled job (maybe once a month) that checks the papers in our DB to see whether the flag has been set to true since they were fetched.
(low prio though)
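A rough sketch of what that monthly job could compute, assuming we store each paper's DOI together with the `is_retracted` value it had when fetched. It only builds the OpenAlex `/works` query URLs and diffs the results, so no HTTP client is assumed. It also assumes the `doi` filter accepts bare DOIs OR-ed with `|`, and that a batch of 50 stays within OpenAlex's limits; both should be checked against the OpenAlex filter docs. `StoredPaper`, `retraction_query_urls` and `newly_retracted` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
DOI_PREFIX = "https://doi.org/"  # OpenAlex returns DOIs in this URL form


@dataclass
class StoredPaper:
    doi: str            # normalized bare DOI, e.g. "10.1234/abcd"
    is_retracted: bool  # OpenAlex's flag at the time we fetched the paper


def normalize_doi(doi: str) -> str:
    """Lowercase and strip the https://doi.org/ prefix (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    return doi[len(DOI_PREFIX):] if doi.startswith(DOI_PREFIX) else doi


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]


def retraction_query_urls(dois: Iterable[str], batch_size: int = 50) -> List[str]:
    """Build /works URLs asking which of `dois` OpenAlex now marks as retracted."""
    urls = []
    for batch in chunked(sorted({normalize_doi(d) for d in dois}), batch_size):
        params = {
            "filter": "doi:" + "|".join(batch) + ",is_retracted:true",
            "select": "doi,is_retracted",
            "per-page": str(batch_size),
        }
        urls.append(OPENALEX_WORKS + "?" + urlencode(params))
    return urls


def newly_retracted(stored: Dict[str, StoredPaper], retracted_now: Iterable[str]) -> List[str]:
    """DOIs OpenAlex flags as retracted now that were not flagged when we fetched them."""
    current = {normalize_doi(d) for d in retracted_now}
    return sorted(d for d in current if d in stored and not stored[d].is_retracted)
```

Only querying with `is_retracted:true` keeps the responses small, which fits the expectation that retracted papers are a small fraction of the DB.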