CrossRef `/works` endpoint is not guaranteed to return only papers

sneakers-the-rat / paper-feeds

A FastAPI web server for creating RSS feeds for scholarly journals with the magic of adversarial interoperability

GNU General Public License v3.0

Found a journal that returns itself in `/journals/<issn>/works`:

http://api.crossref.org/journals/1674-9251/works?rows=1&offset=566&sort=published

This breaks the app because the journal doesn't have a published date, but ideally the type should be checked before trying to parse entries as papers.

I haven't found a list of possible types yet, but some of them probably won't map well to papers either. IMO the most sensible fix is to use a CrossRef client library that already handles this.

Yeah, I couldn't find the list of possible types either, and I agree that a filter will be necessary. It looks like types might be their own API endpoint? More generally, we'll want to just try/catch and log failed papers.
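A minimal sketch of the filter plus try/except idea, not the app's actual code. `PAPER_TYPES`, `parse_paper`, `parse_items`, and the logger name are all hypothetical, and the field names (`type`, `DOI`, `title`, `published.date-parts`) assume the usual shape of items in a CrossRef works response.

```python
import logging
from datetime import date

logger = logging.getLogger("paper_feeds.crossref")  # hypothetical logger name

# Assumption: only these CrossRef types are treated as papers.
PAPER_TYPES = frozenset({"journal-article"})


def parse_paper(item: dict) -> dict:
    """Turn one works item into a small paper record (hypothetical shape)."""
    parts = item["published"]["date-parts"][0]
    # date-parts may hold only a year, or a year and month; pad with 1s.
    year, month, day = (list(parts) + [1, 1])[:3]
    return {
        "doi": item["DOI"],
        "title": item["title"][0],
        "published": date(year, month, day),
    }


def parse_items(items: list) -> list:
    """Skip non-paper types, and log instead of crashing on bad entries."""
    papers = []
    for item in items:
        if item.get("type") not in PAPER_TYPES:
            logger.info("Skipping item of type %r", item.get("type"))
            continue
        try:
            papers.append(parse_paper(item))
        except (KeyError, IndexError, TypeError, ValueError) as exc:
            logger.warning("Could not parse %r: %r", item.get("DOI"), exc)
    return papers
```

With this, a journal entry that shows up in its own works list is skipped by the type check, and a paper with a missing or malformed published date is logged and dropped rather than taking down the whole feed.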

I'm trying to keep deps low. The one package I saw (habanero) looks well written, but I wasn't sure what it would add here, since it's mostly just doing what we're doing and making a request, with some nice wrapping around it: https://github.com/sckott/habanero/blob/30b7e93601cdf7e32e58472b70b241323557121b/habanero/crossref/crossref.py#L374C25-L374C25

It only depends on requests and tqdm tho, so I'm not opposed to swapping out my janky requests for their better ones.

edit: oh duh the types are here: http://api.crossref.org/types
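For reference, a rough sketch of pulling the type ids from that endpoint using only the standard library, in keeping with the low-deps goal. `type_ids` and `fetch_type_ids` are hypothetical helper names, and the code assumes the response wraps the list as `message.items`, with each entry carrying an `id` field.

```python
import json
from urllib.request import urlopen

TYPES_URL = "https://api.crossref.org/types"


def type_ids(payload: dict) -> set[str]:
    """Extract the type ids from an already-parsed /types response."""
    # Assumption: entries live under message.items and each has an "id".
    return {entry["id"] for entry in payload["message"]["items"]}


def fetch_type_ids(url: str = TYPES_URL, timeout: float = 10.0) -> set[str]:
    """Fetch the live list of type ids; needs network access."""
    with urlopen(url, timeout=timeout) as response:
        return type_ids(json.load(response))
```

The returned set could be used to sanity-check whichever types the app decides to treat as papers. It may also be possible to avoid non-paper items at the source by adding `filter=type:journal-article` to the works request, though that has not been tried in this thread.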

CrossRef `/works` endpoint is not guaranteed to return only papers #16