Closed nobrowser closed 2 weeks ago
tl;dr: I think you can use feed tags for this, details below.
my dream feed reader
Love this phrase :)
it looked as if the user_title attribute might work for that
Indeed, it would not be a good idea to use user_title, since it is in some ways "special"; among others it may be used (together with the feed's original title) to implement title sort (e.g. dropping "The" etc.); this is one of the main reasons I did not deprecate it in favor of tags.
URLs are often long and messy, and author titles can be missing or inconvenient (with spaces, among other things)
This is probably another reason for not using user_title like this, since you may still want "human readable" titles that are different from the original ones, but are also not slugs.
So, if reader were to have an API for exactly this use case, it'd be something separate, like Feed.slug, set_feed_slug(url, slug), get_feed(slug=slug).
That said, I think you can achieve this by using feed tags, since they are indexed.
(For reference, this is somewhat related to https://github.com/lemon24/reader/issues/159#issuecomment-612914956 that discusses having a surrogate key, except I didn't think it would be user-defined there.)
Thanks for the suggestion about tags. I didn't think of using them somehow because of the fact that there can be multiple ones, I'll have to look at the schema again to convince myself I'm not doing something abominable.
Good that we're talking.
-- Ian
that there can be multiple [tags]
The only issue I see with using tags is the race condition mentioned in the code comment, but I think the approach of (accidentally, rarely) deleting both should be acceptable, especially for a single-user system.
BTW (and no pressure! :), if the approach above ends up being OK for you, I'd happy to merge it as a reader plugin, either experimental or built-in (the difference is mainly in the amount of testing). The get/set_feed_slug() and get_feed_by_slug() "methods" can be monkey-patched onto the reader instance as enclosure_dedupe does here.
(In theory I'd be fine with this being built into reader itself too, which would also get rid of the race condition, but I think it'd probably be overkill.)
Hmmm, so no, looking at the database I can't use a tag for what I want, as there's only an index on the composite key (feed, key) -- nothing on the value of the key/tag, and even if I went with the hack of a unique key, linear scan of the feed_tags
table would still be needed. So I really need a new (unique) attribute, which indeed would be a kind of "surrogate key". To avoid any confusion I propose to call it short_name
-- it doesn't have to be a slug as the term is normally used.
Would a PR to this effect be considered?
My suggestion was to use tag keys with get_feeds(tags=[tag])
, with the value being ignored/unused (also see the collapsed code at the end of my first comment); for slug My Feed
, the matching feed tag would be .plugin.slugs.My Feed
.
You are right in that there is a scan involved, since there's no index on the feed_tags key column alone. My results were reasonably fast because I only have a ~small amount of feed tags (177 feeds * ~2 tags/feed). However, indexes on tag keys can and should be added, since they exists for this kind of access pattern; I opened https://github.com/lemon24/reader/issues/359 for this, and may work on it reasonably soon.
Would a PR to this effect [new feed attribute] be considered?
Yes, definitely, but I'm somewhat attached to the "slug" naming, since I think it best conveys the idea that it's used as an identifier – "the part of a URL that identifies a page in human-readable keywords" – and it also matches the publishing term.
Added a cleaned up version of the code in https://github.com/lemon24/reader/issues/358#issuecomment-2439459213 as a plugin recipe.
Closing this for now, feel free to reopen if needed.
Hello, I'm trying to write my dream feed reader with your library, and one obstacle I'm facing now is that I'm used to giving short unique IDs (or names) to feeds and accessing them with that ID. That's because URLs are often long and messy, and author titles can be missing or inconvenient (with spaces, among other things). Initially it looked as if the
user_title
attribute might work for that, but alas, the database schema has no index on the column, so that would be a linear search each time.Then I thought I could add the index myself in my code -- but the
Reader
class provides no direct access to the database connection, so I can't do it with a plugin. I could simply open the database after it's created and add the index behindReader
's back, but that would be really ugly. So, what approach do you think I should take? Should I ask you to add the index :grin: , or is there something else I can do now?Thanks,
-- Ian