Backup and Restore

seadowg commented 3 years ago

Has there been any discussion around the ability to back up and restore a running instance of RSS Box? As far as I can see, all the state currently lives in Redis, so a restart (without persistence) would invalidate all your existing feed URLs. Maybe I'm wrong and there's already a mechanism for retaining these URLs between instances (provided the instances have the same root URL). I haven't looked deeply into whether feed URLs are already created from reproducible hashing or anything similar.

Assuming there isn't currently any way to do this, I could imagine that a JSON backup that outputs paths, original URLs and services would be enough to recreate the feeds after a restart.

stefansundin commented 3 years ago

Hello! :)

RSS Box is mostly stateless. The great thing is that most of the state that you care about is the feed URLs that you subscribe to. The ID in the feed URL is not an ID that RSS Box generates; it is the user ID for that user on the service itself. So a feed URL on rssbox.herokuapp.com will just work on other RSS Box instances.

Let's demonstrate with an example:

Twitter user GitHub: https://twitter.com/github
The corresponding RSS Box URL: https://rssbox.herokuapp.com/twitter/13334762/github
The path components:
- twitter: the service.
- 13334762: the Twitter user ID for the github user.
- github: the username. This information is a bit extraneous and isn't actually used by RSS Box when fetching the feed data from Twitter. But if you exported your feed URLs from your feed reader, you can use this to distinguish between your different feeds.
- It is also just useful to have both the ID and username (especially since it is free real estate). In the case of Instagram when they closed their API, I was able to keep the feed URLs working since the workaround required use of the username rather than the user ID.
So if you replace rssbox.herokuapp.com with rssbox.us-west-2.elasticbeanstalk.com (the second public instance that I host), you'll get this URL, which gives you the same feed and just works: https://rssbox.us-west-2.elasticbeanstalk.com/twitter/13334762/github
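
To make the URL layout above concrete, here is a minimal Python sketch that splits a feed URL into its path components and points it at another instance. It is not code from RSS Box. The names `FeedPath`, `parse_feed_url` and `move_to_instance` are made up for illustration, and it assumes the three-segment `/service/id/username` layout shown in the Twitter example, which may not hold for every service.

```python
from typing import NamedTuple
from urllib.parse import urlsplit, urlunsplit


class FeedPath(NamedTuple):
    service: str   # e.g. "twitter"
    user_id: str   # the ID assigned by the service itself, not by RSS Box
    username: str  # informational; helps tell feeds apart


def parse_feed_url(url: str) -> FeedPath:
    """Split the path of a feed URL into service, user ID and username.

    Assumes the /service/id/username layout from the Twitter example.
    """
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /service/id/username, got {url!r}")
    return FeedPath(*parts)


def move_to_instance(url: str, new_host: str) -> str:
    """Return the same feed URL with only the host swapped."""
    return urlunsplit(urlsplit(url)._replace(netloc=new_host))


old = "https://rssbox.herokuapp.com/twitter/13334762/github"
print(parse_feed_url(old))
print(move_to_instance(old, "rssbox.us-west-2.elasticbeanstalk.com"))
```

Only the host changes; the path, which carries everything needed to fetch the feed, is left untouched.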

As for the URL data that is stored in Redis, it is simply the short-to-long URL resolution made on links in the feed entries. If that data is cleared then the URLs will be resolved again the next time they appear, so if the short URLs are still valid then the data will automatically be restored. I have actually disabled this feature on rssbox.herokuapp.com because there are some performance problems with it (it was fine for a long time, but that public instance has become popular and the simplest way to keep it working well was to disable this feature). This feature is a bit too eager (it makes an attempt on every URL in the feed, even if it is not a common shortlink domain).
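
The reason losing this data is harmless can be sketched in a few lines of Python. This is only an illustration of a recomputable cache, not how RSS Box implements it: the `UrlResolutionCache` class, the in-memory dict standing in for Redis, and the fake redirect table are all assumptions made so the example runs offline.

```python
from typing import Callable, Dict


class UrlResolutionCache:
    """Remembers short-to-long URL lookups; every entry can be recomputed."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve            # hypothetical resolver, e.g. follows redirects
        self._store: Dict[str, str] = {}   # stand-in for Redis
        self.lookups = 0                   # how many times the resolver actually ran

    def get(self, short_url: str) -> str:
        # Resolve only on a cache miss, then remember the result.
        if short_url not in self._store:
            self.lookups += 1
            self._store[short_url] = self._resolve(short_url)
        return self._store[short_url]

    def clear(self) -> None:
        """Simulate losing the stored data, e.g. a restart without persistence."""
        self._store.clear()


# Invented mapping so the sketch needs no network access.
FAKE_REDIRECTS = {"https://short.example/abc": "https://example.com/a/long/article"}
cache = UrlResolutionCache(lambda url: FAKE_REDIRECTS.get(url, url))

first = cache.get("https://short.example/abc")
cache.get("https://short.example/abc")            # served from the cache
cache.clear()                                     # stored data is gone
second = cache.get("https://short.example/abc")   # resolved again on demand
print(first == second, cache.lookups)
```

The second lookup after `clear()` costs one extra resolution but returns the same long URL, as long as the short URL still redirects to the same place.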

Since the URL resolution feature was implemented, a file system cache was added. It may make more sense to move the storage of the URLs there at some point.

For a while I have wanted to make Redis completely optional, but I haven't had the time to do that.

Let me know if I didn't completely answer your question or if you'd like more information.

seadowg commented 3 years ago

> The ID in the feed URL is actually not an ID that RSS Box generates, it is the user ID for that user on the service itself.

Ah, this is the key thing I'd missed. As you guessed, I'd assumed those IDs were generated by RSS Box. So as far as I understand, if I reset my own running instance of RSS Box, the URLs I've used in my RSS reader will still work (as long as the base URL is the same), which means that the state I'd want to "backup and restore" really lives in my RSS reader.

As you point out, if you needed to move instances, the process would be pretty simple, as you'd just change the base URL on all your feeds (and the paths would still work).
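
For anyone doing that move in bulk, a rough Python sketch follows. It assumes the feed reader can export and re-import subscriptions as OPML with the feed address in an `xmlUrl` attribute, which is common but not something this thread confirms for any particular reader. The function name `migrate_opml` and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "rssbox.herokuapp.com"
NEW_HOST = "rssbox.us-west-2.elasticbeanstalk.com"


def migrate_opml(opml_text: str, old_host: str, new_host: str) -> str:
    """Point every feed hosted on old_host at new_host; leave other feeds alone."""
    root = ET.fromstring(opml_text)
    for outline in root.iter("outline"):
        feed_url = outline.get("xmlUrl")
        if feed_url is None:
            continue  # folder entries carry no feed URL
        parts = urlsplit(feed_url)
        if parts.netloc == old_host:
            # Swap the host only; the path stays exactly as it was.
            outline.set("xmlUrl", urlunsplit(parts._replace(netloc=new_host)))
    return ET.tostring(root, encoding="unicode")


# Invented sample export with one RSS Box feed and one unrelated feed.
SAMPLE = """<opml version="2.0">
  <body>
    <outline type="rss" text="GitHub" xmlUrl="https://rssbox.herokuapp.com/twitter/13334762/github"/>
    <outline type="rss" text="Other" xmlUrl="https://example.com/feed.xml"/>
  </body>
</opml>"""

migrated = migrate_opml(SAMPLE, OLD_HOST, NEW_HOST)
print(migrated)
```

Feeds that are not served by the old instance pass through unchanged, so the same export can be re-imported as a whole.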

I'll close this as I don't think this is something that needs any work. If I get a chance I might open a PR adding something about this to the README.md.

stefansundin / rssbox

Backup and Restore #52