stefansundin / rssbox

:newspaper: I consume the world via RSS feeds, and this is my attempt to keep it that way.
https://github.com/stefansundin/rssbox/discussions/64
GNU Affero General Public License v3.0

Backup and Restore #52

Closed seadowg closed 3 years ago

seadowg commented 3 years ago

Has there been any discussion around the ability to perform backup and restore for a running instance of RSS Box? As far as I can see, all the state currently lives in Redis, so a restart (without persistence) would invalidate all your existing feed URLs. Maybe I'm wrong and there's already a mechanism for retaining these URLs between instances (provided the instances have the same root URL). I haven't looked deeply into whether feed URLs are already created from reproducible hashing or anything like that.

Assuming there isn't currently any way to do this, I could imagine that a JSON backup that outputs paths, original URLs and services would be enough to recreate the feeds after a restart.

stefansundin commented 3 years ago

Hello! :)

RSS Box is mostly stateless. The great thing is that most of the state that you care about is the feed URLs that you subscribe to. The ID in the feed URL is actually not an ID that RSS Box generates; it is the user ID for that user on the service itself. So a feed URL on rssbox.herokuapp.com will just work on other RSS Box instances.

Let's demonstrate with an example:

As for the URL data that is stored in Redis, it is simply the short-to-long URL resolution made on links in the feed entries. If that data is cleared then the URLs will be resolved again the next time they appear, so if the short URLs are still valid then the data will automatically be restored. I have actually disabled this feature on rssbox.herokuapp.com because there are some performance problems with it (it was fine for a long time, but that public instance has become popular and the simplest way to keep it working well was to disable this feature). This feature is a bit too eager (it makes an attempt on every URL in the feed, even if it is not a common shortlink domain).
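To make the caching behaviour concrete, here is a minimal Python sketch of a less eager variant of the idea. It is not RSS Box's actual code: the `SHORTLINK_DOMAINS` set, the `make_resolver` helper, and the injected `fetch` callable are all hypothetical, and a plain dict stands in for Redis.

```python
from typing import Callable, Dict, Set
from urllib.parse import urlsplit

# Illustrative list only; a real deployment would pick its own domains.
SHORTLINK_DOMAINS: Set[str] = {"bit.ly", "t.co"}


def make_resolver(fetch: Callable[[str], str], cache: Dict[str, str]) -> Callable[[str], str]:
    """Build a resolver that caches short-to-long URL lookups.

    `fetch` is a hypothetical function that follows redirects and returns
    the final URL. `cache` is any dict-like store (a stand-in for Redis).
    """

    def resolve(url: str) -> str:
        host = urlsplit(url).hostname or ""
        # Skip URLs that are not on a known shortlink domain.
        if host not in SHORTLINK_DOMAINS:
            return url
        # Resolve on a cache miss; an emptied cache simply refills itself.
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    return resolve
```

If the cache is cleared, the next call for a given short URL goes through `fetch` again and repopulates the entry, which is why losing this data does not need a backup.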

Since the URL resolution feature was implemented, a file system cache was added. It may make more sense to move the storage of the URLs there at some point.

For a while I have wanted to make Redis completely optional, but I haven't had the time to do that.

Let me know if I didn't completely answer your question or if you'd like more information.

seadowg commented 3 years ago

> The ID in the feed URL is actually not an ID that RSS Box generates, it is the user ID for that user on the service itself.

Ah, this is the key thing I'd missed. As you guessed, I'd assumed those IDs were generated by RSS Box. So as far as I understand, if I reset my own running instance of RSS Box, the URLs I've used in my RSS reader will still work (as long as the base URL is the same), which means that the state I'd want to "backup and restore" really lives in my RSS reader.

As you point out, if you needed to move instances, the process would be pretty simple, as you'd just change the base URL on all your feeds (and the paths would still work).
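For anyone doing that move in bulk, a rough Python sketch of the base URL swap might look like the following. The `move_feed_url` helper and the example URLs are hypothetical, and it assumes both instances are served from the root of their host (no path prefix).

```python
from urllib.parse import urlsplit, urlunsplit


def move_feed_url(feed_url: str, new_base: str) -> str:
    """Return feed_url pointed at the instance given by new_base.

    Only the scheme and host change; the path and query, which carry the
    service name and the service's own user ID, are kept as they are.
    """
    old = urlsplit(feed_url)
    new = urlsplit(new_base)
    return urlunsplit((new.scheme, new.netloc, old.path, old.query, old.fragment))


# Hypothetical feed URL and target instance, for illustration only.
old_feeds = ["https://rssbox.herokuapp.com/someservice/12345"]
new_feeds = [move_feed_url(u, "https://rssbox.example.org") for u in old_feeds]
```

The rewritten URLs can then be re-imported into the RSS reader, for example through its OPML import if it has one.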

I'll close this as I don't think this is something that needs any work. If I get a chance I might open a PR adding something about this to the README.md.