kensanata / mastodon-archive

Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
https://alexschroeder.ch/software/Mastodon_Archive
GNU General Public License v3.0
358 stars, 33 forks

Database rotation #31

Closed seanlynch closed 3 years ago

seanlynch commented 5 years ago

As far as I can tell, the database will just grow without bound, even if one expires old posts. This is what I want, since I am using mastodon-archive to archive old posts before expiring them. But the file will become unwieldy, and I imagine the program will eventually start running out of memory.

I'm not sure what the best approach is. I'm thinking that what I'd like is for the monotonically growing bits like statuses, favorites, mentions, and media to be split up somehow by date, with periodic snapshots of the other data. Since the data is already in a pretty simple format, this is already pretty easily doable with external tools, but it seems like the sort of thing that should be built into the software itself.
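To make the proposal concrete, here is a minimal sketch of splitting the monotonically growing collections by year while copying the remaining data into every part as a snapshot. It is not mastodon-archive code. It assumes the archive is a JSON object whose growing collections are lists under the hypothetical keys `statuses`, `favourites` and `mentions`, each item carrying a `created_at` timestamp that starts with the four-digit year. Media files live outside the JSON and are not handled here; the file naming `<stem>.<year>.json` is also an assumption.

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path

# Assumption: these keys hold lists that only ever grow; everything else
# in the archive (account info, etc.) is treated as snapshot data.
GROWING_KEYS = ("statuses", "favourites", "mentions")


def split_by_year(archive: dict) -> dict[str, dict]:
    """Return one archive dict per year; each part gets the snapshot data."""
    snapshot = {k: v for k, v in archive.items() if k not in GROWING_KEYS}
    # Each new year starts with the snapshot plus fresh, empty growing lists.
    parts: dict[str, dict] = defaultdict(
        lambda: {**snapshot, **{k: [] for k in GROWING_KEYS}}
    )
    for key in GROWING_KEYS:
        for item in archive.get(key, []):
            year = item["created_at"][:4]  # "2019-05-01T..." -> "2019"
            parts[year][key].append(item)
    return dict(parts)


def write_parts(parts: dict[str, dict], stem: str, directory: Path) -> list[Path]:
    """Write each part as <stem>.<year>.json (hypothetical naming scheme)."""
    paths = []
    for year, part in sorted(parts.items()):
        path = directory / f"{stem}.{year}.json"
        path.write_text(json.dumps(part, indent=2))
        paths.append(path)
    return paths
```

Note that a favourite's `created_at` is when the status was written, not when it was favourited, so favourites end up grouped by the original post's date.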

kensanata commented 5 years ago

Yes, absolutely.

kensanata commented 5 years ago

3accf4d adds a split command. Sadly, the remaining commands don't know how to handle a split archive. Currently, searches will only work for the current data file; HTML exports will only be created for the current data file, and so on. Ideally, these commands would open one data file after another and do their work.
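The "open one data file after another" idea could look roughly like the sketch below, shown for search. It is a hypothetical illustration, not how mastodon-archive's split or search commands actually work: it assumes the current archive is `<stem>.json`, that older split-off files sit next to it as `<stem>.<suffix>.json`, and that each file holds a `statuses` list whose items have a `content` field. Only one file is loaded at a time, which keeps memory use bounded by the largest file rather than the whole history.

```python
import json
import re
from pathlib import Path
from typing import Iterator


def archive_files(current: Path) -> list:
    """Older split files (hypothetical <stem>.<suffix>.json names) plus the current file."""
    older = sorted(current.parent.glob(current.stem + ".*.json"))
    return older + [current]


def search_all(current: Path, pattern: str) -> Iterator[dict]:
    """Yield matching statuses from every data file, oldest file first."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in archive_files(current):
        data = json.loads(path.read_text())  # one file in memory at a time
        for status in data.get("statuses", []):
            if regex.search(status.get("content", "")):
                yield status
```

The same loop shape would work for an HTML export: render each file in turn instead of filtering it.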

lapineige commented 4 years ago

Could we have a "join" command, then? This would allow running a command on the whole archive if needed.
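A join could be sketched as follows. This is a hypothetical helper, not part of mastodon-archive: it assumes each split file is a JSON object with list-valued `statuses`, `favourites` and `mentions` whose items carry a unique `id`, and that any other key is snapshot data where the newest file should win. Files are passed oldest first; growing lists are concatenated in file order with duplicates dropped.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumption: list-valued collections that only grow, keyed by item "id".
GROWING_KEYS = ("statuses", "favourites", "mentions")


def join_archives(paths: list[Path]) -> dict:
    """Merge split files (oldest first); later files win for snapshot keys."""
    joined: dict = {key: [] for key in GROWING_KEYS}
    seen: dict[str, set] = {key: set() for key in GROWING_KEYS}
    for path in paths:
        data = json.loads(path.read_text())
        for key, value in data.items():
            if key in GROWING_KEYS:
                for item in value:
                    if item["id"] not in seen[key]:  # skip overlaps between files
                        seen[key].add(item["id"])
                        joined[key].append(item)
            else:
                joined[key] = value  # e.g. account info: keep the newest snapshot
    return joined
```

The trade-off is that a joined archive is as large as the unsplit one, so it brings back the memory concern that motivated splitting in the first place.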

kensanata commented 4 years ago

Shouldn’t we instead use an option that controls whether all the commands also read the older files? Or perhaps we need to think about which commands we would actually like to run. Perhaps a simple search across all files is enough.

kensanata commented 3 years ago

I think the --combine option introduced in #63 closes the problem. If anybody disagrees, feel free to reopen.