Open zenhack opened 3 years ago
Any progress on this? It is kind of important for me.
For that matter is there anything else the scheduled update does besides poll feeds? I'm built a test sandstorm with a restored ttrss grain and it is showing very different disk usage on very similar VPS configurations.
I believe the scheduled task should also scavenge old post content and remove it. But my TTRSS bloats significantly over time, I think that may be somewhat blamable on MySQL in general but I've never dug into what the grain is retaining.
Note that these tasks should still happen while the grain is running, just not on a schedule if you're not opening it periodically. Which is still annoying and which we should still fix.
I think the way I'd want to go about tackling this would be to add an id field to ScheduledJob
in grain.capnp, and have sandstorm de-dup jobs that have the same id -- this way we could just always register the job on startup and rely on it being idempotent (right now it would create a new, separate job on each run).
Can't say when I'd get to it though.
Is there any way jdougan could tamper with the grain backup so it does register for cron when he restores it to a new server?
/me looks back at code I wrote almost two years ago
Looks like it looks for the presence of the file /var/updates-scheduled
to see if it has already registered the job. So if you delete that, it should re-register it on startup.
Since he is only really worrying about moving one grain here, I think, that might be a simple enough solution.
I do wonder if it'd be reasonable to add a button that deletes that file to our TTRSS to reschedule. Grain Settings shows what the server has scheduled, but I'm not sure if it could clean up extraneous events from someone hitting the button multiple times though.
Quoting Jacob Weisz (2022-06-05 22:16:35)
Since he is only really worrying about moving one grain here, I think, that might be a simple enough solution.
I do wonder if it'd be reasonable to add a button that deletes that file to our TTRSS to reschedule. Grain Settings shows what the server has scheduled, but I'm not sure if it could clean up extraneous events from someone hitting the button multiple times though.
The solution I outlined above is little enough work that I don't know if I think a stopgap like this is even worth the trouble (I'm not sure it would even be easier, and it would be a patch we'd have to carry).
Yeah there's no way to either drop or disable cron jobs on Sandstorm's side so revealing that in UI would be dangerous. I'm much more comfortable recommending people delete a file in a grain backup then end up needing to modify their server database.
Would it be reasonable to just not backup the updates-scheduled file?
Not really, Sandstorm wants to backup everything, and that file is a TTRSS-specific choice. But you can delete it out of the backup zip file before restoring it to your new server.
Grain Settings shows what the server has scheduled, but I'm not sure if it could clean up extraneous events from someone hitting the button multiple times though.
Huh? Where is grain setting in the UI?
But you can delete it out of the backup zip file before restoring it to your new server.
And It apparently works if you just shut everything down, remove the file and start everything again.
For that matter, assuming grain IDs are unique, you could just stick the grain ID that created the updates-scheduled file in the file, and when it comes back up check against the current grain id.
There is a gear on the top bar titled Grain Settings. Presently it only shows you if scheduled tasks exist, but more will be added to it later.
The issue with your idea there is that grain IDs are not intended to be visible to the grain itself.
There is a gear on the top bar titled Grain Settings. Presently it only shows you if scheduled tasks exist, but more will be added to it later.
Good. One of sandstorm's major weak points is the admin tools.
The issue with your idea there is that grain IDs are not intended to be visible to the grain itself.
There's no capability so the grain can get the grain id? Does it at least know it is running in sandstorm?
Quoting John Dougan (2022-06-05 23:55:43)
Does it at least know it is running in sandstorm?
I mean, stuff has to be specifically packaged for sandstorm, so the packager can indicate this however they want (including hard-coding it somehow). An environment variable is often a suggested approach, but apps that are written without sandstorm in mind and then ported usually either don't care or need patching.
I'd agree, there's a lot of stuff Sandstorm does or handles that just isn't super visible. Auditing capability was a big goal of Sandstorm originally, so exposing a lot more of that functionality in the UI is something that really excites me.
Generally apps have a SANDSTORM=1 HTTP header passed to them, so apps can easily enable Sandstorm-specific behavior. TTRSS does, it's defined here: https://github.com/zenhack/ttrss-sandstorm/blob/master/.sandstorm/sandstorm-pkgdef.capnp#L122
However, we generally do not want Sandstorm apps to know anything about the "outside world", so generally they cannot see their grain ID, the sharing URLs, etc. (Offer templates, such as used for TTRSS's mobile app support, are in an iframe. The app itself can't see those values.) This helps ensure everything runs through and is secured by the Sandstorm platform.
we generally do not want Sandstorm apps to know anything about the "outside world"
That is a really limiting view of what sandstorm could do. It's like insisting the iOS apps can't ever be aware of what the system configuration looks like. I mean, for security reasons you may decide that a given app should not have that info (GPS, for instance) but no one (except competing GPS vendors) would insist that NO apps should ever have access.
One of the alleged wins with a capabilities system is you can build caps that support special cases without compromising the general case. I would think a capability(ies) to support system identification of some sort would be an obvious place to start. Doesn't have to be a direct grain id, could be some cookie/token that sandstorm gives you that that you can ask sandstone about later to see if it is still valid.
Sure, but adding that to Sandstorm is on par with or more work than what Ian already would preferably do with the scheduling API to solve this. And it'd be arguably better/cleaner for the scheduling API to handle this correctly, over implementing an unrelated system identification API to guess that since the system has changed, updates are probably not scheduled.
I think @ocdtrekkie omitted an implied "by default" -- indeed, especially once the powerbox is more mature we want connecting apps to things to be pretty fluid -- but apps should be isolated by default.
That said, I don't see an enormous problem with the grain knowing its grain id. But I also don't see compelling use cases, and it seems like a good way for apps to break when cloned/restored/etc.
@ocdtrekkie is correct in that the solution I outlined above isn't actually harder than doing this by exposing the grain id. I still think that's the "right" solution: making schedule() idempotent by supplying an id field.
Yeah, that was my intent better stated, thank you.
I think @ocdtrekkie omitted an implied "by default" -- indeed, especially once the powerbox is more mature we want connecting apps to things to be pretty fluid -- but apps should be isolated by default.
Ah. I failed my "read between the lines" roll. The schedule ID change seem reasonable to me, but it isn't like I understand sandstorm internals to any depth. I've looked around but C++ isn't my forte.
The relevant code for this is on the javascript side, actually. If you're interested in taking a whack at it, the main files of interest are:
Ack, actually there is a bit of C++ logic, in supervisor.c++, but it's just forwarding stuff along to the js. Also relevant:
After restarting the updater and letting it go for a bit, the disk bandwidth usage has dropped considerably and more closely resembles the original installation.
Quoting John Dougan (2022-06-05 23:20:47)
Huh? Where is grain setting in the UI?
It's the gear icon in the top bar.
And It apparently works if you just shut everything down, remove the file and start everything again.
Yep, glad you figured it out.
For that matter, assuming grain IDs are unique, you could just stick the grain ID that created the updates-scheduled file in the file, and when it comes back up check against the current grain id.
I don't think we actually have access to the grain id at the point when this is scheduled (it happens during startup, before any requests are handled).
I noticed when working on https://github.com/sandstorm-io/sandstorm/pull/3519 that a cloned grain does not have any scheduled tasks -- this is to be expected as the restoring process doesn't do anything to create them. But this means that if you backup & restore a ttrss grain, or clone one, it will forever fail to fetch feeds in the background. We should try to re-arm the task somehow.
It's possible some support in Sandstorm would make this kind of thing easier to deal with.