Constantly writing to IndexedDB when a task is running is bad for SSDs

RationalFragile commented 1 year ago

Your Environment

Version used: 7.11.6
Operating System and version: Windows 10
Desktop Environment:
Browser Name and version:

Expected Behavior

It should persist changes of course but not write them to disk every second. Something like once per minute when a task is running and on stop/start is more reasonable.

Current Behavior

It constantly writes to files like "userdatadir\IndexedDB\file__0.indexeddb.blob\1\00" but only when a task is running. And because tasks run for hours and hours, process explorer is showing that superproductivity wrote more than 3 times what my chrome writes. (And I noticed my SSD lifespan shrunk from a projected 13 years to 10 years in less than 6 months, and superproductivity is the program that writes the most to disk, after "System" in Process Explorer. I moved it to my HDD but still, I think this should be changed.)

Steps to Reproduce (for bugs)

Run something like Procmon (Process Monitor) and select only disk activity for process name "superProductivity"
Or open process explorer or task manager and highlight superproductivity after showing disk writes
Start a task and notice how it's constantly writing to disk
Stop task and the writing stops

Can you reproduce this reliably?

Yes

Console Output

Error Log (Desktop only)

EDIT: This can be clearly seen in the electron devtools, when a task is running, you get multiple "[Persistence] Save to DB" events every second.

github-actions[bot] commented 1 year ago

Thank you very much for opening up this issue! I am currently a bit overwhelmed by the many requests that arrive each week, so please forgive me, if I fail to respond personally. I am still very likely to at least skim read your request and I'll probably try to fix all (real) bugs if possible and I will likely review every single PR being made (please, give me a heads up if you intent to do so) and I will try to work on popular requests (please upvote via thumbs up on the original issue) whenever possible, but trying to respond to every single issue over the last years has been kind of draining and I need to adjust my approach for this project to remain fun for me and to make any progress with actually coding new stuff. Thanks for your understanding!

johannesjo commented 1 year ago

If you want to store your data and have it available even when crashes occur, you need to write it. I don't see any way around this. One could introduce a throttleling mechanism, but that would raise a ton of complicated problems.

RationalFragile commented 1 year ago

Thank you @johannesjo but I'm sorry, that's definitely not normal. It literally consumes ~25% of "active time" on my HDD, constantly, for hours, as long as task is running.

As I said in the issue, here is the sensible way to save data:

On stop/start tasks
On other operations like saving settings
Every 1-5 minutes while a task is running. Even if it crashes, at worst you'll lose 1-5 minutes that were recorded.
On quit.

No other program ever writes to disk every second for hours.

RationalFragile commented 1 year ago

Just to make the point very clear, the app is saving 1.1GB every hour if I let it running, even with "automatic backups" disabled.

I have been trying to find a way to fix this. I'm not familiar with ngrx but the line _cmBase.saveState in persistance.service.ts gets called every second. Issue is, the call to taskAdapter.updateOne( ... changes: { timeSpentOnDay: doesn't end up passing those changes to the saveState call, it just resaves everything as far as I can tell. So I can't just "buffer" the saveState calls, also because the loadState gets called every second as well, which doesn't make sense since the code already has an up-to-date state so why would it need to read from databasae the whole state (all entities) after every saveState?

lrq3000 commented 1 year ago

I'm surprised no one noticed before, this is a huge issue, the I/O is way too intense, this will cause many problems to a lot of users down the line.

I would add that the other way to persist change of running tasks while being sparse with I/O is actually to save in IndexedDB the start datetime the last time the task ran. This way, you don't need to actively store any data over time: implicitly, as long as there is a start datetime, it means the task is still running. The app can even be closed or crash, the state won't change and will be resumed next time the app is open. Then, when you stop the task, Super Productivity only needs to check current time (end datetime) minus start datetime to count the length of time and add to the previous task duration (if any) - and add to the parent project etc to update progress bars etc.

So it's just a matter of inversing the logic IMHO, and the app could be very much more sparse both in I/O and battery consumption.

RationalFragile commented 1 year ago

Thank you @lrq3000. At first I was gonna say it's a rare inconvenience if it crashes and I forget to restart it to stop the task, but then I remembered that while doing a task, I always look at the tray to confirm if I'm running or not (and shortcuts), so this is actually perfect, if it crashes, I'll restart it when I stop the task and not see it stop, and that will make it not miss any time as well.

But at this point I'm willing to bare any other inconvenience to fix this issue. I'm busy with work and I rely on super productivity and I have a loooot of tasks which makes every second save bigger in size (10GB on my HDD in one day worth of tracked work). I'm literally obliged to continue writing this much daily because I can't switch right now and there are no good alternatives, so if anyone makes any temporary fix pleaaaaaase share the commit with me (I can build from source).

github-actions[bot] commented 1 year ago

This issue has not received any updates in 90 days. Please comment, if this still relevant!

RationalFragile commented 1 year ago

very relevant 😞

lrq3000 commented 1 year ago

@johannesjo Do you need any help with the logic? I'm not a typescript coder, but I have coded lots of timetracking systems, maybe I can help?

hugaleno commented 1 year ago

@johannesjo I think it's just a matter of changing TRACKING_INTERVAL to be a configurable option instead of a constant? It's there something i'm not seeing? If so I can make the changes here and submit a pull request.

RationalFragile commented 1 year ago

@hugaleno Ideally, there should be a variable "session_start_time" that is persisted and holds the current session start time in the task object and a variable "elapsed_time" that is not persisted and gets recalculated every second to update the UI only. If you only change the TRACKING_INTERVAL, you'll be very limited in how much disk writes you can save and still get updates in the UI. For example, setting the interval to 60000ms will still result in 11GB/60 = 188MB per 10 hours of tracking AND will result in all tasks only updating once a minute... So yeah, not ideal but I guess beggars can't be choosers 🤷‍♀️ (Tho, you might say 188MB a day is not much, but remember, the rate of writes grows with your projects (last time I checked)! so if I double the amount of tasks and projects I have, I'll be writing 376MB per day or so even with one update per minute...)

hugaleno commented 1 year ago

@RationalFragile I agree that is not the ideal solution but it can save a lot of writings to disk. About your solution, even if we track the time into memory, we still have to save it periodically to avoid losing tracked time in case of something unexpected occur, like a crash or something like that. Maybe if we could update only the task instead of the entire project into indexeddb would be the ideal solution(no risk of losing any tracked time /UI updates/low disk usage). Not sure if it's possible though.

RationalFragile commented 1 year ago

@hugaleno

we still have to save it periodically to avoid losing tracked time in case of something unexpected occur

No! If the app crashes, you'll reopen it (when you'll want to stop the task and see that the app has crashed) and the task will still be remembering when it started and therefore you won't lose any progress! You really only want to save the task when the session starts and when the session ends.

Maybe if we could update only the task instead of the entire project into indexeddb would be the ideal solution

You can't fix that unless you follow the method I'm suggesting (actually, suggest by lrq3000 above and I validated it by making a quick C# app that I'm using while waiting for this to be fixed).. Because, the app needs to change and save the cumulative time spent on parent project/task (for subtasks) for example; again, because it's not saving the start time in the task. By contrast, if it only saves the start time and then keeps recalculating all the affected elapsed times, it won't save anything during the refreshes... Hope that makes sense.

And I'm just trying to elaborate on why (from a logical perspective) we should only save data that is not transient, the elapsed time is a transient property (look at how Stopwatch is implemented in C# for example, would it make sense to update it every millisecond (busy counting) or simply keep the start value and calculate the elapsed time when requested?) but anyway, if there is an easy fix with just a few lines, of course, might as well try it!

hugaleno commented 1 year ago

@RationalFragile I got the point, but I see some things to take into account.

1 - Today the current task is a in memory information, so if the app closes unexpectedly , when it reopen there will be no running task. 2 - Superproductivity tracks time on a day base. I can have the time spent for different days. What happens if I start working at a task at 11:00 PM and stop at 02:00 AM? Need to handle that and register 1 hour for one day and 2 hours for the other day. 3 - When AFK, it unset the current task. Need to register spent time when it occurs. 4 - When closing the app need to stop the current task and register the time spent.

Just listing what I remember now to do not forget. I think i'm able to implement that, but we need to be careful and take into account the features superproductivity currently have. Maybe there are more points to consider @johannesjo

RationalFragile commented 1 year ago

@hugaleno 1- No, it will read the tasks from the database and see that a task is running because it has a started "session_start_time" which gets set to null when the task is not running 2- Whatever the app does now, it can keep doing. When you finish a session, you then save the total time and persist it but during the session you only save the start time. It doesn't affect days; you keep the same calculations now but operate on the calculated value of elapsed time or whatever variable you need. 3- yes, just like the end of the session, you calculate elapsed, add it to the total time that is persisted. I think I might not have explained clearly: you can save the time elapsed, but you split it on two: persisted_time and transient_time, the elapsed time is the sum, and you add transient_time to persisted_time when you wanna finalize that transient_time and reset the start_time. Hope that clears the confusion. 4- Yes, again, nothing changes, you only change how the "task is closed", when closed, it adds the transient time and resets the start time.

But thank you for listing these things, it's a good thing and it'll make misunderstandings less likely.

(Tho, if I'm the only one complaining about this, then don't do it please, it'll take you time and effort and I already coded some basic time tracking for myself to get me by. If someone fixes this, I might reconsider SuperProductivity, but I might just add more features in my app instead... Again, sorry, not trying to sound rude, just trying to save you from wasting time if I'm the only one wanting this! Thank you 🙏)

hugaleno commented 1 year ago

@RationalFragile Actually I think that this is important to be resolved. That's a lot of I/O and battery usage I was just listing things to be taked into account when implementing such approach

lrq3000 commented 1 year ago

I think you understood very well how to code this @RationalFragile , but to clarify for others: this kind of feature would require only one attribute per task, and one global atttribute:

Attribute per task: cumsum_elapsed_time. This is where we permanently store the cumulative sum of all timetracking (aka timeboxing) we've done for this task. This is already implemented.
Global attribute: datetime_start_of_timetracking. This is the new thing. This is where we store the datetime when we last started to timetrack this task. When done, we can simply tick it off. (Note: this could be made into a per-task field if we wanted to allow timetracking multiple tasks in parallel, but since Super Productivity currently only supports timetracking one task at a time, we can reproduce exactly the same behavior with a single global field, so better keep it simple).

Note also that there is also another global attribute that already exists: id_of_currently_timetracked_task, although it may not be stored at the moment, but in this new approach it needs to be stored. This is important as it needs to be combined with the datetime_start_of_timetracking attribute (see below).

Now, the UI can display a realtime update of transient elapsed timetracking time by simply calculating transient_elapsed_time = now() - datetime_start_of_timetracking, with literally zero I/O and negligible cpu usage/maths calculations.

When the user stops timetracking, then we can again calculate one last time transient_elapsed_time but this time we add it to cumsum_elapsed_time of the id_of_currently_timetracked_task, and we set both datetime_start_of_timetracking and id_of_currently_timetracked_task to null, which means there is no current timetracking of any task. (We can also set transient_elapsed_time = 0 but anyway it should be set with the equation above only during active timetracking, so its value outside of this condition is meaningless anyway and not used!).

And voilà, with just 4 variables (2 stored globals, 1 stored per task, 1 not stored but only used at runtime - transient_elapsed_time) we get a zero I/O timetracking feature, and on top we get for free the ability to continue tracking even when the app is closed/killed, or even synchronize timetracking between devices (just store the globals in the database file that can be synced, and other devices can timetrack the same task simultaneously - you can start timetracking on your smartphone and continue on your computer!).

Sounds like a great feature, no? ;-)

lrq3000 commented 1 year ago

So it should be simple to implement on top of the current codebase, and the codebase is very clean (awesome work by Johannes!), it's just because I suck at Typescript that I cannot do it on my own :-/

lrq3000 commented 1 year ago

PS: also the changes would be retrocompatible: only the way new data is generated is changed, and the end result is the same as before, the only change is that time can then also be tracked while the app is closed, but this behavior can be disabled with a boolean option (if this option is enabled, then when the app is opened, if the global attributes datetime_start_of_timetracking and id_of_currently_timetracked_task are already set then just set them to null).

johannesjo commented 1 year ago

Wow! Thank you everybody! This one of the many reasons why I love open source so much! Strangers on the internet collaborating on a complex problem. Wonderful! :)

Thank you @lrq3000 ! To summarize with my own words: The plan is to use a separate model to do the time tracking stuff and only "flush" it to the task model whenever tracking is stopped, right?

Some things to consider:

Tags and Projects are also affected by time tracking through workStart and workEnd. So the flush would also need to take them into consideration.
I am a bit unsure on how (online) sync needs to be considered. My first gut feeling would have been to exclude the time tracking model from syncing but instead to flush all the changes before every sync. But maybe it's no problem to include this data?

On a side note: In general it would be lovely to have more granular writes, so that when a task is updated only that task would be written. But since this might be huge undertaking, I am not sure if this is realistic.

hugaleno commented 1 year ago

Taking into account everything was said and looking through the code here are my thoughts:

Simple Counters, Take a Break and Tasks uses the same tracking interval mechanism
We need this tracking mechanism to get constantly UI updates
The time calculation that @lrq3000 mentioned the app already does

I don't think we have to change anything except for the part that it saves to the disk after every tracking interval(1 second).

I think one approach we could use is:

We can add a new action that will trigger the persist action only when pausing ,closing the task or when the current task is unset for any reason, that action will also trigger the workStart and workEnd for projects and tags that @johannesjo pointed out.

That leave us with the following problem: when the app crashes or closes unexpectedly we could lose the tracked time. To address this we will persist the id_of_currently_timetracked_task and datetime_start_of_timetracking in the database as pointed by @lrq3000.

When the app starts, we look at this track information:

If they are not null, that means that the elapsed time "was lost"(because the 'persist action' when trigguered will also set that tracking information to null), in this case, we could ask the user(by using a modal) if he wants to add the elapsed time that "was lost" to the task that was running. If yes we calculate the elapsed time( now - datetime_start_of_timetracking), add it to the task and save it, we also set the track information to null on database. if not we just set the track information to null.

The main benefit I see in this approach is:

1 - We don't have to reinvent the wheel and implement everything Johannes already did to do the same thing in a different way 2 - By not changing the tracking time logic itself we avoid side affects that could have. 3 - No constant write to disk(the problem that must be solved)

lrq3000 commented 1 year ago

Sorry it got super busy at work for me, I forgot this discussion tbh!

@johannesjo

To summarize with my own words: The plan is to use a separate model to do the time tracking stuff and only "flush" it to the task model whenever tracking is stopped, right?

I'm not sure about the model part, I don't know the internal architecture of the software, but yes, my suggestion is mostly about making flushing lazy until we need to commit on-disk, but not just that: we also need to make the timetracking variables persistent to ensure we are robust against crashes (and potentially synchronize across devices, more about that below).

Tags and Projects are also affected by time tracking through workStart and workEnd. So the flush would also need to take them into consideration.

Yes no problem, just add the computed interval to those too when timetracking is getting flushed.

I am a bit unsure on how (online) sync needs to be considered.

I'm not sure either, but after some more thoughts since your question, I think it would be easier/less prone to conflicts to synchronize timetracking state rather than force flushing, because if timetracking state is not synchronized, then two different timetrackings can be started on two different devices, and then you get a nasty conflict when trying to synchronize, where you either lose one tracking flush or the other. To avoid this issue, I can't see any other solution than to synchronize timetracking state, and if we do that, then it's unnecessary to force flush anyway.

@hugaleno Thank you very much for having a more precise look in the sourcecode! Your plan sounds awesome! Given what you found, I agree it's unnecessary to rewrite logic that is already there, if just making a few fields persistent and making flush lazy is sufficient to fix this issue then awesome :D BTW could you please point to which file and functions you found these (if you remember)?

hugaleno commented 1 year ago

@lrq3000 Just made a merge request for a initial review with all the changes I made so far, including some debug. Be in mind that it is a WIP

github-actions[bot] commented 10 months ago

This issue has not received any updates in 90 days. Please comment, if this still relevant!

Slime-Senpai commented 8 months ago

This is still relevant

thinktapper commented 7 months ago

Very relevant! I just learned about the programming concept of holding data in a buffer and writing it at once, less often in a JS performance workshop today..and I instantly thought of this specific issue from almost a year ago. Thanks @ThePrimeagen ;)

KAGEYAM4 commented 2 months ago

was this solved or just closed?

i stumbled upon this randomly, and superproductivity is on my 2nd top disk-writes after firefox where i mostly stream videos.

johannesjo commented 2 months ago

If someone has a good suggestion on how to solve this, I am all ears. I am not sure if there is a way around potential dataloss e.g. when the app or system crashes before writing, if we implement some kind of caching. But Ia m no expert on the subject, so input is welcome.

gcqmkm02 commented 1 month ago

was this solved or just closed?

i stumbled upon this randomly, and superproductivity is on my 2nd top disk-writes after firefox where i mostly stream videos.

Jeg har samme problemet. Et det bara med snap?

johannesjo / super-productivity