olivierkes / manuskript

A open-source tool for writers
http://www.theologeek.ch/manuskript
GNU General Public License v3.0
1.79k stars 236 forks source link

Regular interface freezes due to slow saving process #523

Open worstje opened 5 years ago

worstje commented 5 years ago

I hadn't posted an issue about the freezes I have been experiencing before because it was so obvious and I was sure this was reported already, but now that I went looking for them to share some insights, I find that the only issues I'm able to find are old, closed or both of those.

There is no doubt in my mind that the save process is to blame as it matches up with whenever it autosaves with such high regularity.

In particular, I have narrowed down the cause to the revision control. The moment I switched off revisions, the interface freezes disappeared entirely.

Turning it back on (I never shut down Manuskript so it apparently still remembers the history from before), I find that it saves a revisions.xml of 55MB. Mind you, this is with the Smart Remove functionality turned on with every configurable field set to the default that says to only keep one revision per whatever.

Identified causes

We are saving on the UI thread.

Seriously. That shit should be shoved to the background. I realize this won't be super-easy though because we don't want the state to change while saving and endangering data integrity, requiring locking of some kind. However, this would get rid of all saving-related performance complaints ever, even something as stupid as a 100MB-sized revisions file.

The way revisions are saved is super-awkward.

The fact that it takes like 3 seconds to write 55MB worth of data on my machine shows that it is the creating of the XML that is a very critical bottleneck. (Copying a file to the drive in question gets >130MB/s speeds.) Also keep in mind I have turned off the Zip functionality which would still have to be calculated on top of this!

Looking into it a bit deeper, I have no clue what the logic was behind the way revisions are stored.

  1. The way all revisions for all texts are in a single file means that any save process that involves a single modification will instantly require the entire file to be rewritten. One revision file per text being tracked would be the most sensible solution here.

Technically speaking you could still somewhat pull off the single-file-holding-all-revisions model without rewriting all data all the time if we really wanted to, but...

  1. We are using XML! This is a structured file format for interoperability, but it is slow as hell. It doesn't support inserting data at a random position unless you have a XML library that is optimized for that very purpose, and as such requires rewriting the document. And even such an optimized XML library would still be super-awkward, because pretty much every OS is still limited to the wonders of appending to files.. meaning whatever comes after an 'insert' operation still needs rewriting.

Also keep in mind that XML as a file format requires encoding the entire contents of the data stored in it, meaning that any any lesser-than's and greater-than's need to be encoded into XML entities. This makes this even computationally heavier.

  1. The huge amount of processing every save operation implies also increases the opportunities for data loss. We've seen a fair few reports pop up in the issues, and I definitely believe the slow save method for revisions makes this worse. Not only do we save often, but it takes very long, thus making corruption because of a hardware failure or other incident far more likely. The sooner we can stop mucking around with the file we are saving, the less chance Murphy has to interfere.

Conclusion

Let me say I respect the people who have created and contributed to Manuskript greatly. I have absolutely no wish to step on anyones toes with the things I wrote above.

I understand revisions were added for a reason, and that the method chosen allows for an easy-to-access builtin functionality that is cross-platform. So I realize my take above is a very negative one given the problems it solves.

That said, I still believe the current solution is utterly unfit of being in the program by the time a 1.0 version comes around. The presence of its current implementation offers a terrible experience of freezes that cannot be explained away by users because all the revision baggage is in the background for them and utterly unrelated to their actions in that moment.

Recommended fixes

At present I realize that rewriting the saving functionality to be away from the main UI thread is an impossible ask due to the subtle interactions that would bring along in regards to locking the internal data model and any changes in the UI. This is difficult stuff. We would want to do it eventually because any UI freezes, no matter the reason, are bad user experience. Jerky behaviour where the program does not act when it is desired to do something is enough to throw a writer out of their creative flow.

For right now, I would recommend to have a model where every Text (e.g. 'Test.md') has a separate file besides it for revisionist purposes ('Test.md.revisions'). If we want to keep XML for those bits, that is fine assuming that...

  1. The saving mechanism pays proper attention to whether or not a Text was actually changed by the user before trying to write the revision file, and

  2. these generated revision files are cached / re-used (in case of ZIP) so that the XML writing library does not have to recreate them every single time a saving cycle happens.

Key here is that saving the revisions should not eat any more CPU than it has reason to.

With the hundreds of texts in my project, it tries to save hundreds of revision structures with multiple bits and bobs, all of which need to be checked bit by bit so that they can be properly escaped to fit the XML format. Maybe I just have a lot of texts, but I don't expect to be the only one to run to run into these issues.

Workaround for anyone with freezing issues that drive them mad

Disable revisions entirely in the Settings dialog.

It removes the revisions.xml file from your project and gives you a pleasurable, non-freezing working environment. If you are worried about your files, just copy your project and/or all of its files on a regular basis (automate it every hour, for example), and just rely on that if you really need to get some old material.

gedakc commented 5 years ago

Thank you @worstje for raising this issue along with significant detail about your investigation into the problem. Also thank you for trying not to step on anyone's toes. We appreciate that.

Question 1: Does anyone use the revision feature?

My background is in Information Technology and developing code. I use revision control systems, like git, to capture changes when I indicate to do so and I include a description of those changes. I would not use a system that captures changes based merely on time.

I spoke with my spouse, an author, about Manuskript's revision saving feature and she indicated that she also never uses the feature.

Question 2: Should we remove this revision saving feature from Manuskript?

Removing this feature might alleviate some existing reported issues involving poor performance, crashes, and file corruption. Moveover, removing the revision feature should make the code simpler and easier to maintain.

Personally I'm a big fan of the K.I.S.S principle. I do not know why the current revision saving feature was added. Further, I am not aware of other popular editors or word processors that implement such a time based revision saving feature.

@olivierkes if you are reading this I hope you will add your thoughts on this decision.

My vote would be to remove the current time based revision saving feature from Manuskript.

What do you think?

worstje commented 5 years ago

@gedakc Regarding your questions...

  1. Do I use the revision feature?

No. Perhaps an obvious answer, but the feature already became unusable to me before I managed to squeeze in all my existing writing, nevermind afterwards.

  1. Do I think the revision saving feature should be removed?

Not necessarily. I definitely see a potential 'usefulness' for the feature, even if it does not particularly fit how I use Manuskript.

Thus, I am not saying to remove this feature. However, I am saying that the feature, in its current implementation, appears unusable to me. The current way the feature is implemented essentially makes me prefer notepad for all my writing.

Also note that the current implementation essentially puts all the 'revision eggs' into a single basket. If the revisions.xml file gets corrupted for some reason (be it long processing when a power outage happens, disk corruption, whatever) you will lose all the revisions in one fell swoop. The nature of having separate files that are on the filesystem and not rewritten every single time (an argument I realize only works for non-zip) makes these revisions a lot more durable and dependable.

Manuskript has a very long way to go if it wants to even approach the robust nature of the tested code that is the filesystem and how it tends to naturally spread the blocks around the medium.

If I want to rely on some sort of builtin snapshots / revisions, I might instead consider looking towards the Windows 'Previous Versions' feature. This is of course an OS thing and a wholly different beast that is pretty unreliable in whether or not it tracks things, but it does in theory already provide per-file revisions. (I just don't know how to set it up well enough, I suppose! 🤣 )

FYI: I also enjoy the K.I.S.S. approach as well as the change tracking features git and other source control systems offer... all of which I feel are a perfect match to use in conjunction with unzipped Manuskript projects. 😉

gedakc commented 5 years ago

Thanks @worstje for your response. Perhaps a middle ground would be to change the default behaviour such that creating a new project set Keep revisions to disabled. The longer term plan would be to fix, remove, or replace this feature with a third-party revision control system.

Other related issue reports:

Issue #379 - [FEATURE IDEA] Incorporate GIT in Manuskript already exists for incorporating a revision control system.

Issue 274 - saving error had to do with problems saving the revisions.xml file.

TheJackiMonster commented 3 years ago

This issue pretty much sums up what's causing all these freezes, I assume. ^^'