julia-vscode / SymbolServer.jl

Other
23 stars 31 forks source link

storage mechanism #155

Closed ZacLN closed 3 years ago

ZacLN commented 4 years ago

Custom storage format. @davidanthoff I don't know anything about serialization so this certainly isn't optimised but is ~6x faster than JSON storage.

davidanthoff commented 4 years ago

Hehe, that is pretty good :)

davidanthoff commented 4 years ago

This will hopefully also fix https://github.com/julia-vscode/SymbolServer.jl/issues/54 and https://github.com/julia-vscode/LanguageServer.jl/issues/750.

davidanthoff commented 4 years ago

@ZacLN also, do you think there is much more to be done here, or is this pretty close and ready?

pfitzseb commented 4 years ago

In the interest of potential other consumers of this: Can we give a JSON3 based solution a go? I feel like JSON (de)serialization shouldn't be this slow.

davidanthoff commented 4 years ago

If we were to use JSON3.jl, I think it would have to be limited to the LS process, right? JSON3 brings along a fair number of deps, so not a good fit for the symbol server process. But I also think that would be ok, because we are mostly worried about deserialization speed, right? It would mean an additional 3 git submodules... We should also make sure that everything in JSON3 works properly with static compilation. Also, the LS process then would depend on both JSON and JSON3, which is a bit weird, but I guess also not the end of the world?

I guess the other consideration is how stable is JSON3's design? Just not clear to me, but I wouldn't want to take a dep on a package that is work/design in progress.

ZacLN commented 4 years ago

I think I must have timed JSON 3 before writing this. From my recollection the interesting part of their approach is that they deparse on demand which doesn't help with our use case

davidanthoff commented 4 years ago

I started reading through the JSON3 docs. Not finished yet, but my thoughts so far: we definitely don't want to use something that is a view into the JSON, I think. That seems not good for our use case because we know that we want to load everything, and more importantly, that means we would have to keep the memory mapped file open on Windows, and that can cause all sort of trouble that I think we should avoid at all cost (assuming that the design is a view into a memory mapped file).

I haven't finished reading through the struct API, but at least the option that requires us to use mutable types I think we should also avoid: I would very much want to keep the option to move to a more and more immutable design in the whole LS space, and that seems to be a pretty fundamental way to prevent that.

Overall, the design of it seems very complicated, involved and low level, and I'm not sure we should take a dep on something like that here...

pfitzseb commented 4 years ago

The nice thing about JSON3 is that it should allow us to skip the whole Dict-handling stuff necessary for JSON.jl -- we can simply go from a String or IOBuffer or whatever to Julia objects. That should give us a massive performance increase (but to be fair I've never tested that, so who knows).

davidanthoff commented 4 years ago

My vote would be to go with this here for now, we can always reconsider later, but this seems to be simple, fast and more or less dependency less.

I think once we start to index in the cloud we might want to also save everything in JSON files, so that if we decide to change file format in the future, we don't have to reindex everything, but could just load the JSON files and write them out to our own cache file format v2.

ZacLN commented 3 years ago

Alright @davidanthoff, @pfitzseb, @aviatesk , this is functional and good to go. Do we want to retain the suffix for these files - jstore - which reads as a pretty crap and uninformative to me. We'll also need to remember to update the persistant storage folder (from 2 to 3) within vscode when the whole vscode story is updated.

davidanthoff commented 3 years ago

Fantastic! We should probably first finish the CSTParser update across all the repos, and then merge this, right?

ZacLN commented 3 years ago

Either way is fine with me. Are we going to introduce this to LanguageServer/vscode (and so into the wild) at the same time as the CSTParser update? We'll likely get a fair few bugs from both but I don't they'll intersect in a way that'll make things more difficult.

davidanthoff commented 3 years ago

I think that primarily depends on our availability to fix things quickly. I think if we all have a bit of time at hand in the next weeks (I do) then we could just push it out all at the same time?

ZacLN commented 3 years ago

Agreed, let's get them out as soon as we can then

davidanthoff commented 3 years ago

Do we want to retain the suffix for these files - jstore - which reads as a pretty crap and uninformative to me.

I'm indifferent :) No one will ever see these, so I don't think it matters much.

We'll also need to remember to update the persistant storage folder (from 2 to 3) within vscode when the whole vscode story is updated.

Yes, I'll make sure to do that when I update the submodules in the VS Code repo!