nglviewer / ngl

WebGL protein viewer
http://nglviewer.org/ngl/
MIT License

Caching #37

Closed BernierCR closed 8 years ago

BernierCR commented 8 years ago

Parsing is one of the slowest steps of this program. I use very big macromolecules, and the parsing makes my program appear unresponsive. I'd like to minimize that.

1) How do I put up some status messages on the screen?

2) Is it reasonably possible to cache parsed structures? In MATLAB, I save and load in .mat format instead of reading in the pdb every time. Could we do that here in JSON format or something? Although it would have to be gzipped too to minimize transfer time between the server and client.

I could help code this with some direction, I don't know the inner workings of this program yet.

Thanks.

arose commented 8 years ago

> Parsing is one of the slowest steps of this program. I use very big macromolecules, and the parsing makes my program appear unresponsive. I'd like to minimize that.

Nice, displaying very big macromolecules is exactly what I want to enable. I hope it only appears to be unresponsive. Parsing, at least, should automatically happen in a WebWorker.

> 1) How do I put up some status messages on the screen?

NGL.Stage.loadFile returns a Promise object. You can use that to know when the file has been loaded and parsed. You currently cannot get more detailed status info, but that would be nice to have. There is also NGL.Stage.tasks (an NGL.Counter object), which keeps track of pending tasks and fires signals when the task count changes. This is currently used to show whether representations are being calculated. I can imagine including information on file loading and parsing there too.
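A minimal sketch of the Promise-based status pattern described above. In a real page, `loadFile` would be `NGL.Stage.loadFile` and `setStatus` would update a DOM element; here `loadFile` is a hypothetical stub so the pattern is self-contained and runnable anywhere:

```javascript
// Show a status message while a Promise-returning loader runs.
// `loadFile` is a stand-in for NGL.Stage.loadFile (hypothetical stub);
// swap setStatus for a DOM update (e.g. an overlay div) in the browser.
function setStatus(msg) { console.log(msg); }

function loadFile(path) {
  // Simulate asynchronous loading + parsing.
  return new Promise(resolve => setTimeout(() => resolve({ path: path }), 10));
}

setStatus("Loading big.pdb ...");
loadFile("big.pdb")
  .then(component => setStatus("Loaded " + component.path))
  .catch(err => setStatus("Failed: " + err));
```

The same `.then`/`.catch` hooks are where you would hide a spinner or enable UI controls once parsing has finished.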

> 2) Is it reasonably possible to cache parsed structures? In MATLAB, I save and load in .mat format instead of reading in the pdb every time. Could we do that here in JSON format or something? Although it would have to be gzipped too to minimize transfer time between the server and client.

There is NGL.Structure.toJSON and .fromJSON (though some post-processing is needed to make the structure object useful again). However, note that both methods work with plain JavaScript objects, not with textual JSON, which would be way too expensive, especially for large structures. The functions are currently used for de-/serialization when passing objects to WebWorkers. When creating textual JSON, the TypedArrays within the objects (you can get a list of them with .getTransferable) contain most of the data and should be handled separately to create an efficient de-/serialization scheme. I don't have plans to work on that.
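A sketch of the scheme hinted at above: keep small metadata as textual JSON but ship TypedArray contents as raw ArrayBuffers instead of expanding them into JSON number lists. The `structure` shape here is a hypothetical stand-in, not NGL's actual internal layout:

```javascript
// Split a structure-like object into a JSON metadata string plus raw buffers.
// The buffers could be gzipped and transferred separately, then re-wrapped.
function serialize(structure) {
  const meta = { name: structure.name, count: structure.x.length };
  return {
    json: JSON.stringify(meta),                         // small, cheap to parse
    buffers: [structure.x.buffer, structure.y.buffer]   // bulk numeric data, no text round-trip
  };
}

function deserialize(packed) {
  const meta = JSON.parse(packed.json);
  return {
    name: meta.name,
    x: new Float32Array(packed.buffers[0]),
    y: new Float32Array(packed.buffers[1])
  };
}

// Round-trip check.
const s = { name: "demo", x: new Float32Array([1, 2, 3]), y: new Float32Array([4, 5, 6]) };
const r = deserialize(serialize(s));
console.log(r.name, r.x[2]);  // → "demo 3"
```

Because the buffers never pass through `JSON.stringify`, large coordinate arrays avoid the cost of number-to-string-to-number conversion, which is what makes textual JSON so expensive for big structures.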

That said, I am currently working on a compressed binary format for molecular structures and am quite heavily refactoring the internal data representations to make everything faster and less memory hungry. This also targets the time-consuming parsing you described. I can probably tell you more at the beginning of January.

> I could help code this with some direction, I don't know the inner workings of this program yet.

Great, if you want to work on an efficient de-/serialization scheme (as described above) that would be most welcome! I am happy to give you directions. This would also go a long way toward creating support for sessions and session files.

arose commented 8 years ago

@j0kaso you might be interested in this as well

BernierCR commented 8 years ago

OK great. I'm currently working on other parts of my program, but may come back to work on this performance issue.

arose commented 8 years ago

For very large structures, the MMTF format (http://mmtf.rcsb.org/) can be used to radically increase parsing speed.
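A rough illustration (not NGL code) of why a binary format like MMTF parses faster than PDB text: numeric columns arrive as typed-array bytes that can be viewed directly, while a text format must be sliced and parsed per value:

```javascript
// Compare two routes to the same coordinate data.
const n = 100000;
const coords = new Float32Array(n);
for (let i = 0; i < n; i++) coords[i] = i * 0.1;

// Text route: format each value, then parse it back (roughly what a PDB parser does).
const text = Array.from(coords, v => v.toFixed(3)).join("\n");
let t0 = Date.now();
const parsed = text.split("\n").map(Number);
const textMs = Date.now() - t0;

// Binary route: reinterpret the raw bytes, no per-value parsing.
t0 = Date.now();
const view = new Float32Array(coords.buffer.slice(0));
const binMs = Date.now() - t0;

console.log("text parse:", textMs + "ms,", "binary view:", binMs + "ms");
```

The gap grows with structure size, which is why binary formats pay off most for the very large structures discussed in this thread.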