vvoelz / biceps

Bayesian inference of conformational populations
https://github.com/vvoelz/biceps

change yaml format to something else #16

Closed yunhuige closed 6 years ago

yunhuige commented 6 years ago

We need to change from YAML to a more compact format such as NumPy to save time and lower memory usage. Keep in mind that it does not need to be human-readable, but it does need to be a dictionary format so that it still works with the MBAR and plot scripts. It would also be good to prepare some scripts that can convert it to a human-readable format.
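A minimal sketch of what this could look like, assuming the trajectory is a flat dictionary of NumPy arrays. The keys `step`, `energy`, and `state` and the helper names are hypothetical, not taken from the BICePs code. It writes the dictionary to a compressed `.npz` archive, reads it back as a plain dictionary, and renders it as JSON text for anyone who wants to inspect it.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical trajectory dictionary; the real BICePs keys may differ.
traj = {
    "step": np.arange(0, 1000, 100),
    "energy": np.linspace(5.0, 1.0, 10),
    "state": np.array([3, 3, 7, 7, 1, 1, 1, 4, 4, 4]),
}


def save_compact(path, data):
    """Write a dict of arrays to a compressed .npz archive."""
    np.savez_compressed(path, **data)


def load_compact(path):
    """Read the archive back into a plain dict of arrays."""
    with np.load(path) as archive:
        return {key: archive[key] for key in archive.files}


def to_human_readable(data):
    """Render the dict as indented JSON text for inspection."""
    return json.dumps({k: v.tolist() for k, v in data.items()}, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.npz")
    save_compact(path, traj)
    restored = load_compact(path)

print(sorted(restored))
print(to_human_readable(restored)[:80])
```

Because `load_compact` returns an ordinary dictionary, downstream scripts that index by key would not need to know which file format was used. Nested dictionaries would need flattening first, which this sketch does not attempt.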

yunhuige commented 6 years ago

What candidates do we have now? NumPy? How about *.csv? @robraddi

robraddi commented 6 years ago

HDF5 (.hdf5/.h5) is another option. Didn't you mention a compressed filetype?

yunhuige commented 6 years ago

@robraddi When we select our candidate, we should keep three points in mind:

1/ The less memory and disk space it requires, the better.
2/ Don't worry about whether the format is human-readable, because:
   a/ people don't really need to look inside it;
   b/ even if it is not directly human-readable (the way .csv is), it's fine as long as there is a way to convert it to a human-readable version. (I know .csv could work, but I'm not sure about the others.)
3/ It is important that it works well with our MBAR and plot scripts. Here is the link to the latest scripts: https://github.com/vvoelz/nmr-biceps/blob/master/BICePs_2.0/test_MBAR/MBAR_visualization.py

Here are some suggestions for now:

1/ Let's make a list of candidates based on the points above.
2/ Let's benchmark them in terms of memory and size.
3/ Pick the best one!
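One way the size and memory comparison could be set up is sketched below, using only the Python standard library. Everything here is an assumption for illustration: the synthetic data, the key names, and the candidate list, in which JSON stands in for YAML as the text baseline and pickle stands in for a binary format. NumPy or HDF5 writers could be added to `CANDIDATES` in the same way.

```python
import csv
import gzip
import json
import os
import pickle
import tempfile
import tracemalloc

# Synthetic stand-in for a sampling trajectory (hypothetical keys).
N_STEPS = 2000
traj = {
    "step": list(range(N_STEPS)),
    "energy": [0.001 * i for i in range(N_STEPS)],
    "state": [i % 50 for i in range(N_STEPS)],
}


def write_json(path, data):
    with open(path, "w") as f:
        json.dump(data, f)


def read_json(path):
    with open(path) as f:
        return json.load(f)


def write_csv(path, data):
    keys = list(data)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(keys)  # header row holds the dictionary keys
        writer.writerows(zip(*(data[k] for k in keys)))


def read_csv(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        keys = next(reader)
        columns = list(zip(*reader))
    # CSV loses type information, so every value comes back as a float here.
    return {k: [float(x) for x in col] for k, col in zip(keys, columns)}


def write_pickle(path, data, opener=open):
    with opener(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle(path, opener=open):
    with opener(path, "rb") as f:
        return pickle.load(f)


CANDIDATES = {
    "json": (write_json, read_json),
    "csv": (write_csv, read_csv),
    "pickle": (write_pickle, read_pickle),
    "pickle.gz": (
        lambda p, d: write_pickle(p, d, opener=gzip.open),
        lambda p: read_pickle(p, opener=gzip.open),
    ),
}


def benchmark(data):
    """Return file size and peak Python memory while loading, per candidate."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, (write, read) in CANDIDATES.items():
            path = os.path.join(tmp, "traj." + name)
            write(path, data)
            tracemalloc.start()
            loaded = read(path)
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            results[name] = {
                "bytes_on_disk": os.path.getsize(path),
                "peak_load_bytes": peak,
                "n_keys": len(loaded),
            }
    return results


results = benchmark(traj)
for name, row in results.items():
    print(name, row)
```

Note that `tracemalloc` only tracks allocations made through Python's allocator, so memory used inside C extensions may be under-reported. The numbers are best read as a relative comparison on the same machine.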

yunhuige commented 6 years ago

What is the status of your test? Do we have any conclusion yet? @robraddi

robraddi commented 6 years ago

We should keep YAML due to reasons I mentioned to you personally.

yunhuige commented 6 years ago

@robraddi I think I need to reopen this issue until we have benchmarked all the candidates. Maybe you can use the current example dataset, but increase it from 1000 steps to some larger numbers, and compile a list of saving times and, if applicable, the memory required when sampling them with the current scripts?
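A rough sketch of how saving time could be recorded as the number of steps grows. The synthetic `make_traj` data and the pickle writer are placeholders chosen for illustration. The actual test would use the example dataset and each candidate writer in turn.

```python
import os
import pickle
import tempfile
import time


def make_traj(n_steps):
    """Build a synthetic trajectory dict with n_steps entries (hypothetical keys)."""
    return {
        "step": list(range(n_steps)),
        "energy": [0.001 * i for i in range(n_steps)],
    }


def save_pickle(path, data):
    """Example writer; swap in any candidate with the same (path, data) signature."""
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def time_saves(writer, step_counts, repeats=3):
    """Return {n_steps: best wall-clock seconds over `repeats` writes}."""
    timings = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "traj.bin")
        for n in step_counts:
            data = make_traj(n)
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                writer(path, data)
                best = min(best, time.perf_counter() - start)
            timings[n] = best
    return timings


timings = time_saves(save_pickle, [1_000, 10_000, 100_000])
for n, seconds in timings.items():
    print(f"{n:>7d} steps: {seconds * 1e3:.2f} ms")
```

Taking the best of several repeats reduces noise from disk caching and other processes. Running the same loop for each candidate format gives the list of saving times asked for here.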

robraddi commented 6 years ago

@yunhuige Okay, so I added a few methods for reading and writing various filetypes. Let's try to run some tests this week on a dataset to see how they compare. : )

robraddi commented 6 years ago

Results from our extensive tests can be found in /BICePs2.0/tests/test_format