vvoelz / biceps

Bayesian inference of conformational populations
https://github.com/vvoelz/biceps

NPZ dictionary format (from 'arr_0' to multiple arrays with specific names) #57

Closed robraddi closed 4 years ago

robraddi commented 5 years ago

Reminder to change the code that saves the `traj.npz` file.

Changes are needed in `PosteriorSampler.py`, inside `def process():`.

These are the changes that will be made so that each results section gets its own named array:

```python
# Improved function for write_results()
import numpy as np

def write_results(outfilename='traj.npz', *args, **kwds):
    """Write several arrays to a single binary .npz archive.

    Standardized: Yes; Binary: Yes; Human-readable: No.

    :param str outfilename: name of the output file
    :return: None; a NumPy ``.npz`` archive is written to disk
    """
    # np.savez_compressed(outfilename, *args, **kwds) would also work,
    # trading extra write time for a smaller file.
    np.savez(outfilename, *args, **kwds)
```
```python
# How the results are saved and loaded:
outfilename = 'traj.npz'
write_results(outfilename, **{"x": x, "y": y})

# Load the archive once, then index arrays by name.
results = np.load(outfilename)
file_x = results["x"]
file_y = results["y"]
```
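As a sanity check on the naming behavior this issue is about, here is a small self-contained sketch (throwaway temp files, not BICePs code) showing how `np.savez` assigns the generic `'arr_0'`-style keys to positional arguments, while keyword arguments become the array names inside the archive:

```python
import numpy as np
import tempfile, os

x = np.arange(5)
y = np.linspace(0.0, 1.0, 5)

tmpdir = tempfile.mkdtemp()
pos = os.path.join(tmpdir, "positional.npz")
named = os.path.join(tmpdir, "named.npz")

# Positional arrays get generic keys: 'arr_0', 'arr_1', ...
np.savez(pos, x, y)
# Keyword arguments become the array names inside the archive.
np.savez(named, x=x, y=y)

with np.load(pos) as f:
    print(sorted(f.files))   # ['arr_0', 'arr_1']
with np.load(named) as f:
    print(sorted(f.files))   # ['x', 'y']
```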
robraddi commented 4 years ago

Benchmarking the NumPy zipped-array (NPZ) format

Comparison: single array (`"arr_0"`) vs. multiple named arrays (`"trajectory"`, `"sampled_parameters"`, ...)


Protection Factor

10,000,000 steps, writing to the trajectory every step

Write:
- `"arr_0"`: walltime=05:02:22, mem=128428056kb
- multiarr: walltime=05:19:33, mem=115145540kb

Read:
- `"arr_0"`: walltime=01:19:35, mem=44644064kb
- multiarr: walltime=00:28:52, mem=12750740kb

RESULT: Writing the multi-array format is slightly slower (05:19:33 vs. 05:02:22) but uses less memory; reading it is much faster (00:28:52 vs. 01:19:35) and needs less than a third of the memory.

Testing the unusual size of traj.npz (23 GB). Why?! Lowering the step count...

PF (huge node), 10 steps

Resources: cput=04:19:07, vmem=106620420kb, walltime=04:22:12, mem=105062180kb

The results_ref_normal file sizes are still 23 GB...
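One way to track down which array is responsible for a bloated archive is to open it and report each member's uncompressed size. A minimal sketch, using tiny stand-in arrays in place of the real `traj.npz` contents (the names here are borrowed from the `saving` list, the shapes are made up):

```python
import numpy as np
import tempfile, os

# Hypothetical stand-in for the real traj.npz.
path = os.path.join(tempfile.mkdtemp(), "traj.npz")
np.savez(path,
         model=np.zeros((1000, 100)),        # the array suspected of dominating
         sampled_parameters=np.zeros(100))

with np.load(path) as f:
    for name in f.files:
        arr = f[name]
        print(f"{name}: shape={arr.shape}, {arr.nbytes / 1e6:.1f} MB uncompressed")
```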

Splitting the file up into individual pieces...

```python
saving = ['rest_type', 'trajectory_headers', 'trajectory', 'sep_accept',
          'grid', 'allowed_parameters', 'sampled_parameters', 'model', 'ref']
```

```
 24K  allowed_parameters.npy
 30G  model.npy
3.1K  ref.npy
 317  rest_type.npy
 24K  sampled_parameters.npy
 371  sep_accept.npy
 413  trajectory_headers.npy
 824  trajectory.npy
 15M  grid.npy
```

The 30G `model.npy` clearly accounts for nearly all of the archive size.
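Splitting along those lines could be done with one `np.save` call per key. The sketch below (hypothetical file names and shapes) also memory-maps the large file on read, so only the pages actually touched need to fit in RAM:

```python
import numpy as np
import os, tempfile

outdir = tempfile.mkdtemp()
results = {
    "sampled_parameters": np.random.rand(100, 4),
    "model": np.random.rand(1000, 10),   # placeholder for the large array
}

# One .npy file per results section, mirroring the 'saving' list above.
for name, arr in results.items():
    np.save(os.path.join(outdir, name + ".npy"), arr)

# Memory-map the big file on read: only pages actually touched hit RAM.
model = np.load(os.path.join(outdir, "model.npy"), mmap_mode="r")
print(model.shape)   # (1000, 10)
```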

Cineromycin_B (NOE data)

10,000 steps

Write:
- `"arr_0"`: 9.363 s, traj 184K
- multiarr: 8.902 s

Load:
- `"arr_0"`: 1.446 s
- multiarr: 1.175 s

100,000 steps

Write:
- `"arr_0"`: 9.363 s, traj 184K

100,000,000 steps

Write:
- `"arr_0"`: 04:15:22, mem=135.7 GB, traj 1.6 GB
- `"arr_0"`: 04:15:32, mem=135.7 GB, traj 1.6 GB
- multiarr: 04:58:38, mem=139.2 GB, traj 1.4 GB
- multiarr: 04:58:--, mem=139.2 GB, traj 1.4 GB

Load:
- `"arr_0"`: 02:00:19, mem=91278552kb
- multiarr: 02:00:23, mem=68376052kb

RESULT: At 100,000,000 steps, writing the multi-array format takes somewhat longer (~04:58 vs. ~04:15) but produces a slightly smaller trajectory file (1.4 GB vs. 1.6 GB); load times are essentially identical, with the multi-array format using noticeably less memory.

Overall Result: Saving named arrays instead of a single `"arr_0"` costs a little write time at worst, but reads as fast or faster with substantially lower memory use, so the multi-array NPZ format is the better default.
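For reference, the wall-time comparisons above can be reproduced in miniature with a simple harness. This is a toy sketch with small random arrays, not the actual BICePs benchmark; the key point it illustrates is that with named arrays, a single member can be pulled out of the archive without touching the rest:

```python
import numpy as np
import os, tempfile, time

def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

tmp = tempfile.mkdtemp()
data = {f"key_{i}": np.random.rand(200, 200) for i in range(5)}
single = os.path.join(tmp, "single.npz")
multi = os.path.join(tmp, "multi.npz")

# Single generic array: everything is stacked under 'arr_0'.
_, t_write_single = timed(lambda: np.savez(single, np.stack(list(data.values()))))
# Named arrays: each key is a separate member of the zip archive.
_, t_write_multi = timed(lambda: np.savez(multi, **data))

def read_one_key():
    # Only the requested member is read out of the archive.
    with np.load(multi) as f:
        return np.array(f["key_0"])

arr, t_read = timed(read_one_key)
print(f"write single: {t_write_single:.4f}s  "
      f"write multi: {t_write_multi:.4f}s  read one key: {t_read:.4f}s")
```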