Reminder to change the code for saving traj.npz file.

It would be more helpful if we saved multiple arrays with specific names. This would enable us to load a specific array at a time to save memory.

Need to make changes in PosteriorSampler.py:

def process():

These are the changes that will be made in order to have separate arrays for each results section:

# Improved function for write_results()

import numpy as np
def write_results(outfilename='traj.npz', *args, **kwds):                    
    """Writes a compact file of several arrays into binary format.  
    Standardized: Yes ; Binary: Yes; Human Readable: No;            

    :param str outfilename: name of the output file                 
    :return: numpy compressed filetype                              
    """                                                             

    #np.savez_compressed(outfilename, args, kwds)
    np.savez(outfilename, *args, **kwds)

# How the results are saved:

outfilename='traj.npz'
write_results(outfilename, **{"x": x, "y": y})
file_x = np.load(outfilename)["x"]
file_y = np.load(outfilename)["y"]

Benchmarking the numpyzcompression (NPZ)

Comparison single array ("arr_0") Vs. multiarray ("trajectory", "sampled_parameters",...)

Protection Factor

10,000,000 Steps, writing to trajectory every step

Write
"arr_0": walltime=05:02:22,mem=128428056kb
multiarr: walltime=05:19:33,mem=115145540kb

Read 
"arr_0": 01:19:35,mem=44644064kb
multiarr: 00:28:52,mem=12750740kb

RESULT:

At very large steps the loading of the multi array is much faster due to the very large trajectory section (10 million steps).
The duration of writing is very similar in both sets of code

Testing the unusual size of the traj.npz (23 GB). Why?! Lowering the step size...

PF (huge node) 10 steps

Resources: cput=04:19:07,vmem=106620420kb,walltime=04:22:12,mem=105062180kb

results_ref_normal file sizes are still 23 GB ....

Splitting the file up into individual pieces...

saving = ['rest_type', 'trajectory_headers', 'trajectory', 'sep_accept',
    'grid', 'allowed_parameters', 'sampled_parameters', 'model', 'ref']

  24K  allowed_parameters.npy
  30G  model.npy
 3.1K  ref.npy
  317  rest_type.npy
  24K  sampled_parameters.npy
  371  sep_accept.npy
  413  trajectory_headers.npy
  824  trajectory.npy
  15M  grid.npy

Cineromycin_B (NOE data)

10,000 steps

write
"arr_0": 9.363 s,Traj 184K
multiarr: 8.902 s,

load 
"arr_0": 1.446 s,Traj
multiarr: 1.175 s,Traj

100,000 steps

write 
"arr_0": 9.363 s,Traj 184K

100,000,000 steps

write
"arr_0": 04:15:22,mem=135.7 GB,Traj 1.6 GB
"arr_0": 04:15:32,mem=135.7 GB,Traj 1.6 GB
multiarr: 04:58:38,mem=139.2 GB,Traj 1.4 GB
multiarr: 04:58:--,mem=139.2 GB,Traj 1.4 GB

load
"arr_0": 02:00:19,91278552kb
multiarr: 02:00:23,68376052kb

RESULT:

At low number of steps we a very low discrepancy in the duration of the writing and loading.
At very high number of steps (100 million steps) we see a 40 min difference where the more generalized/shortened/pythonic code is in fact slower than the old code.

Overall Result:

It has been decided that the runtime (at high step size) is of more importance than generalized/shortened/pythonic code.
In the future, we plan to rewrite the code in C++ with cython wrapper functions that will also allow for parallelization. Furthermore, the generalized code will be of greater use for readability and extensibility, prospectively.

vvoelz / biceps

NPZ dictionary format (from 'arr_0' to multiple arrays with specific names) #57

Reminder to change the code for saving traj.npz file.

Need to make changes in PosteriorSampler.py:

These are the changes that will be made in order to have separate arrays for each results section:

Benchmarking the numpyzcompression (NPZ)

Comparison single array ("arr_0") Vs. multiarray ("trajectory", "sampled_parameters",...)

Protection Factor

10,000,000 Steps, writing to trajectory every step

RESULT:

Testing the unusual size of the traj.npz (23 GB). Why?! Lowering the step size...

PF (huge node) 10 steps

Cineromycin_B (NOE data)

10,000 steps

100,000 steps

100,000,000 steps

RESULT:

Overall Result: