sissaschool / sportran

A code to estimate transport coefficients from the cepstral analysis of a (multi)variate current stationary time series -- [FKA "thermocepstrum"]
https://sportran.readthedocs.io
GNU General Public License v3.0

Binary output #37

Open rikigigi opened 4 years ago

rikigigi commented 4 years ago

@lorisercole Right now, the default binary output is a pickle-dumped blob that I think is difficult for a first-time user to understand. Its content is:

['KAPPA_SCALE',
 'TEMPERATURE',
 'TSKIP',
 'UNITS',
 'VOLUME',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'cepstral_log',
 'j_DT_FS',
 'j_Nyquist_f_THz',
 'j_PSD_FILTER_W_THz',
 'j_cospectrum',
 'j_fcospectrum',
 'j_flogpsd',
 'j_fpsd',
 'j_freqs_THz',
 'j_logpsd',
 'j_psd',
 'jf_DT_FS',
 'jf_Nyquist_f_THz',
 'jf_dct_Kmin_corrfactor',
 'jf_dct_aic_Kmin',
 'jf_dct_kappa',
 'jf_dct_kappa_THEORY_std',
 'jf_dct_logpsd',
 'jf_dct_logpsdK',
 'jf_dct_logpsdK_THEORY_std',
 'jf_dct_logtau',
 'jf_dct_logtau_THEORY_std',
 'jf_dct_psd',
 'jf_flogpsd',
 'jf_fpsd',
 'jf_freqs_THz',
 'jf_logpsd',
 'jf_psd',
 'jf_resample_log',
 'kappa_Kmin',
 'kappa_Kmin_std',
 'units',
 'write_old_binary']

Is it used by anyone, or anywhere in the code? Is it safe to change the default binary output to one equivalent to the human-readable output, but stored as numpy arrays?

lorisercole commented 4 years ago

The content of the default bin format is simply an object with those attributes. However, I would also avoid splitting the binary output into many files: it does not make sense.

I think we can simplify this by saving many arrays/variables in a numpy or json file (we need to test this). Like this:

tc_dict = {
    # unfiltered current
    'j': {
        'DT_FS': j.DT_FS,
        'KAPPA_SCALE': j.KAPPA_SCALE,
        'psd': j.psd,
        ...
    },
    # filtered/resampled current
    'jf': {
        'DT_FS': jf.DT_FS,
        'KAPPA_SCALE': jf.KAPPA_SCALE,
        'psd': jf.psd,
        ...
    },
    ...
}

Or with less-readable code:

tc_dict = {
    'j': {},
    'jf': {},
    ...
}
attrs_to_save = ['DT_FS', 'KAPPA_SCALE', 'psd', ...]
for key in tc_dict.keys():
    for attr in attrs_to_save:
        tc_dict[key][attr] = getattr(locals()[key], attr)

(we should find a smarter solution if the dictionary is more deeply-nested)
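One way to avoid the `locals()` lookup is to pass the objects in explicitly. The following is only a minimal sketch: `collect_attrs` is a hypothetical helper, and `j` and `jf` are `SimpleNamespace` stand-ins with made-up values, not the real current objects.

```python
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-ins for the current objects (assumption: the real
# objects expose these attributes, among many others).
j = SimpleNamespace(DT_FS=1.0, KAPPA_SCALE=2.5, psd=np.array([1.0, 2.0, 3.0]))
jf = SimpleNamespace(DT_FS=5.0, KAPPA_SCALE=2.5, psd=np.array([1.5, 2.5]))


def collect_attrs(objects, attrs_to_save):
    """Build {name: {attr: value}}, skipping attributes an object lacks."""
    return {
        name: {
            attr: getattr(obj, attr)
            for attr in attrs_to_save
            if hasattr(obj, attr)
        }
        for name, obj in objects.items()
    }


tc_dict = collect_attrs({'j': j, 'jf': jf}, ['DT_FS', 'KAPPA_SCALE', 'psd'])
```

Passing a `{name: object}` mapping keeps the key and the object it refers to side by side, so the loop no longer depends on variable names in the calling scope.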

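Then save it using numpy.savez('binary_output.npz', **tc_dict) or json.dump(tc_dict, open('binary_output.json', 'w')). (numpy.save stores a single array and takes no keyword arguments; with savez the nested dictionaries would have to be flattened to be stored as plain arrays rather than pickled objects, and for JSON the arrays would have to be converted to lists.)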
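A possible way to keep everything in a single `.npz` file is to flatten the nested dictionary into dotted keys before saving. This is a sketch under assumptions, not a tested design for the project: `flatten`, `unflatten`, `save_npz`, and `load_npz` are hypothetical helpers, the values are made up, and it assumes attribute names never contain a dot.

```python
import os
import tempfile

import numpy as np


def flatten(nested, sep='.'):
    """Turn {'j': {'psd': arr}} into {'j.psd': arr}, recursing into sub-dicts."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            for subkey, subvalue in flatten(value, sep).items():
                flat[key + sep + subkey] = subvalue
        else:
            flat[key] = value
    return flat


def unflatten(flat, sep='.'):
    """Inverse of flatten: rebuild the nested dictionary from dotted keys."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


def save_npz(path, nested):
    # Each flattened key becomes one array inside the single .npz archive.
    np.savez(path, **flatten(nested))


def load_npz(path):
    with np.load(path) as data:
        return unflatten({key: data[key] for key in data.files})


# Made-up example data standing in for the real attributes.
tc_dict = {
    'j': {'DT_FS': 1.0, 'psd': np.array([1.0, 2.0, 3.0])},
    'jf': {'DT_FS': 5.0, 'psd': np.array([1.5, 2.5])},
}
path = os.path.join(tempfile.mkdtemp(), 'binary_output.npz')
save_npz(path, tc_dict)
restored = load_npz(path)
```

Scalars such as `DT_FS` come back as zero-dimensional arrays, so a loader would need `float(...)` or `.item()` wherever a plain Python number is expected.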

We will then need functions to reconstruct the Currents objects, etc, from this binary file...
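For the JSON route, the reconstruction step might look roughly like this. It is a sketch only: `to_jsonable` and `rebuild` are hypothetical helpers, the data is made up, and `SimpleNamespace` merely stands in for whatever constructor the real current classes would need.

```python
import json
from types import SimpleNamespace

import numpy as np


def to_jsonable(obj):
    """json `default` hook: convert NumPy arrays and scalars to plain Python."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')


def rebuild(section):
    """Rebuild an attribute-style object, turning lists back into arrays."""
    return SimpleNamespace(**{
        key: np.asarray(value) if isinstance(value, list) else value
        for key, value in section.items()
    })


# Made-up example data standing in for the real attributes.
tc_dict = {'j': {'DT_FS': 1.0, 'psd': np.array([1.0, 2.0, 3.0])}}

text = json.dumps(tc_dict, default=to_jsonable)
restored = {name: rebuild(section) for name, section in json.loads(text).items()}
```

JSON does not record which values were arrays, so this sketch treats every list as one. A real loader would need a clearer convention, for example a list of known array attributes.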

What do you think?