Inconsistent standard deviation calculations

Efrem-Braun commented 2 months ago

I implemented some physical validation tests for a system, and I've found that I get different z scores for the kinetic_energy.distribution test each time I run it, even running on the exact same data. I found that this is because the calculated standard deviation between runs is different.

Here's a minimal reproducible example:

import physical_validation
import numpy as np

ene_filename = 'energies.csv'
num_molecules = 900
num_atoms_per_molecule = 3
num_constraints_per_molecule = 3
num_constrained_translational_dof = 0
num_constrained_rotational_dof = 0
temperature = 300
ensemble = 'NPT'
units = physical_validation.data.UnitData(
    kb=8.314462435405199e-3 / 4.18400,  # in units of kcal/(mol*K)
    energy_str="kcal/mol",
    energy_conversion=4.18400,  # from kcal/mol to kJ/mol
    length_str="A",
    length_conversion=0.1,  # from Angstroms to nm
    volume_str="A^3",
    volume_conversion=1e-3,  # from Angstroms^3 to nm^3
    temperature_str="K",
    temperature_conversion=1,  # from K to K
    pressure_str="bar",
    pressure_conversion=1,  # from bar to bar
    time_str="ps",
    time_conversion=1,  # from ps to ps
)

ene = np.genfromtxt(ene_filename, delimiter=",", names=True, dtype='f8')

system_data = physical_validation.data.SystemData(
    natoms=num_molecules * num_atoms_per_molecule,
    nconstraints=num_molecules * num_constraints_per_molecule,
    ndof_reduction_tra=num_constrained_translational_dof,
    ndof_reduction_rot=num_constrained_rotational_dof,
)

ensemble_data = physical_validation.data.EnsembleData(
    ensemble=ensemble,
    natoms=num_molecules * num_atoms_per_molecule,
    temperature=temperature,
)

simulation_data = physical_validation.data.SimulationData(
    units=units,
    system=system_data,
    ensemble=ensemble_data,
    observables=physical_validation.data.ObservableData(
        kinetic_energy=ene['kinetic_energy'],
        potential_energy=ene['potential_energy']),
)

sigma_mean, sigma_variance = physical_validation.kinetic_energy.distribution(
    data=simulation_data, strict=False)

print("\nTest summary:\n"
      f"z score of mean = {sigma_mean}\n"
      f"z score of width = {sigma_variance}")

When I run this multiple times in a directory with the attached energies.csv file, I get results like:

$ python phys_val.py                                                                                                                                                  
Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.
/home/linuxbrew/.linuxbrew/lib/python3.11/site-packages/physical_validation/data/ensemble_data.py:88: UserWarning: NPT with undefined pressure.
  warnings.warn(ensemble + " with undefined pressure.")
After equilibration, decorrelation and tail pruning, 62.94% (9442 frames) of original Kinetic energy remain.
Kinetic energy distribution check (non-strict)
Analytical distribution (T=300.00 K):
 * mu: 1609.64 kcal/mol
 * sigma: 30.98 kcal/mol
Trajectory:
 * mu: 1610.20 +- 0.30 kcal/mol
   T(mu) = 300.11 +- 0.06 K
 * sigma: 30.43 +- 0.22 kcal/mol
   T(sigma) = 294.70 +- 2.18 K

Test summary:
z score of mean = 1.879238156048131
z score of width = 2.4324231915338315

$ python phys_val.py
Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estim
ate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.
/home/linuxbrew/.linuxbrew/lib/python3.11/site-packages/physical_validation/data/ensemble_data.py:88: UserWarning: NPT with undefined pressure.            
  warnings.warn(ensemble + " with undefined pressure.")                                      
After equilibration, decorrelation and tail pruning, 62.94% (9442 frames) of original Kinetic energy remain.                                                                                  
Kinetic energy distribution check (non-strict)                                                                                                                                                
Analytical distribution (T=300.00 K):                                                                                                                                                         
 * mu: 1609.64 kcal/mol                                                                                                                                                                       
 * sigma: 30.98 kcal/mol                                                                                                                                                                      
Trajectory:                                                                                                                                                                                   
 * mu: 1610.20 +- 0.33 kcal/mol                                                                                                                                                               
   T(mu) = 300.11 +- 0.06 K                                                                                                                                                                   
 * sigma: 30.43 +- 0.19 kcal/mol                                                                                                                                                              
   T(sigma) = 294.70 +- 1.88 K                                                                                                                                                                

Test summary:                                                                                                                                                                                 
z score of mean = 1.722986631809056                                                                                                                                                           
z score of width = 2.8189731393472233

$ python phys_val.py                                                                                                                                                  
Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, this statistical inefficiency may be an underestimate.  The estim
ate presumes the use of many statistically independent samples.  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data.    
/home/linuxbrew/.linuxbrew/lib/python3.11/site-packages/physical_validation/data/ensemble_data.py:88: UserWarning: NPT with undefined pressure.                     
  warnings.warn(ensemble + " with undefined pressure.")                                                                                                                                       
After equilibration, decorrelation and tail pruning, 62.94% (9442 frames) of original Kinetic energy remain.                                                                                  
Kinetic energy distribution check (non-strict)                                                                                                                                                
Analytical distribution (T=300.00 K):                                                                                                                                                         
 * mu: 1609.64 kcal/mol                                                                                                                                                                       
 * sigma: 30.98 kcal/mol                                                                                                                                                                      
Trajectory:                                                                                                                                                                                   
 * mu: 1610.20 +- 0.32 kcal/mol                                                                                                                                                               
   T(mu) = 300.11 +- 0.06 K                                                                                                                                                                   
 * sigma: 30.43 +- 0.21 kcal/mol                                                                                                                                                              
   T(sigma) = 294.70 +- 2.00 K                                                                                                                                                                

Test summary:                                                                                                                                                                                 
z score of mean = 1.7484365388334226                                                                                                                                                          
z score of width = 2.64349131545304

You can see that for the "trajectory" analysis, I get:

 * mu: 1610.20 +- 0.30 kcal/mol
   T(mu) = 300.11 +- 0.06 K
 * sigma: 30.43 +- 0.22 kcal/mol
   T(sigma) = 294.70 +- 2.18 K

 * mu: 1610.20 +- 0.33 kcal/mol                                                                                                                                                               
   T(mu) = 300.11 +- 0.06 K                                                                                                                                                                   
 * sigma: 30.43 +- 0.19 kcal/mol                                                                                                                                                              
   T(sigma) = 294.70 +- 1.88 K      

 * mu: 1610.20 +- 0.32 kcal/mol                                                                                                                                                               
   T(mu) = 300.11 +- 0.06 K                                                                                                                                                                   
 * sigma: 30.43 +- 0.21 kcal/mol                                                                                                                                                              
   T(sigma) = 294.70 +- 2.00 K

with different standard deviations, even though all used the exact same input data. This changes the z scores up to 0.2, which for my particular case was causing the test to fail sometimes and pass other times.

These reports were obtained using physical_validation v1.0.5 (the current version) and Python 3.11.4 on my Ubuntu machine.

mrshirts commented 2 months ago

@ptmerz would you have time to take a quick look? Is it using bootstrapping or introducing some other randomness to get the uncertainty? We should be providing the same standard error for the same data.

Note that with new simulations, the z-score WILL change by as much as 0.2, so it's difficult make automated decisions. But for the same data, we should be able to provide consistency.

ptmerz commented 2 months ago

Huh, that's unexpected! I'll take a look tonight.

ptmerz commented 2 months ago

Hi @Efrem-Braun, thanks for filing a ticket! Looking at your MWE, it seems you missed a part when copying your script: The actual call to physical_validation.kinetic_energy.distribution is missing, and that makes it difficult to know what exactly is happening here.

My guess would be, however, that you didn't specify the bootstrap_seed. See here: https://physical-validation.readthedocs.io/en/stable/physical_validation.html#physical_validation.kinetic_energy.distribution

bootstrap_seed – Sets the random number seed for bootstrapping (if strict=False). If set, bootstrapping will be reproducible. Default: None, bootstrapping is non-reproducible.

For the kinetic energy distribution test, the errors are calculated using bootstrapping, so if you don't define the seed what you're seeing is actually expected (contrary to my earlier comment...).

Let me know if this solves your issue. If it doesn't I'd appreciate if you could copy a true MWE, including the call to physical_validation.kinetic_energy.distribution. Thank you!

Efrem-Braun commented 2 months ago

My apologies; I didn't copy the last few lines of the script. I just edited the example above so it's now complete.

And you are indeed correct that setting the bootstrap_seed makes the code consistent. Thanks so much for your help!

shirtsgroup / physical_validation

Inconsistent standard deviation calculations #246