pnnl-predictive-phenomics / emll

GNU General Public License v2.0

19 create pytensor rvs from missing datasets #20

Closed augeorge closed 1 month ago

augeorge commented 1 month ago

adds a function (plus tests) to create a pytensor tensor of random variables from a dataset with missing data.

the inputs are:

  1. name for the random variables (RVs)
  2. dataset which includes values for all model variables across all conditions. The dataframe should contain floats/ints for observed data, np.inf for unobserved data, and np.nan for variables which should be excluded from the model (e.g. exchange reactions). Shape: N rows (one per experimental condition) x M columns (one per model variable)
  3. dataframe for standard deviations - should have same shape as the above dataset
  4. dataframe for laplace parameters - values are a tuple (location, scale) for the laplace distribution, should have same shape as the above dataset
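
The three dataframes described above could look like the following. This is a minimal sketch: the condition names, variable names, and numeric values are made up for illustration and do not come from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical example: 2 experimental conditions x 3 model variables.
conditions = ["cond_1", "cond_2"]
variables = ["v_a", "v_b", "v_exchange"]

# Floats/ints = observed, np.inf = unobserved, np.nan = excluded from the model.
data = pd.DataFrame(
    {"v_a": [1.2, np.inf], "v_b": [np.inf, 0.7], "v_exchange": [np.nan, np.nan]},
    index=conditions,
)

# Standard deviations, same shape as `data`.
stdev = pd.DataFrame(0.1, index=conditions, columns=variables)

# Laplace parameters: each cell holds a (location, scale) tuple.
laplace = pd.DataFrame(
    {name: [(0.0, 1.0)] * len(conditions) for name in variables},
    index=conditions,
)

# All three inputs must share the same N x M shape.
assert data.shape == stdev.shape == laplace.shape

# Boolean masks for the three kinds of cells.
observed = np.isfinite(data)   # finite numbers only
unobserved = np.isinf(data)    # np.inf markers
excluded = data.isna()         # np.nan markers
```

Every cell falls into exactly one of the three masks, which is what lets the function pick a Normal, a Laplace, or a zero entry per cell.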

If a model variable at a particular condition was observed, then a pymc Normal distribution is created with a unique name, the observed value as the mean, and the corresponding value from the input standard deviations dataframe as the standard deviation.

If a model variable at a particular condition was not observed, then a pymc Laplace distribution is created with a unique name and the corresponding (location, scale) values from the input laplace parameter dataframe.

If a model variable at a particular condition should be excluded from calculations, then a zero-valued pytensor tensor is created.

The current implementation loops through each row and column and assigns the corresponding RV or zero tensor and then stacks them together.

The stacked tensor is returned at the end.

The tests cover different input data type errors, and 4 conditions:

  1. data is observed for all variables and conditions
  2. data is not observed for all variables and conditions
  3. all variables should be excluded
  4. the data contains a mixture of observations, no observations, and exclusions (realistic case)
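
The four test conditions can be illustrated with small fixtures. The sketch below only builds the datasets and counts the cell kinds with numpy/pandas. It does not reproduce the PR's tests, and the helper `count_cell_kinds` and all values are hypothetical.

```python
import numpy as np
import pandas as pd


def count_cell_kinds(data):
    """Hypothetical helper: count observed / unobserved / excluded cells."""
    values = data.to_numpy(dtype=float)
    return {
        "observed": int(np.isfinite(values).sum()),
        "unobserved": int(np.isinf(values).sum()),
        "excluded": int(np.isnan(values).sum()),
    }


shape = (2, 3)  # 2 conditions x 3 model variables (made-up size)
cases = {
    "all_observed": pd.DataFrame(np.full(shape, 1.5)),
    "all_unobserved": pd.DataFrame(np.full(shape, np.inf)),
    "all_excluded": pd.DataFrame(np.full(shape, np.nan)),
    "mixed": pd.DataFrame([[1.5, np.inf, np.nan], [np.inf, 0.3, np.nan]]),
}

# Going by the description, each count is the number of Normal RVs,
# Laplace RVs, and zero entries expected in the returned tensor.
expected = {
    "all_observed": {"observed": 6, "unobserved": 0, "excluded": 0},
    "all_unobserved": {"observed": 0, "unobserved": 6, "excluded": 0},
    "all_excluded": {"observed": 0, "unobserved": 0, "excluded": 6},
    "mixed": {"observed": 2, "unobserved": 2, "excluded": 2},
}

results = {name: count_cell_kinds(df) for name, df in cases.items()}
```
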

augeorge commented 1 month ago

probably can be refactored in another PR to be cleaner and more performant

augeorge commented 1 month ago

added a test checking that the shape and number of dimensions of the returned tensor match the input data - also renamed the function to 'create_pytensor_from_data_naive' since we will probably implement a faster method later

augeorge commented 1 month ago

added a test to check that 'll.steady_state_pytensor' runs without any errors