Handle missing data for input into Bayesian MCA

djinnome commented 9 months ago

All data sucks. Bayesian MCA naively assumes that each matrix of metabolite, flux, and enzyme observations has one entry for every row (metabolite) or column (reaction) of the stoichiometric matrix.

There is a tensor trick to reindex a tensor so that a smaller tensor can represent the observed data and its rows map to the rows of the full stoichiometric matrix, but only ChatGPT knows how this works.
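One way the reindexing could look, as a minimal NumPy sketch rather than the actual trick referred to above: keep an integer index that records which row of the full model each observed row belongs to, then use it to pull the matching rows out of the full-size prediction. The ids `m1` to `m5` and the array names are made up for illustration. PyTensor tensors accept the same kind of integer indexing, but that part is not exercised here.

```python
import numpy as np

# Hypothetical model: 5 metabolites (rows of the stoichiometric matrix), 3 conditions.
model_ids = ["m1", "m2", "m3", "m4", "m5"]

# Only two metabolites were actually measured, in arbitrary order.
observed_ids = ["m4", "m2"]

# For each observed row, the position of that metabolite in the full model.
row_index = np.array([model_ids.index(i) for i in observed_ids])

# Stand-in for the full-size model prediction (n_metabolites x n_conditions).
predicted_full = np.arange(15, dtype=float).reshape(5, 3)

# Select only the rows that have data, in the same order as the observations,
# so the likelihood compares arrays of identical shape.
predicted_observed = predicted_full[row_index]
```

The observed dataframe stays small (2 x 3 here), and the model keeps its full 5 x 3 shape; only the index connects them.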

djinnome commented 9 months ago

The current enzyme activity function generates enzymes with Inf values when there is no expression data. The solution to this issue will give the interface one consistent way to handle missing rows, missing values, and missing columns.

augeorge commented 8 months ago

I will help with this

augeorge commented 7 months ago

Initial tests to build and pass:

inputs: dataframe: ncond x nvariables
outputs: pytensor: ncond x nvariables
test that tensor_equal(example_tensor, make_observables(example_df)). Can use data from Hackett or Wu et al.

ex1_df: unmeasured everywhere. ex1_tensor: Laplace everywhere, shape=ex1_df.shape
ex2_df: measured everywhere, uninformed. ex2_tensor: Normal(mu=0, sigma=1, shape=ex2_df.shape)
ex2_df: measured everywhere, informed. ex2_tensor: Normal(mu=ex2_df.values, sigma=1, shape=ex2_df.shape)
ex3_df: some variables missing, uninformed. ex3_tensor: some columns are Laplace, some columns are Normal(mu=0, shape=nconditions)
ex3_df: some variables missing, informed. ex3_tensor: some columns are Laplace, some columns are Normal(mu=ex3_df[var], shape=nconditions)
ex4_df: some conditions missing. ex4_tensor: some rows are Normal(0, 1), some columns are Normal
ex5_df: ragged array. ex5_tensor: pymc_ragged_array
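As a starting point for the ex1 to ex3 cases above, here is a rough sketch of the per-column decision only, without building any PyMC or PyTensor objects. `plan_priors` is a hypothetical helper, not the proposed `make_observables`, and it assumes an unmeasured variable shows up as a column with no finite values at all. Partially measured columns (ex4) and ragged data (ex5) are not handled.

```python
import numpy as np
import pandas as pd

def plan_priors(df, informed=True):
    """Hypothetical helper: name the prior each column would get.

    Returns {column: (distribution_name, mu)}, where mu is None for Laplace.
    """
    plan = {}
    for col in df.columns:
        values = df[col].to_numpy(dtype=float)
        if not np.isfinite(values).any():
            # No usable measurement anywhere in this column: Laplace prior.
            plan[col] = ("Laplace", None)
        else:
            # Measured column: Normal centred on the data (informed) or on 0.
            mu = values if informed else np.zeros_like(values)
            plan[col] = ("Normal", mu)
    return plan

# ex3-style input: one measured variable, one unmeasured variable (names made up).
ex3_df = pd.DataFrame({"glc": [1.2, 0.8, 1.0], "atp": [np.inf, np.inf, np.inf]})
plan = plan_priors(ex3_df)
plan_uninformed = plan_priors(ex3_df, informed=False)
```

A real implementation would turn each entry of the plan into the corresponding distribution and stack the columns into a single ncond x nvariables tensor.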

djinnome commented 6 months ago

@ShantMahserejian can we get a dataframe for each data type where the columns are the conditions and the rows are the model ids for that data type? Each cell would be a float if it was measured, Inf if it wasn't measured, and NaN if no measurement can be mapped to the model id (for example, reactions that are not enzyme-catalyzed).
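To make the requested encoding concrete, here is a small sketch of such a dataframe and of how the three cell states could be separated afterwards. The reaction ids and condition names are invented for the example; only the float / Inf / NaN convention comes from the comment above.

```python
import numpy as np
import pandas as pd

# Hypothetical enzyme dataframe: rows are model reaction ids, columns are conditions.
enzymes = pd.DataFrame(
    {
        "cond_A": [0.9, np.inf, np.nan],
        "cond_B": [1.1, 0.7, np.nan],
    },
    index=["r_PGI", "r_PFK", "r_EX_glc"],  # r_EX_glc: not enzyme-catalyzed
)

values = enzymes.to_numpy(dtype=float)

measured = np.isfinite(values)   # real measurement available
unmeasured = np.isinf(values)    # could have been measured, but wasn't
unmappable = np.isnan(values)    # nothing can be mapped to this model id
```

The three masks are mutually exclusive and together cover every cell, so downstream code can assign an observed likelihood, a prior, or nothing at all depending on which mask is set.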

pnnl-predictive-phenomics / syn_bmca

Handle missing data for input into Bayesian MCA #3