Input error into PMI/Inductive Imputer?

decortja commented 3 years ago

Hi all,

I have been able to use GAIN-GTEx on my data, but I am getting an error when I run the same data on PMI/the inductive imputer. Here are my settings:

{'gpu': 1, 'dataset': 'GTEx', 'pathway': '', 'model': 'InductiveImputer', 'inplace_mode': False, 'sweep': False, 'lr': 0.0001, 'batch_size': 32, 'dropout': 0.2, 'bn': True, 'm_low': 0.5, 'm_high': 0.5, 'epochs': 1000, 'steps_per_epoch': 100, 'patience': 30, 'nb_layers': 1, 'hdim': 3072, 'save': True, 'save_dir': '/data/g_gamazon_lab/jdecorte/GTEx-imputation-main', 'random_seed': 0}
Dataset: GTEx

The full error is pasted at the end, but the key error message is ValueError: Layer model expects 4 input(s), but it received 5 input tensors. As far as I can tell, the 5 inputs it's receiving are:

A (2 x batch_size x number_of_genes) tensor taking real values. Likely the input genes? If so, why is it a rank-3 tensor and not a (batch_size x number_of_genes) matrix?
A (batch_size x 3) integer matrix (the values shown in the traceback are 0 and 1). Likely the categorical covariates Cohort, Sex, and Tissue.
A (batch_size x 1) matrix taking real values. Likely the one numerical covariate, Age.
Two (batch_size x number_of_genes) matrices over Z/2. Are these the mask and the b that sets the pseudomask?

In your paper, you say that the PMI imputer receives 4 inputs: a table of gene data x, a mask for that data m, numerical covariates r, and categorical covariates q. The imputer then generates a vector b, which operates on m to generate the pseudomask. It would be great to get your input on troubleshooting this. Did the PMI imputer accidentally get passed 2 masks, or did it get passed b along with m? Is it supposed to take m as input and calculate b afterwards?
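A minimal sketch of the pseudo-mask idea described above, to make the question concrete. It assumes that b is sampled independently per entry from a Bernoulli distribution, that the pseudo-mask is the element-wise product of b and m, and that 1 means "observed"; the paper summary above only says that b "operates on m", so these details are guesses. The function name `make_pseudo_mask` and the `keep_prob` parameter are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random

def make_pseudo_mask(mask, keep_prob=0.5, seed=0):
    """Return (b, pseudo_mask) for a 0/1 mask given as a list of rows.

    ASSUMPTION: b ~ Bernoulli(keep_prob) per entry and
    pseudo_mask = b * mask element-wise. The exact rule used by the
    repository is not stated in this thread.
    """
    rng = random.Random(seed)
    b = [[1 if rng.random() < keep_prob else 0 for _ in row] for row in mask]
    pseudo = [[bi * mi for bi, mi in zip(b_row, m_row)]
              for b_row, m_row in zip(b, mask)]
    return b, pseudo

# Toy mask: 2 samples x 4 genes (ASSUMPTION: 1 = observed, 0 = missing).
mask = [[1, 1, 0, 1],
        [0, 1, 1, 1]]
b, pseudo_mask = make_pseudo_mask(mask)
```

Under these assumptions the pseudo-mask can only hide entries that m marks as observed, so m and b would be two distinct (batch_size x number_of_genes) arrays, which matches the two binary matrices in the error.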

Full error message:

Traceback (most recent call last):
  File "/....../GTEx-imputation-main/imputation.py", line 87, in <module>
    model, generator = train(config)
  File "/....../GTEx-imputation-main/imputation.py", line 57, in train
    model.fit(generator.train_iterator_MCAR(alpha=alpha, beta=beta),
  File "/home/decortja/anaconda3/envs/imputation_env/lib/python3.9/site-packages/wandb/integration/keras/keras.py", line 124, in new_v2
    return old_v2(*args, **kwargs)
 ...
  File "/....../GTEx-imputation-main/models/base_imputer.py", line 66, in call
    return self.model([x_, cat, num, mask], **kwargs)
  File "/home/decortja/anaconda3/envs/imputation_env/lib/python3.9/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1013, in __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
  File "/home/decortja/anaconda3/envs/imputation_env/lib/python3.9/site-packages/tensorflow/python/keras/engine/input_spec.py", line 200, in assert_input_compatibility
    raise ValueError('Layer ' + layer_name + ' expects ' +
ValueError: Layer model expects 4 input(s), but it received 5 input tensors. Inputs received: [<tf.Tensor: shape=(2, 32, 65801), dtype=float32, numpy=
array([[[-0.3992244 ,  0.8063585 ,  0.79729533, ...,  0.3746732 ,-0.24353398, -0.        ],
        [-0.        ,  1.4937446 , -0.        , ..., -1.1676767 , 0.        ,  0.        ],
        [ 0.49862802, -0.38329843, -0.        , ..., -2.2393956 , 1.9952754 ,  0.        ],
        ...,
        [ 0.        , -0.        , -0.        , ...,  0.        , -0.        ,  1.3122015 ],
        [-0.23860702,  1.2784615 ,  0.        , ...,  0.        , -0.        ,  1.2541623 ],
        [ 0.57231456, -0.        ,  0.        , ..., -1.3396412 , 0.        , -0.        ]]], dtype=float32)>, <tf.Tensor: shape=(32, 3), dtype=int32, numpy=

array([[0, 0, 0],
       [0, 1, 0],
       ...,
       [1, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=int32)>, <tf.Tensor: shape=(32, 1), dtype=float32, numpy=

array([[-2.35253   ],
       [-0.04534702],
       ...,
       [-1.5834689 ],
       [ 0.56990176],
       [ 0.03155908]], dtype=float32)>, <tf.Tensor: shape=(32, 65801), dtype=float32, numpy=

array([[1., 1., 1., ..., 1., 1., 0.],
       [0., 1., 0., ..., 1., 0., 0.],
       ...,
       [1., 1., 1., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(32, 65801), dtype=float32, numpy=

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 1., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 1.],
       [1., 1., 0., ..., 0., 0., 1.],
       [1., 0., 0., ..., 1., 0., 0.]], dtype=float32)>]

decortja commented 3 years ago

At the time of the error (line 66 in base_imputer.py), kwargs = {'training': False}, if that helps.
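A small debugging sketch for this kind of mismatch. The call site in the traceback passes a four-element list, [x_, cat, num, mask], yet five tensors arrive, so one element is presumably itself a pair of tensors; as far as I understand, Keras flattens nested inputs before counting them. The helpers `flatten`, `describe_inputs`, and `fake` are hypothetical, and the stand-in objects only carry the shapes reported in the error. Which element is actually nested is an assumption.

```python
from types import SimpleNamespace

def flatten(inputs):
    """Recursively flatten nested lists/tuples into a flat sequence."""
    for item in inputs:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def describe_inputs(inputs, expected_names):
    """Report the shape of every flattened input and flag a count mismatch.

    Works with anything exposing a `.shape` attribute, such as a tf.Tensor
    or a NumPy array.
    """
    flat = list(flatten(inputs))
    report = [f"[{i}] shape={tuple(t.shape)}" for i, t in enumerate(flat)]
    ok = len(flat) == len(expected_names)
    if not ok:
        report.append(
            f"expected {len(expected_names)} inputs {expected_names}, got {len(flat)}"
        )
    return ok, report

def fake(*shape):
    """Stand-in for a tensor: an object that only has a shape."""
    return SimpleNamespace(shape=shape)

# ASSUMPTION: the last element is a (mask, b) pair; shapes come from the traceback.
inputs = [fake(2, 32, 65801), fake(32, 3), fake(32, 1),
          (fake(32, 65801), fake(32, 65801))]
ok, report = describe_inputs(inputs, ["x_", "cat", "num", "mask"])
print("\n".join(report))
```

Calling something like this just before self.model(...) in base_imputer.py would show which of the four arguments expands into more than one tensor.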

rvinas commented 3 years ago

Hi @decortja, thank you for spotting this!

I believe this is my mistake - the code for InductiveImputer seems to be obsolete. Could you please check if PMI works with PseudoMaskImputer? I refactored the code ~2 days ago. If this works, I'll remove InductiveImputer from the repo to avoid future confusion. Many thanks again for reporting this issue!
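In terms of the settings posted above, the suggested check amounts to changing one entry. This is a sketch under the assumption that the 'model' setting accepts the class name PseudoMaskImputer as its value; the dictionary below is an abridged copy of the reported settings, not the full configuration.

```python
# Abridged copy of the settings from the original report.
config = {'dataset': 'GTEx', 'model': 'InductiveImputer', 'lr': 0.0001,
          'batch_size': 32, 'patience': 30}

# ASSUMPTION: 'PseudoMaskImputer' is a valid value for the 'model' setting.
# Build a new dict so the original settings stay available for comparison.
new_config = {**config, 'model': 'PseudoMaskImputer'}
```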

decortja commented 3 years ago

Amazing! PseudoMaskImputer is working well. Was the patience of 30 chosen empirically?

rvinas commented 3 years ago

Yes, it was. Feel free to use other values!

decortja commented 3 years ago

Sounds great. This probably doesn't deserve a separate issue, but where is the output written? I can't seem to find it, particularly the per-gene imputation R^2 scores like the ones you show in Figure 5.

rvinas commented 3 years ago

We included a Jupyter notebook eval.ipynb that loads the saved models and produces results. I hope this helps!
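For readers who want a rough idea of what a per-gene R^2 score involves, here is a minimal sketch. It is not the code from eval.ipynb: the function name `per_gene_r2` is hypothetical, it uses the standard definition R^2 = 1 - SS_res / SS_tot computed column by column, and it ignores details such as whether only imputed entries are scored.

```python
def per_gene_r2(y_true, y_pred):
    """Return one R^2 score per column (gene).

    Both arguments are lists of rows (samples x genes).
    A gene with zero variance in y_true yields float('nan').
    """
    n_genes = len(y_true[0])
    scores = []
    for g in range(n_genes):
        t = [row[g] for row in y_true]
        p = [row[g] for row in y_pred]
        mean_t = sum(t) / len(t)
        ss_res = sum((ti - pi) ** 2 for ti, pi in zip(t, p))
        ss_tot = sum((ti - mean_t) ** 2 for ti in t)
        scores.append(1 - ss_res / ss_tot if ss_tot > 0 else float('nan'))
    return scores

# Toy data: gene 0 is predicted perfectly, gene 1 is predicted by its mean.
y_true = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y_pred = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
scores = per_gene_r2(y_true, y_pred)  # [1.0, 0.0]
```

A perfect prediction scores 1 and always predicting the gene's mean scores 0, which gives a baseline for reading a per-gene plot.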

rvinas / GTEx-imputation

Input error into PMI/Inductive Imputer? #11