mkusner / grammarVAE

Code for the "Grammar Variational Autoencoder" https://arxiv.org/abs/1703.01925

Running the code #29

Closed nzarnaghi closed 3 years ago

nzarnaghi commented 3 years ago

Thank you so much for your code. I have read your paper and tried to run your code, but many of the required packages are deprecated. I therefore updated the code so that it is compatible with Python 3.7.6. Some of the updated parts of model_eq.py are shown below:

def conditional(x_true, x_pred):
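            # mask the exponentiated predictions so that only grammar-valid production rules keep probability mass, then renormalize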
            #most_likely = K.argmax(x_true)
            most_likely = tf.math.argmax(x_true, axis=-1) # keep axis=-1: K.argmax defaults to the last axis, tf.math.argmax does not

            most_likely = tf.reshape(most_likely,[-1]) # flatten most_likely
            ix2 = tf.expand_dims(tf.gather(ind_of_ind_K, most_likely),1) # index ind_of_ind with res
            ix2 = tf.cast(ix2, tf.int32) # cast indices as ints 
            M2 = tf.gather_nd(masks_K, ix2) # get slices of masks_K with indices
            M3 = tf.reshape(M2, [-1,MAX_LEN,DIM]) # reshape them
            #P2 = tf.mul(K.exp(x_pred),M3) # apply them to the exp-predictions
            P2 = tf.math.multiply(tf.math.exp(x_pred),tf.cast(M3,tf.float32)) # apply them to the exp-predictions
            #P2 = tf.math.multiply(tf.cast(tf.math.exp(x_pred),tf.float64),M3) # apply them to the exp-predictions
            #P2 = tf.div(P2,K.sum(P2,axis=-1,keepdims=True)) # normalize predictions
            P2 = tf.math.divide(P2,tf.math.reduce_sum(P2,axis=-1,keepdims=True)) # normalize predictions
            return P2

        def vae_loss(x, x_decoded_mean):
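            # loss = reconstruction cross-entropy on the grammar-masked predictions + KL divergence of the latent code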
            x_decoded_mean = conditional(x, x_decoded_mean)
            #x = K.flatten(x)
            #x_decoded_mean = K.flatten(x_decoded_mean)
            x = tf.keras.layers.Flatten()(x)
            x_decoded_mean = tf.keras.layers.Flatten()(x_decoded_mean)

            #xent_loss = max_length * objectives.binary_crossentropy(x, x_decoded_mean)
            xent_loss = max_length * tf.keras.losses.binary_crossentropy(x, x_decoded_mean)
            #kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis = -1)
            kl_loss = - 0.5 * tf.reduce_mean(1 + z_log_var - tf.math.square(z_mean) - tf.math.exp(z_log_var), axis = -1)

            return xent_loss + kl_loss

        #return (vae_loss, Lambda(sampling, output_shape=(latent_rep_size,), name='lambda')([z_mean, z_log_var]))
        return (vae_loss, tf.keras.layers.Lambda(sampling, output_shape=(latent_rep_size,), name='lambda')([z_mean, z_log_var]))

The equivalent lines from the original code are kept as comments. But after running the code, it shows the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'time_distributed_1_target' with dtype float and shape [?,?,?]
[[{{node time_distributed_1_target}}]]

After debugging, I realized that when I comment out the line "x_decoded_mean = conditional(x, x_decoded_mean)", the code starts running, but the accuracy is not correct. In addition, commenting out the line "P2 = tf.math.divide(P2,tf.math.reduce_sum(P2,axis=-1,keepdims=True)) # normalize predictions" does not remove the error, but replacing "P2 = tf.math.multiply(tf.math.exp(x_pred),tf.cast(M3,tf.float32)) # apply them to the exp-predictions" with "P2 = tf.math.exp(x_pred)" does remove it. So the error arises from the "conditional" function and M3. I do not know exactly what this function does. Could you please help me solve this bug? It seems to be the only error preventing the code from running. Once the code works, I can give you the updated code to put on GitHub.

Thank you

mkusner commented 3 years ago

Thanks for trying to update the code! First let me say that the fastest way to get the code running will be installing the old packages and running in Python 2.7.

If you really want to upgrade it to Python 3, I can explain what conditional() does. First, x_true and x_pred both have size [batch, MAX_LEN, DIM], where x_true only takes values 0 and 1. The code starts by identifying the locations in x_true that are 1 and creates most_likely, which has size [batch, MAX_LEN]. These are the indices of the production rules for each element in the batch and each timestep in MAX_LEN. We then flatten most_likely. Next we need to know which production rules get masked by the ones indicated in most_likely; this is what gets stored in ix2: the indices of the production rules that are masked. We then use these indices to build the masks, which get stored in M2. M3 reshapes the masks to size [batch, MAX_LEN, DIM] so they can be multiplied by x_pred. I think the error is because the TimeDistributed or RepeatVector modules have been changed. If you want to look into this more deeply you can, but it will be a bit tricky.
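
In case it's useful, here is the same pipeline written out with toy numpy arrays. The grammar tables below are made up purely to illustrate the indexing; the real masks and ind_of_ind come from the grammar module (eq_grammar as G in your code):

import numpy as np

batch, MAX_LEN, DIM = 2, 3, 4      # toy sizes, not the real ones
# made-up grammar tables: masks[i] is a 0/1 row over the DIM production rules,
# and ind_of_ind[r] says which mask row applies when rule r is the true rule
masks = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1]], dtype=np.float32)
ind_of_ind = np.array([0, 0, 1, 1])

x_true = np.eye(DIM, dtype=np.float32)[np.random.randint(0, DIM, size=(batch, MAX_LEN))]
x_pred = np.random.randn(batch, MAX_LEN, DIM).astype(np.float32)

most_likely = x_true.argmax(axis=-1).reshape(-1)   # rule index per (batch, timestep), flattened
ix2 = ind_of_ind[most_likely]                      # which mask row to use for each timestep
M2 = masks[ix2]                                    # gather the mask rows, [batch*MAX_LEN, DIM]
M3 = M2.reshape(batch, MAX_LEN, DIM)               # reshape to match x_pred
P2 = np.exp(x_pred) * M3                           # zero out masked production rules
P2 = P2 / P2.sum(axis=-1, keepdims=True)           # renormalize over the rule dimension
print(P2.shape)                                    # (2, 3, 4)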

If you don't mind using PyTorch, I'd try the code here: https://github.com/geyang/grammar_variational_autoencoder I'm not sure if it's Python 3-ready, but it would likely be easier to port!

nzarnaghi commented 3 years ago

Thank you so much for your reply and your help. I tried to use Python 2.7 and the old packages, but it did not work. The old TensorFlow version is deprecated, and Python 2.7 itself will be deprecated in the near future. I am trying to run the PyTorch version you pointed me to, but I am having some difficulty installing torch for Python 2.7. I am still working on it to see whether I can run it successfully.

Based on your comments, I checked TimeDistributed and RepeatVector, but they seem to be right. I copied my code below, with the original lines from your code kept as comments so the two are easy to compare. Could you please take a look at my code, which was upgraded to Python 3.7? The calls are very similar to your code, except that they are updated to support Python 3.7 and TensorFlow 1.14. My model_eq.py is as follows:

import copy
#from keras import backend as K
#from keras import objectives
#from keras.models import Model
#from keras.layers import Input, Dense, Lambda
#from keras.layers.core import Dense, Activation, Flatten, RepeatVector
#from keras.layers.wrappers import TimeDistributed
#from keras.layers.recurrent import GRU
#from keras.layers.convolutional import Convolution1D
#from keras.layers.normalization import BatchNormalization
import eq_grammar as G
import tensorflow as tf

#masks_K      = K.variable(G.masks)
#ind_of_ind_K = K.variable(G.ind_of_ind)

masks_K      = tf.Variable(G.masks)
ind_of_ind_K = tf.Variable(G.ind_of_ind)

MAX_LEN = 15
DIM = G.D

class MoleculeVAE():

    autoencoder = None

    def create(self,
               charset,
               max_length = MAX_LEN,
               latent_rep_size = 10,
               hypers = {'hidden': 100, 'dense': 100, 'conv1': 2, 'conv2': 3, 'conv3': 4},
               weights_file = None):
        charset_length = len(charset)
        self.hypers = hypers
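
        # create() builds four models: an encoder (x -> z), a decoder (z -> x),
        # the full autoencoder used for training, and encoderMV, which returns (z_mean, z_log_var)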

        # x = Input(shape=(max_length, charset_length))
        x = tf.keras.Input(shape=(max_length, charset_length))
        _, z = self._buildEncoder(x, latent_rep_size, max_length)
        # self.encoder = Model(x, z)
        self.encoder = tf.keras.Model(x, z)

        #encoded_input = Input(shape=(latent_rep_size,))
        encoded_input = tf.keras.Input(shape=(latent_rep_size,))
        self.decoder = tf.keras.Model(
            encoded_input,
            self._buildDecoder(
                encoded_input,
                latent_rep_size,
                max_length,
                charset_length
            )
        )

        #x1 = Input(shape=(max_length, charset_length))
        x1 = tf.keras.Input(shape=(max_length, charset_length))
        vae_loss, z1 = self._buildEncoder(x1, latent_rep_size, max_length)
        self.autoencoder = tf.keras.Model(
            x1,
            self._buildDecoder(
                z1,
                latent_rep_size,
                max_length,
                charset_length
            )
        )

        #x2 = Input(shape=(max_length, charset_length))
        x2 = tf.keras.Input(shape=(max_length, charset_length))
        (z_m, z_l_v) = self._encoderMeanVar(x2, latent_rep_size, max_length)
        self.encoderMV = tf.keras.Model(inputs=x2, outputs=[z_m, z_l_v])

        if weights_file:
            self.autoencoder.load_weights(weights_file)
            self.encoder.load_weights(weights_file, by_name = True)
            self.decoder.load_weights(weights_file, by_name = True)
            self.encoderMV.load_weights(weights_file, by_name = True)

        self.autoencoder.compile(optimizer = 'Adam',
                                 loss = vae_loss,
                                 metrics = ['accuracy'])

    def _encoderMeanVar(self, x, latent_rep_size, max_length, epsilon_std = 0.01):
        '''
        h = Convolution1D(self.hypers['conv1'], self.hypers['conv1'], activation = 'relu', name='conv_1')(x)
        h = BatchNormalization(name='batch_1')(h)
        h = Convolution1D(self.hypers['conv2'], self.hypers['conv2'], activation = 'relu', name='conv_2')(h)
        h = BatchNormalization(name='batch_2')(h)
        h = Convolution1D(self.hypers['conv3'], self.hypers['conv3'], activation = 'relu', name='conv_3')(h) 
        h = BatchNormalization(name='batch_3')(h)
        h = Flatten(name='flatten_1')(h)
        h = Dense(self.hypers['dense'], activation = 'relu', name='dense_1')(h)

        z_mean = Dense(latent_rep_size, name='z_mean', activation = 'linear')(h)
        z_log_var = Dense(latent_rep_size, name='z_log_var', activation = 'linear')(h)
        '''

        h = tf.keras.layers.Conv1D(filters=self.hypers['conv1'], kernel_size=self.hypers['conv1'], strides=1, activation = 'relu', use_bias=True, padding='same')(x)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Conv1D(filters=self.hypers['conv2'], kernel_size=self.hypers['conv2'], strides=1, activation = 'relu', use_bias=True, padding='same')(h)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Conv1D( filters=self.hypers['conv3'], kernel_size=self.hypers['conv3'], strides=1, activation = 'relu', use_bias=True, padding='same')(h)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Flatten()(h)
        h = tf.keras.layers.Dense(units=self.hypers['dense'], activation = 'relu')(h)

        z_mean = tf.keras.layers.Dense(units=latent_rep_size, activation = 'linear')(h)
        z_log_var = tf.keras.layers.Dense(units=latent_rep_size, activation = 'linear')(h)

        return (z_mean, z_log_var) 

    def _buildEncoder(self, x, latent_rep_size, max_length, epsilon_std = 0.01):

        '''
        h = Convolution1D(self.hypers['conv1'], self.hypers['conv1'], activation = 'relu', name='conv_1')(x)
        h = BatchNormalization(name='batch_1')(h)
        h = Convolution1D(self.hypers['conv2'], self.hypers['conv2'], activation = 'relu', name='conv_2')(h)
        h = BatchNormalization(name='batch_2')(h)
        h = Convolution1D(self.hypers['conv3'], self.hypers['conv3'], activation = 'relu', name='conv_3')(h) 
        h = BatchNormalization(name='batch_3')(h)

        h = Flatten(name='flatten_1')(h)
        h = Dense(self.hypers['dense'], activation = 'relu', name='dense_1')(h)
        '''

        h = tf.keras.layers.Conv1D(filters=self.hypers['conv1'], kernel_size=self.hypers['conv1'], strides=1, activation = 'relu', use_bias=True, padding='same')(x)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Conv1D(filters=self.hypers['conv2'], kernel_size=self.hypers['conv2'], strides=1, activation = 'relu', use_bias=True, padding='same')(h)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Conv1D(filters=self.hypers['conv3'], kernel_size=self.hypers['conv3'], strides=1, activation = 'relu', use_bias=True, padding='same')(h)
        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(h)
        h = tf.keras.layers.Flatten()(h)
        h = tf.keras.layers.Dense(units=self.hypers['dense'], activation = 'relu')(h)

        def sampling(args):
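            # reparameterisation trick: z = z_mean + exp(z_log_var / 2) * eps, with eps ~ N(0, epsilon_std)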
            z_mean_, z_log_var_ = args
            '''
            batch_size = K.shape(z_mean_)[0]
            epsilon = K.random_normal(shape=(batch_size, latent_rep_size), mean=0., std = epsilon_std)
            '''

            batch_size = tf.shape(z_mean_)[0]
            epsilon = tf.random.normal(shape=(batch_size, latent_rep_size), mean=0., stddev = epsilon_std)

            #return z_mean_ + K.exp(z_log_var_ / 2) * epsilon
            return z_mean_ + tf.exp(z_log_var_ / 2) * epsilon

        '''
        z_mean = Dense(latent_rep_size, name='z_mean', activation = 'linear')(h)
        z_log_var = Dense(latent_rep_size, name='z_log_var', activation = 'linear')(h)
        '''
        z_mean = tf.keras.layers.Dense(units=latent_rep_size, activation = 'linear')(h)
        z_log_var = tf.keras.layers.Dense(units=latent_rep_size, activation = 'linear')(h)

        def conditional(x_true, x_pred):
            #most_likely = K.argmax(x_true)
            most_likely = tf.math.argmax(x_true, axis=-1) # keep axis=-1: K.argmax defaults to the last axis, tf.math.argmax does not

            most_likely = tf.reshape(most_likely,[-1]) # flatten most_likely
            ix2 = tf.expand_dims(tf.gather(ind_of_ind_K, most_likely),1) # index ind_of_ind with res
            ix2 = tf.cast(ix2, tf.int32) # cast indices as ints 
            M2 = tf.gather_nd(masks_K, ix2) # get slices of masks_K with indices
            M3 = tf.reshape(M2, [-1,MAX_LEN,DIM]) # reshape them
            #P2 = tf.mul(K.exp(x_pred),M3) # apply them to the exp-predictions
            print("test3")
            print(M3)
            P2 = tf.math.multiply(tf.math.exp(x_pred),tf.cast(M3,tf.float32)) # apply them to the exp-predictions
            #P2 = tf.math.multiply(tf.cast(tf.math.exp(x_pred),tf.float64),M3) # apply them to the exp-predictions
            #P2 = tf.div(P2,K.sum(P2,axis=-1,keepdims=True)) # normalize predictions
            P2 = tf.math.divide(P2,tf.math.reduce_sum(P2,axis=-1,keepdims=True)) # normalize predictions
            return P2

        def vae_loss(x, x_decoded_mean):
            x_decoded_mean = conditional(x, x_decoded_mean)
            #x = K.flatten(x)
            #x_decoded_mean = K.flatten(x_decoded_mean)
            print("test1")
            print(tf.shape(x))
            x = tf.keras.layers.Flatten()(x)
            print("test2")
            print(tf.shape(x_decoded_mean))
            x_decoded_mean = tf.keras.layers.Flatten()(x_decoded_mean)

            #xent_loss = max_length * objectives.binary_crossentropy(x, x_decoded_mean)
            xent_loss = max_length * tf.keras.losses.binary_crossentropy(x, x_decoded_mean)
            #kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis = -1)
            kl_loss = - 0.5 * tf.reduce_mean(1 + z_log_var - tf.math.square(z_mean) - tf.math.exp(z_log_var), axis = -1)

            return xent_loss + kl_loss

        #return (vae_loss, Lambda(sampling, output_shape=(latent_rep_size,), name='lambda')([z_mean, z_log_var]))
        return (vae_loss, tf.keras.layers.Lambda(sampling, output_shape=(latent_rep_size,), name='lambda')([z_mean, z_log_var]))

    def _buildDecoder(self, z, latent_rep_size, max_length, charset_length):
        '''
        h = BatchNormalization(name='batch_4')(z)
        h = Dense(latent_rep_size, name='latent_input', activation = 'relu')(h)
        h = RepeatVector(max_length, name='repeat_vector')(h)
        h = GRU(self.hypers['hidden'], return_sequences = True, name='gru_1')(h)
        h = GRU(self.hypers['hidden'], return_sequences = True, name='gru_2')(h)
        h = GRU(self.hypers['hidden'], return_sequences = True, name='gru_3')(h)
        '''

        h = tf.keras.layers.BatchNormalization(momentum=0.997, epsilon=1e-5, trainable=True)(z)
        h = tf.keras.layers.Dense(units=latent_rep_size, activation = 'relu')(h)
        h = tf.keras.layers.RepeatVector(max_length)(h)
        gru1 = tf.keras.layers.GRU(units=self.hypers['hidden'], return_sequences = True)
        h = gru1(h)
        gru2 = tf.keras.layers.GRU(units=self.hypers['hidden'], return_sequences = True)
        h = gru2(h)
        gru3 = tf.keras.layers.GRU(units=self.hypers['hidden'], return_sequences = True)
        h = gru3(h)

        #return TimeDistributed(Dense(charset_length), name='decoded_mean')(h)
        return tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(units=charset_length))(h)

    def save(self, filename):
        self.autoencoder.save_weights(filename)

    def load(self, charset, weights_file, latent_rep_size = 10, max_length=MAX_LEN, hypers = {'hidden': 100, 'dense': 100, 'conv1': 2, 'conv2': 3, 'conv3': 4}):
        self.create(charset, max_length = max_length, weights_file = weights_file, latent_rep_size = latent_rep_size, hypers = hypers)
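
For reference, the class can be exercised with something like the following (a rough sketch with made-up data and placeholder settings, just to check that the pieces fit together; the real training happens in train_eq.py):

# rough smoke-test sketch: dummy one-hot data, placeholder batch size / epochs / filename
import numpy as np
import eq_grammar as G
from model_eq import MoleculeVAE

MAX_LEN, DIM = 15, G.D
rules = np.random.randint(0, DIM, size=(128, MAX_LEN))   # random rule indices
data = np.eye(DIM, dtype=np.float32)[rules]              # [128, MAX_LEN, DIM] one-hot

model = MoleculeVAE()
model.create(charset=range(DIM), max_length=MAX_LEN, latent_rep_size=10)  # charset is only used for its length
model.autoencoder.fit(data, data, epochs=2, batch_size=32)
model.save('eq_vae_test_weights.h5')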

nzarnaghi commented 3 years ago

Excuse me, I have another question. Because the model is unsupervised, if I comment out the line x_decoded_mean = conditional(x, x_decoded_mean) and run the code, does it affect the performance? I understand that the loss and the accuracy will not be correct, but because the model is unsupervised, it should not affect the generated model. Is that correct?

nzarnaghi commented 3 years ago

I could run the code by commenting out x_decoded_mean = conditional(x, x_decoded_mean), but while it was running I tried to run another version with some changes in a separate Anaconda Prompt, and then stopped the original run. Now, when I try to run the same code again without any changes, it gives me the following error:

AttributeError: 'str' object has no attribute 'decode'

The error points to the tensorflow\python\keras package, as follows:

hdf5_format.py. line 711, in load_weights_from_hdf5_group original_keras_version = f.attrs['keras_version'].decode('utf8')

This is the part of my code that raises the error:

self.encoder.load_weights(weights_file, by_name = True)

I closed the program and reopened it, but the error still persists.

Do you know how to solve it?

nzarnaghi commented 3 years ago

The AttributeError: 'str' object has no attribute 'decode' error was solved by uninstalling h5py and installing h5py==2.10.0. Now the problem is as mentioned above: with the line x_decoded_mean = conditional(x, x_decoded_mean) commented out the code starts running, and with it uncommented it shows this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'time_distributed_1_target' with dtype float and shape [?,?,?]
[[{{node time_distributed_1_target}}]]

nzarnaghi commented 3 years ago

I could finally solve the error. I checked the code again and realized that after uninstalling h5py and installing h5py==2.10.0, the code finally worked. I will send the upgraded code for Python 3.7 to you and put it on my GitHub page as well. You can also put it on your GitHub. Thank you so much.

mkusner commented 3 years ago

Nice! Were you able to uncomment the line x_decoded_mean = conditional(x, x_decoded_mean)? This line is very important because it masks out invalid grammar rules.

nzarnaghi commented 3 years ago

Excuse me, that was my mistake. That line was still commented out. But the accuracy is about 0.82 and the val_loss is about 1.30 for train_eq with 50 epochs. Does this mean the accuracy is about 82%? I will try to solve the bug, but if I cannot, is this accuracy tolerable?

mkusner commented 3 years ago

I don't know if this is a reasonable accuracy for your setting but it sounds like it's definitely training! You could compare it with what you get from this repo: https://github.com/geyang/grammar_variational_autoencoder

nzarnaghi commented 3 years ago

Thank you so much. I realized that this accuracy is for train_eq.py, not for train_zinc.py. I changed the conditional function in model_eq.py as follows:

        def conditional(x_true, x_pred):
            P2 = tf.math.exp(x_pred)
            P2 = tf.math.divide(P2,tf.math.reduce_sum(P2,axis=-1,keepdims=True)) # normalize predictions
            return P2

The accuracy for train_eq.py is now about 98%, but the accuracy for train_zinc.py is about 30%. So it seems that the conditional function as you defined it in model_eq.py and model_zinc.py is necessary for train_zinc.py, but not for train_eq.py; that is probably because of the type of data. I also checked the code at https://github.com/geyang/grammar_variational_autoencoder, where the conditional function has been eliminated, and that code is for train_eq as well. I am still struggling to see how to solve it for train_zinc.py. It seems that M3 and its processing are eliminating the values of x_pred.
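
As a side note, the simplified conditional above ignores x_true entirely; exponentiating and then normalizing over the last axis is just a softmax, so it could equivalently be written as:

        def conditional(x_true, x_pred):
            # exp followed by normalization over the last axis is exactly a softmax;
            # x_true is ignored, so no grammar masking is applied here
            return tf.nn.softmax(x_pred, axis=-1)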