kundajelab / fastISM

In-silico Saturation Mutagenesis implementation with 10x or more speedup for certain architectures.
MIT License

how to support DNA one-hot coding? #5

Closed Sakurag1l closed 1 year ago

Sakurag1l commented 3 years ago

Hello, we would like to use your fastISM package. Does it support DNA one-hot encoding with shape 4 x n?

suragnair commented 3 years ago

Hi! Yes, you can pass in an input of size batch_size x sequence_length x 4, which should typically match your model's input dimensions.
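
For reference, a minimal sketch of that layout; the batch size, 3001-bp sequence length, and random one-hot batch below are just placeholders:

```python
import numpy as np
import fastism

# Hypothetical batch of one-hot encoded sequences: batch_size x sequence_length x 4
idx = np.random.randint(0, 4, size=(8, 3001))   # random bases, illustration only
x = np.eye(4, dtype=np.float32)[idx]            # shape (8, 3001, 4), same layout as the model input

fast_ism_model = fastism.FastISM(model)         # `model` is your trained Keras model
# substitute each base at every position and collect the outputs
mutations = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
fast_ism_out = [fast_ism_model(x, replace_with=mut) for mut in mutations]
```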

Sakurag1l commented 3 years ago

Thanks Surag. But my model's input dimensions are batch_size x 4 x sequence_length, and my input matches my model's input dimensions.

tensorflow==2.4.0, Keras==2.4.3

```python
mutations = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

fast_ism_model = fastism.FastISM(model, test_correctness=False)
naive_ism_model = fastism.NaiveISM(model)
fast_ism_out = [fast_ism_model(x, replace_with=mut) for mut in mutations]
naive_ism_out = [naive_ism_model(x, replace_with=mut) for mut in mutations]
```

InvalidArgumentError: Invalid reduction dimension (1 for input with 0 dimension(s) [Op:All]

suragnair commented 3 years ago

Would it be possible to share your model architecture? Or perhaps the first few layers after the input?

Sakurag1l commented 3 years ago

Thanks. This is my model architecture.

Model: "model"


Layer (type) Output Shape Param #

input_1 (InputLayer) [(None, 4, 3001)] 0


conv1_1 (Conv1D) (None, 4, 64) 768320


conv1_2 (Conv1D) (None, 4, 64) 16448


max_pooling1d (MaxPooling1D) (None, 1, 64) 0


dropout (Dropout) (None, 1, 64) 0


conv2_1 (Conv1D) (None, 1, 128) 32896


conv2_2 (Conv1D) (None, 1, 128) 65664


max_pooling1d_1 (MaxPooling1 (None, 1, 128) 0


dropout_1 (Dropout) (None, 1, 128) 0


conv3_1 (Conv1D) (None, 1, 128) 65664


conv3_2 (Conv1D) (None, 1, 128) 65664


max_pooling1d_2 (MaxPooling1 (None, 1, 128) 0


dropout_2 (Dropout) (None, 1, 128) 0


flatten (Flatten) (None, 128) 0


dense (Dense) (None, 128) 16512


activation (Activation) (None, 128) 0


dropout_3 (Dropout) (None, 128) 0


dense_1 (Dense) (None, 2) 258


activation_1 (Activation) (None, 2) 0

Total params: 1,031,426 Trainable params: 1,031,426 Non-trainable params: 0


suragnair commented 3 years ago

It looks like you are applying the convolutions along the channels instead of the sequence. For example, your input is of size (None, 4, 3001), but after the first convolution it is (None, 4, 64), and after the second it is also (None, 4, 64). Typically the sequence width only reduces slightly after a convolution, e.g. from 3001 to 3001 - filter_width + 1.

To correct that, it would help to make the input of size (None, 3001, 4). You can see here that Conv1D expects the channels (4) in the last dimension. Try it out and let me know!
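
For example (the filter width of 25 and the file name below are just placeholders):

```python
import numpy as np
import tensorflow as tf

# One-hot data stored channels-first (N x 4 x 3001) can be transposed to the
# channels-last layout that Conv1D expects:
x_first = np.load("onehot_channels_first.npy")   # placeholder file, shape (N, 4, 3001)
x_last = np.transpose(x_first, (0, 2, 1))        # shape (N, 3001, 4)

# With channels last, Conv1D slides along the sequence axis, so the length only
# shrinks by filter_width - 1 (here 3001 -> 2977) while the channels become 64:
inp = tf.keras.Input(shape=(3001, 4))
out = tf.keras.layers.Conv1D(filters=64, kernel_size=25)(inp)   # (None, 2977, 64)
```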

Sakurag1l commented 3 years ago

Thanks! I will re-train my model and try again.

suragnair commented 3 years ago

You can see if it works even before you retrain your model. Good luck!
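
For example, you could rebuild the architecture with a channels-last input and random (untrained) weights just to confirm fastISM accepts it; the layer sizes below are only illustrative, not your exact model:

```python
import tensorflow as tf
import fastism

# Untrained weights are fine here: test_correctness only checks that fast ISM
# and naive ISM give the same outputs for the same model.
inp = tf.keras.Input(shape=(3001, 4))
x = tf.keras.layers.Conv1D(64, 25, padding="same", activation="relu")(inp)
x = tf.keras.layers.MaxPooling1D(6)(x)
x = tf.keras.layers.Flatten()(x)
out = tf.keras.layers.Dense(2, activation="softmax")(x)
model = tf.keras.Model(inp, out)

# Raises an error if the architecture is unsupported or the outputs disagree.
fast_ism_model = fastism.FastISM(model, test_correctness=True)
```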

Sakurag1l commented 3 years ago

Sorry to bother you again. Does the package not support the Concatenate layer yet? I want to use dilated convolutions, so I added a Concatenate layer at the end. This is my model architecture.

Model: "model"


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) [(None, 3001, 4)] 0


conv1_1 (Conv1D) (None, 3001, 64) 1600 input_1[0][0]


conv1_2 (Conv1D) (None, 3001, 64) 24640 conv1_1[0][0]


max_pooling1d (MaxPooling1D) (None, 500, 64) 0 conv1_2[0][0]


dropout (Dropout) (None, 500, 64) 0 max_pooling1d[0][0]


conv2_1 (Conv1D) (None, 500, 128) 49280 dropout[0][0]


conv2_2 (Conv1D) (None, 500, 128) 98432 conv2_1[0][0]


max_pooling1d_1 (MaxPooling1D) (None, 83, 128) 0 conv2_2[0][0]


dropout_1 (Dropout) (None, 83, 128) 0 max_pooling1d_1[0][0]


conv3_1 (Conv1D) (None, 83, 128) 98432 dropout_1[0][0]


conv3_2 (Conv1D) (None, 83, 128) 98432 conv3_1[0][0]


max_pooling1d_2 (MaxPooling1D) (None, 13, 128) 0 conv3_2[0][0]


dropout_2 (Dropout) (None, 13, 128) 0 max_pooling1d_2[0][0]


conv4_1 (Conv1D) (None, 13, 64) 24640 dropout_2[0][0]


conv4_2 (Conv1D) (None, 13, 64) 24640 dropout_2[0][0]


conv4_3 (Conv1D) (None, 13, 64) 24640 dropout_2[0][0]


concatenate (Concatenate) (None, 13, 192) 0 conv4_1[0][0]
conv4_2[0][0]
conv4_3[0][0]


flatten (Flatten) (None, 2496) 0 concatenate[0][0]


dense (Dense) (None, 128) 319616 flatten[0][0]


activation (Activation) (None, 128) 0 dense[0][0]


dropout_3 (Dropout) (None, 128) 0 activation[0][0]


dense_1 (Dense) (None, 2) 258 dropout_3[0][0]


activation_1 (Activation) (None, 2) 0 dense_1[0][0]
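
For context, the concatenated branch corresponds to something like the following sketch; the dilation rates of 1, 2, and 4 are illustrative, while the kernel size of 3 is implied by the 24,640 parameters per branch (3*128*64 + 64):

```python
import tensorflow as tf

# Three parallel Conv1D branches over the same (None, 13, 128) tensor, merged
# by Concatenate into (None, 13, 192).
x = tf.keras.Input(shape=(13, 128))
b1 = tf.keras.layers.Conv1D(64, 3, padding="same", dilation_rate=1, name="conv4_1")(x)
b2 = tf.keras.layers.Conv1D(64, 3, padding="same", dilation_rate=2, name="conv4_2")(x)
b3 = tf.keras.layers.Conv1D(64, 3, padding="same", dilation_rate=4, name="conv4_3")(x)
merged = tf.keras.layers.Concatenate()([b1, b2, b3])   # (None, 13, 192)
```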

suragnair commented 3 years ago

Concatenate is not supported in general for now, but for your model, can you try

```python
import fastism
fastism.fast_ism_utils.STOP_LAYERS.add("Concatenate")
```

and then try

```python
fast_ism_model = fastism.FastISM(model, test_correctness=True)
```

If it returns without error then it's working as expected. Don't hesitate to reach out with more questions!
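
Putting the two steps together with the scoring loop from earlier, a minimal sketch (assuming `model` is your Concatenate-based Keras model and `x` is a one-hot batch of shape batch_size x 3001 x 4):

```python
import fastism

# Register Concatenate as a stop layer so fastISM treats it as a boundary
# rather than trying to propagate the mutated region through it.
fastism.fast_ism_utils.STOP_LAYERS.add("Concatenate")

# If this returns without error, the fast and naive ISM outputs agree.
fast_ism_model = fastism.FastISM(model, test_correctness=True)

# Then score sequences as before, substituting each base at every position.
mutations = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
fast_ism_out = [fast_ism_model(x, replace_with=mut) for mut in mutations]
```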

Sakurag1l commented 3 years ago

Thanks! I will try.