xy-chen16 / DeepSilencer

A deep convolutional neural network for the accurate prediction of silencers
Apache License 2.0

how to run DeepSilencer #1

Open · BenxiaHu opened this issue 3 years ago

BenxiaHu commented 3 years ago

Hello, I just read your paper about the silencer database and noticed that you implemented a CNN to predict silencers. Could you explain how to run DeepSilencer? What file does DeepSilencer take as input? Thanks in advance. Best,

Rabailkamboh commented 2 years ago

Did you find out how to run it?

xy-chen16 commented 2 years ago

We are sorry for the confusion. DeepSilencer consists of a CNN model and an ANN model. The inputs of the CNN model are the one-hot encoded matrices of the sequences, and the inputs of the ANN model are the vectors of k-mer counts in the sequences. We provide ‘train_mat.hkl’ and ‘test_mat.hkl’ as demo input files in the ‘data’ folder; they contain 3200 and 800 DNA sequences, respectively. These sequences have a fixed length (200 bp) and have been transformed into one-hot matrices. To train the model and predict DNA sequences of fixed length, run ‘run_self_projection.py’. For predicting DNA sequences of variable length, see the details of our model settings in ‘run_crossdata_projection_human.py’ or ‘run_crossdata_projection_mouse.py’.
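For readers wondering what a "vector of k-mer counts" looks like, here is a minimal sketch. It is not taken from the DeepSilencer code: the value of k, the lexicographic A/C/G/T ordering of the k-mers, and the choice to skip k-mers containing N are all assumptions, and `kmer_count_vector` is a hypothetical helper name.

```python
from itertools import product


def kmer_count_vector(seq, k):
    """Count overlapping k-mers of seq over the alphabet ACGT.

    Returns a list of 4**k counts in lexicographic order (AA..A first).
    Assumption: k-mers containing any character other than A/C/G/T
    (e.g. N) are skipped rather than counted.
    """
    seq = seq.upper()
    # Hypothetical ordering: all 4**k k-mers, lexicographic over "ACGT".
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    # Slide a window of width k with step 1 (overlapping k-mers).
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    return counts


if __name__ == "__main__":
    vec = kmer_count_vector("ACGTAC", 2)
    print(len(vec), sum(vec))  # 16 5
```

One property of this kind of encoding is that the vector length (4**k) does not depend on the sequence length, whereas a one-hot matrix grows with the sequence.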

BenxiaHu commented 2 years ago

Hello, thanks for your kind explanation. I still have one question about ‘train_mat.hkl’ and ‘test_mat.hkl’. Here is what train_mat.hkl looks like. Could you tell me how to build this matrix?

[Image: screenshot of the contents of train_mat.hkl]

Best,

xy-chen16 commented 2 years ago

Building the matrix in "train_mat.hkl" takes two steps: 1) we transform each 200 bp sequence of A/T/C/G into a matrix of shape (4, 200) via one-hot encoding, and 2) we then flatten that matrix into a vector of length 800.
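The two steps can be sketched with NumPy as follows. This is an illustration, not the authors' preprocessing script: the row order A, C, G, T, the row-major (C-order) flattening, the all-zero column for non-ACGT characters, and the helper name `one_hot_flat` are assumptions, since the thread does not specify them.

```python
import numpy as np

# Assumed row order of the (4, 200) one-hot matrix; not stated in the thread.
BASES = "ACGT"


def one_hot_flat(seq, length=200):
    """Step 1: one-hot encode seq into a (4, length) matrix.
    Step 2: flatten it row by row into a vector of 4 * length values."""
    seq = seq.upper()
    if len(seq) != length:
        raise ValueError(f"expected {length} bp, got {len(seq)}")
    mat = np.zeros((4, length), dtype=np.float32)
    for pos, base in enumerate(seq):
        row = BASES.find(base)
        if row >= 0:  # assumption: non-ACGT characters (e.g. N) stay all-zero
            mat[row, pos] = 1.0
    # Row-major (C-order) flattening: (4, 200) -> (800,).
    return mat.reshape(-1)


if __name__ == "__main__":
    seqs = ["ACGT" * 50, "TTGCA" * 40]  # two 200 bp toy sequences
    batch = np.stack([one_hot_flat(s) for s in seqs])
    print(batch.shape)  # (2, 800)
```

With C-order flattening, the first 200 entries of each vector are the "A" row, the next 200 the "C" row, and so on, which is what makes the later reshape back to (4, 200) a lossless inverse.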

BenxiaHu commented 2 years ago

Hello, thanks for your explanation. Why do you flatten the matrix into a vector of length 800?

xy-chen16 commented 2 years ago

It is just for data storage. We then reshape the vector back to (4, 200, 1) in the preprocessing step.
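The reshape back to (4, 200, 1) can be sketched as below. It assumes the stored vectors were flattened in row-major (C) order, so a plain NumPy reshape recovers the original matrices; `to_cnn_input` is a hypothetical helper, and the trailing size-1 axis plays the role of a single input channel for a 2-D convolution.

```python
import numpy as np


def to_cnn_input(flat_batch, length=200):
    """Reshape an (n, 4 * length) array of flattened one-hot vectors
    into (n, 4, length, 1): one (4, length, 1) array per sequence.
    Assumes the vectors were flattened in row-major (C) order."""
    flat_batch = np.asarray(flat_batch)
    return flat_batch.reshape(-1, 4, length, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Build three random one-hot matrices: exactly one base per column.
    idx = rng.integers(0, 4, size=(3, 200))
    mats = np.zeros((3, 4, 200), dtype=np.float32)
    mats[np.arange(3)[:, None], idx, np.arange(200)[None, :]] = 1.0
    flat = mats.reshape(3, -1)   # (3, 800): the storage format
    x = to_cnn_input(flat)       # (3, 4, 200, 1): the CNN input format
    print(x.shape, np.array_equal(x[..., 0], mats))  # (3, 4, 200, 1) True
```

With the demo data, `flat` would presumably be the array read from ‘train_mat.hkl’ (for example via `hickle.load`), assuming it is stored as an (n, 800) array.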