kundajelab / bpnet

Toolkit to train base-resolution deep neural networks on functional genomics data and to interpret them
http://bit.ly/bpnet-colab
MIT License

Double-checking: full training data included in chip-nexus example? #15

Status: Open. an1lam opened this issue 4 years ago.

an1lam commented 4 years ago

Hi,

I'd like to run an experiment in which I train the BPNet model on the full set of ChIP-nexus data. I know the example notebook restricts training to a subset of the chromosomes, so I just want to verify that if I remove this line:

exclude_chr=["chrX","chrY","chr5","chr6","chr7","chr10","chr14","chr11","chr13","chr12","chr15"]

from the config, I'll be using the full dataset.
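As a quick check of how much that line removes, here is a plain-Python sketch that subtracts the quoted exclusion list from the mouse chromosome set. It is not a bpnet function. It assumes the data are on the mm10 assembly with chromosomes chr1 through chr19, chrX and chrY, since the BPNet ChIP-nexus data come from mouse embryonic stem cells:

```python
# Assumed mm10 chromosome names: autosomes chr1..chr19 plus the sex chromosomes.
MM10_CHROMS = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY"]

# The exclusion list quoted from the example config.
exclude_chr = ["chrX", "chrY", "chr5", "chr6", "chr7", "chr10",
               "chr14", "chr11", "chr13", "chr12", "chr15"]

# Chromosomes that survive the exclusion, in genome order.
excluded = set(exclude_chr)
remaining = [c for c in MM10_CHROMS if c not in excluded]
print(remaining)
```

Under that assumption, only 10 of the 21 chromosomes remain, and chr10 is already among the excluded ones.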

Relatedly, I want to try something involving the sequence region where the CRISPR experiment was done, so I intend to use chromosome 10 as my held-out chromosome. I just wanted to double-check that this is fine.

Avsecz commented 4 years ago

Hey,

To train on all chromosomes, you also have to set train_chr=[] and test_chr=[]. See the bpnet_data function, the bpnet9.gin file and this cell in the notebook. Make sure to also use the full-width seq_width and the original number of dilated layers (n_dil_layers = 9). Note that the colab example uses only 3 of the 4 TFs, and I think I also removed the data from all other chromosomes to speed up downloading in the colab. Therefore, I suggest downloading the original data.
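The point above is that removing exclude_chr alone is not enough if other chromosome lists still hold data out. Below is a toy model of that logic, not bpnet_data itself. The function partition_chromosomes and its holdout_chr parameter are hypothetical, and bpnet's actual split arguments (including the train_chr/test_chr settings mentioned above) may behave differently. It again assumes the mm10 chromosome set:

```python
from typing import Dict, List

# Assumed mm10 chromosome set (mouse ESC ChIP-nexus data).
MM10_CHROMS = [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY"]


def partition_chromosomes(all_chr: List[str], exclude_chr: List[str],
                          holdout_chr: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: split chromosomes into train / holdout / dropped.

    Chromosomes in exclude_chr are dropped entirely; chromosomes in
    holdout_chr are kept for evaluation only; everything else is trained on.
    """
    dropped = set(exclude_chr)
    held = set(holdout_chr) - dropped
    return {
        "train": [c for c in all_chr if c not in dropped and c not in held],
        "holdout": [c for c in all_chr if c in held],
        "dropped": [c for c in all_chr if c in dropped],
    }


# Every list empty: all chromosomes end up in the training split.
full = partition_chromosomes(MM10_CHROMS, exclude_chr=[], holdout_chr=[])
print(len(full["train"]))  # 21

# Holding out only chr10, as proposed in the question.
chr10_out = partition_chromosomes(MM10_CHROMS, exclude_chr=[], holdout_chr=["chr10"])
print(chr10_out["holdout"], len(chr10_out["train"]))  # ['chr10'] 20
```

Keeping "dropped" separate from "holdout" mirrors the distinction in the thread: excluded chromosomes are never seen, while held-out chromosomes are still used for evaluation.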

Avsecz commented 4 years ago

(I accidentally closed the issue earlier.) Regarding the CRISPR locus, it depends on what you want to do. Because the model was never trained on the actual CRISPR experiment/perturbation, it doesn't matter for predicting the effect of the perturbation whether chr10 is already in the training data. If you want to evaluate the prediction on the reference sequence (i.e. without the CRISPR perturbation), then yes, holding out chr10 sounds good.
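When evaluating on the reference sequence, the concern is that a training interval covers the locus being evaluated. A minimal sketch of such a leakage check follows, assuming intervals are (chrom, start, end) tuples in half-open coordinates that already span the full model input window. The helper names and the locus coordinates are hypothetical and are not the real CRISPR locus:

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def leaked_intervals(train: List[Interval], locus: Interval) -> List[Interval]:
    """Return the training intervals that overlap the evaluation locus."""
    return [iv for iv in train if overlaps(iv, locus)]


# Hypothetical coordinates for illustration only.
locus = ("chr10", 20_000, 21_000)
train_with_chr10 = [("chr1", 0, 1000), ("chr10", 20_500, 21_500)]
train_without_chr10 = [iv for iv in train_with_chr10 if iv[0] != "chr10"]

print(len(leaked_intervals(train_with_chr10, locus)))     # 1
print(len(leaked_intervals(train_without_chr10, locus)))  # 0
```

Holding out the whole chromosome makes the check trivially pass. The same function could be used to hold out only a window around the locus instead.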

Avsecz commented 4 years ago

To get the original training data, follow the readme in https://github.com/kundajelab/bpnet-manuscript.

an1lam commented 4 years ago

Thanks, this is very helpful!