yupenghe / REPTILE

Predicting regulatory DNA elements based on epigenomic signatures
MIT License

The full training data is missing #2

Open karamveerverma37 opened 2 months ago

karamveerverma37 commented 2 months ago

Hi, I would like to train a model for genome-wide predictions, and I found that the example dataset provided contains only a subset of the training data (chr19). Could you please share the full dataset used for training?

yupenghe commented 2 months ago

Sure. See Tables S3 and S4 in the REPTILE paper: https://www.pnas.org/doi/full/10.1073/pnas.1618353114

karamveerverma37 commented 2 months ago

Thanks for sharing the manuscript. As far as I can see, this is raw data. Could you share the preprocessed data used for training, such as the regions and their labels/states?

yupenghe commented 2 months ago

I will try, but no promises, since the data is quite a few years old. Also, the training data will be on mm10, which probably won't help in your case. Downloading the raw data and reprocessing it on the mm39 genome would be the best way.

karamveerverma37 commented 2 months ago

Hi, thanks for the suggestion. My aim is to use the pretrained models, or to train a model in REPTILE, to infer enhancers. I am not familiar with all the data preprocessing steps, but I have scripts to lift over coordinates from mm10 to mm39. Please share the data if it is available; I can use the preprocessed data from the manuscript.

yupenghe commented 2 months ago

OK. I think I found the training data. I will organize it a little before sharing. Can you read Perl scripts (that is what I used to run the training commands)?

karamveerverma37 commented 2 months ago

Hi, thanks for the update. Yes, I understand Perl.

karamveerverma37 commented 2 months ago

Also, I would like to ask about the pretrained models provided in the models directory. Were these models trained on full-genome data or only on chr19? I ask because when I try to use them to compute scores, I get values only for chr19 and 0 for other chromosomes. If they were trained only on chr19, could you share pretrained models for the full genome, if available? Thank you.

yupenghe commented 2 months ago

> I try to use them to compute score I get the values only for chr19 and 0 for others chromosomes.

This is likely due to the input files. Could you check that data for all chromosomes was used as input?
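
One way to run this check is to count the rows per chromosome in each input file. The following is a minimal sketch and not part of REPTILE. It assumes the input is a tab-separated file with one header line and the chromosome name in the first column; the function name `rows_per_chromosome` and the example column names are hypothetical, so adjust them to the actual file layout.

```python
import csv
import io
from collections import Counter


def rows_per_chromosome(handle, chrom_col=0, has_header=True):
    """Count data rows per chromosome in a tab-separated file.

    Assumption: the chromosome name is stored in column `chrom_col`
    (the first column by default) and the file has one header line.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line if present
    counts = Counter()
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        counts[row[chrom_col]] += 1
    return counts


# Tiny in-memory example with hypothetical column names.
example = io.StringIO(
    "chr\tstart\tend\tid\n"
    "chr1\t100\t200\tr1\n"
    "chr19\t300\t400\tr2\n"
    "chr19\t500\t600\tr3\n"
)
counts = rows_per_chromosome(example)
for chrom, n in sorted(counts.items()):
    print(chrom, n)
```

For a real file, pass an open handle instead, for example `with open(path, newline="") as fh: rows_per_chromosome(fh)`. If only chr19 appears in the counts, the input itself is restricted to chr19 and the scores for other chromosomes will be missing regardless of which model is used.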

yupenghe commented 2 months ago

I have added the training and test data:

https://github.com/yupenghe/REPTILE/tree/master/all_data

karamveerverma37 commented 2 months ago

Thanks for sharing the training data. Does it also need bigWig files for the full genome, or can it take the data from the epimark file?

yupenghe commented 2 months ago

It starts from epimark files.