ma-compbio / SNIPER

nuclear compartments, subcompartments, nuclear organization, Hi-C, autoencoder
MIT License

Is SNIPER only for human? Any suggestions for using SNIPER in other species? #3

Open wbszhu opened 4 years ago

wbszhu commented 4 years ago

Hi,

  1. We want to use SNIPER to predict subcompartments A1 and A2 in mouse, but the paper only covers predictions in human. Do you have any suggestions for using SNIPER in other species?
  2. I have tried modifying the script to run it, but I still get an error, and I don't know how to prepare the files needed by sniper_train.py.
  3. I would also like to know what the crop is.

Thank you, Ruby

kairukuma commented 4 years ago

Hi Ruby,

  1. Currently, we only provide models trained on GM12878 (human lymphoblastoid cells) and do not have models trained on mouse cell types. To make predictions in mouse cell types, we suggest training a separate model on a mouse cell type with high-coverage Hi-C data (e.g. mESC, mNPC, or mCN from Bonev et al.) and using a downsampled version of that dataset as your training input. As we don't have ground-truth subcompartment annotations for mouse cells, you would need to run a Gaussian HMM (or another clustering method) on the high-coverage mouse data to obtain one.

  2. I'm not sure which errors you are getting, so it would be helpful if you could be more specific. If you are running SNIPER on mouse cell types, you need files that are tailored to the mouse genome. Looking at SNIPER's code, it is currently designed to work on human cell types because it assumes 22 autosomal chromosomes, but it can be modified to work on mouse cell types (which have 19 autosomes) by adjusting the upper range of the for loops from 23 to 20. We will work on adding an option to specify different genome assemblies across multiple species.

  3. The crop here refers to the matrix rows and columns that were removed from the GM12878 inter-chromosomal Hi-C matrix because they were too sparse (greater than 30% of entries were zeros or undefined). The crop is important when running SNIPER because it ensures that the same rows and columns are removed from the Hi-C matrices of other cell types. You can construct such a crop map when you're annotating your ground truth.
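
To make points 2 and 3 concrete, here is a minimal pure-Python sketch, not SNIPER's actual code. All names in it (`AUTOSOME_COUNT`, `autosome_numbers`, `build_crop_map`, `apply_crop`) are hypothetical, and the matrix is assumed to be a non-empty list of equal-length rows. It shows a species-dependent chromosome loop range and one way a crop map could be recorded on a high-coverage matrix using the 30% threshold described above, then reused on another matrix of the same shape.

```python
# Illustrative sketch only; none of these names come from SNIPER's code.

# Autosome counts per species (human has 22, mouse has 19).
AUTOSOME_COUNT = {"human": 22, "mouse": 19}


def autosome_numbers(species):
    """Return the autosome numbers to loop over, e.g. range(1, 20) for mouse."""
    return range(1, AUTOSOME_COUNT[species] + 1)


def is_missing(value):
    """Treat zeros, None, and NaN as missing contacts."""
    return value is None or value == 0 or value != value  # NaN != NaN


def sparse_indices(vectors, max_missing_fraction=0.3):
    """Indices of vectors whose missing fraction exceeds the threshold."""
    removed = []
    for i, vec in enumerate(vectors):
        missing = sum(1 for v in vec if is_missing(v))
        if missing / len(vec) > max_missing_fraction:
            removed.append(i)
    return removed


def build_crop_map(matrix, max_missing_fraction=0.3):
    """Record which rows and columns of a high-coverage matrix are too sparse."""
    columns = [list(col) for col in zip(*matrix)]
    return {
        "rows": sparse_indices(matrix, max_missing_fraction),
        "cols": sparse_indices(columns, max_missing_fraction),
    }


def apply_crop(matrix, crop_map):
    """Drop the recorded rows and columns from a matrix of the same shape."""
    drop_rows = set(crop_map["rows"])
    drop_cols = set(crop_map["cols"])
    return [
        [v for j, v in enumerate(row) if j not in drop_cols]
        for i, row in enumerate(matrix)
        if i not in drop_rows
    ]
```

The crop map is built once, on the high-coverage matrix used for the ground truth, and then applied unchanged to every other matrix so that row and column positions stay comparable across cell types.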

wbszhu commented 4 years ago

Hi elykcoldster,

Sorry for the late reply, and thanks for your answer. Regarding the third point, I changed the upper range of the for loops, but it made no difference and the error still occurs:

###########################################################################
[lzhang@cu06 SNIPER]$python3.6 sniper_train.py 0h.allvalidPairs.hic ./target_hic annotations.zip -jt $PATH/juicer_tools_1.11.09_jcuda.0.8.jar -c crop_map -dd ./test -sm -ar -ow
Using TensorFlow backend.
Constructing input matrix
Trimming sparse regions...
Traceback (most recent call last):
  File "sniper_train.py", line 21, in <module>
    train_with_hic(params)
  File "$PATH/training.py", line 67, in train_with_hic
    inputM = trimMat(inputM,params['cropIndices'])
  File "$PATH/data_processing.py", line 77, in trimMat
    M = M[row_indices,:]
IndexError: index 12808 is out of bounds for axis 0 with size 12808
###########################################################################

Looking forward to your updates. Best
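
The IndexError in the traceback above says that the crop indices contain the value 12808 while the input matrix has only 12808 rows (valid indices 0 to 12807). One plausible cause, though the thread does not confirm it, is a crop map built for a different genome or matrix size than the mouse input. A small check run before trimming could report this more clearly. This is a sketch under that assumption; `validate_crop_indices` is a hypothetical helper, not part of SNIPER.

```python
def validate_crop_indices(n_rows, row_indices):
    """Raise a descriptive error if any crop index falls outside the matrix.

    n_rows is the number of rows in the input matrix and row_indices is the
    list of row indices taken from the crop map (hypothetical inputs).
    """
    bad = [i for i in row_indices if not 0 <= i < n_rows]
    if bad:
        raise ValueError(
            f"{len(bad)} crop indices fall outside a matrix with {n_rows} rows "
            f"(largest index: {max(bad)}); the crop map was probably built "
            "for a different genome or resolution"
        )
    return True
```

If the check fails, rebuilding the crop map from the same species and resolution as the input matrix is the first thing to try.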

Simple53 commented 1 year ago

Hi, I am also interested in training on my own data using a Gaussian HMM. Could you give some suggestions?