peakachus score_chromosome question

sabrsyed commented 4 years ago

Hi, I created an environment that successfully ran peakachu train on some publicly available Hi-C datasets. I am now trying to run peakachu score_chromosome but am encountering the following error:

line 245, in getnnz
    raise ValueError('row, column, and data array must all be the '
ValueError: row, column, and data array must all be the same length

I am running each chromosome individually, "peakachu score_chromosome -p HiC.cool --balance -O scores -m ~/peakachu/models/chr1.pkl"

Any help would be much appreciated!

XiaoTaoWang commented 4 years ago

I've never seen this error before. Could you paste the full traceback? It would be helpful for troubleshooting.

Xiaotao

sabrsyed commented 4 years ago

Here's the full output I got from my run

tariks commented 4 years ago

A row/column mismatch can happen if you train with one window size (say w=4) and try to predict with another (default is 5). If that isn't the case here, then could you point us to the data files you are using? If the parameters are correct and the files work on my build, then the problem is in the installation somewhere. What kind of machine are you using and could you provide the command you used for training?
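To make the window-size point concrete, here is a minimal sketch. It assumes, purely for illustration, that each candidate pixel is described by the flattened (2w+1) x (2w+1) neighbourhood around it; Peakachu's real feature vector may contain additional terms, and the function names below are hypothetical, not part of Peakachu.

```python
def window_feature_count(w):
    """Number of pixels in a square window extending w bins on each side.

    Assumption for illustration only: one feature per pixel in the window.
    """
    side = 2 * w + 1
    return side * side


def check_window_match(train_w, predict_w):
    """Raise if the window used for prediction differs in size from training."""
    n_train = window_feature_count(train_w)
    n_predict = window_feature_count(predict_w)
    if n_train != n_predict:
        raise ValueError(
            f"model expects {n_train} features (w={train_w}) "
            f"but prediction builds {n_predict} (w={predict_w})"
        )
    return n_train


if __name__ == "__main__":
    print(window_feature_count(4))  # window for w=4
    print(window_feature_count(5))  # window for the default w=5
```

Under that assumption, a model trained with w=4 and queried with w=5 receives inputs of a different length, so the same window size should be used for training and prediction.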

XiaoTaoWang commented 4 years ago

Hi, did you re-train the model yourself on Rao2014-GM12878-MboI-allreps-filtered.10kb.cool and then run the predictions on your own cool files? If so, the error would make sense: Rao2014-GM12878-MboI-allreps-filtered.10kb.cool was generated by an old version of cooler, so its ICE-normalized values likely have a different range from those in your current cool files.

I recommend using the pre-trained models we released for predictions. Or, if you have your own positive training sets in GM12878, you can first re-run ICE and overwrite the 'weight' column with the current cooler version before training: cooler balance -f Rao2014-GM12878-MboI-allreps-filtered.10kb.cool.

sabrsyed commented 4 years ago

Thanks for getting back to me. I'm trying to analyze a mouse HiC dataset (GSE95533). For the positive training set, I wasn't sure what to use so I used a bedpe file from CHiC data (same study) - is this appropriate?

I used distiller-nf to create .cool files for training. I ran this on our institution's high-performance computing cluster: "peakachu train -p D0-Mandrupn1n2__mm10.1000.cool --balance -O models -b Mandrup_CHiC_mm10_interactions.bedpe"

I can run the 'cooler balance -f' option you suggested on my .cool files and train again. Another question: should I be training on a .multires.cool file instead of the .cool file?

I'm just a beginner when it comes to Hi-C analysis, so I really appreciate the help.

XiaoTaoWang commented 4 years ago

Hey, currently we only recommend 10 kb resolution matrices, on which Peakachu has been thoroughly tested.

It seems your input matrix was at 1 kb resolution (according to your file name)? If so, that inconsistency could be the reason for your previous error, because the default resolution parameter (-r) of Peakachu is 10000.
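A quick sanity check along these lines can be sketched as follows. It guesses the bin size from the file name, assuming names end in ".<bp>.cool" or ".<N>kb.cool" as the two files mentioned in this thread do; the helper names are hypothetical and this naming convention is not guaranteed for other pipelines.

```python
import re

# Default value of Peakachu's -r parameter, as stated in this thread.
PEAKACHU_DEFAULT_RESOLUTION = 10000


def resolution_from_name(filename):
    """Guess the bin size in bp from a name like 'x.1000.cool' or 'x.10kb.cool'.

    Returns None when the name carries no recognizable resolution.
    """
    m = re.search(r"\.(\d+)(kb)?\.cool$", filename, flags=re.IGNORECASE)
    if m is None:
        return None
    value = int(m.group(1))
    return value * 1000 if m.group(2) else value


def matches_default(filename, expected=PEAKACHU_DEFAULT_RESOLUTION):
    """True only if a resolution could be parsed and equals the expected one."""
    res = resolution_from_name(filename)
    return res is not None and res == expected


if __name__ == "__main__":
    for name in ("D0-Mandrupn1n2__mm10.1000.cool",
                 "Rao2014-GM12878-MboI-allreps-filtered.10kb.cool"):
        print(name, resolution_from_name(name), matches_default(name))
```

The file name is only a hint. The bin size stored in the file itself is the authoritative value; with the cooler Python package it can be read from the binsize attribute of a cooler.Cooler object.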

sabrsyed commented 4 years ago

I re-ran my training with a 10 kb (10,000 bp) input matrix and have now successfully run 'peakachu score_chromosome'. I still need to visualize the results, but I really appreciate the help. Thanks!

tariks / peakachu