tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

peakachus score_chromosome question #2

Closed: sabrsyed closed this issue 4 years ago

sabrsyed commented 4 years ago

Hi, I created an environment that successfully ran peakachu train on some publicly available Hi-C datasets. I am now trying to run peakachu score_chromosome but am encountering the following error:

  line 245, in getnnz
    raise ValueError('row, column, and data array must all be the '
ValueError: row, column, and data array must all be the same length

I am running each chromosome individually, "peakachu score_chromosome -p HiC.cool --balance -O scores -m ~/peakachu/models/chr1.pkl"

Any help would be much appreciated!

XiaoTaoWang commented 4 years ago

I've never run into this error before. Could you paste the full traceback? It would help with troubleshooting.

Xiaotao

sabrsyed commented 4 years ago

Here's the full output I got from my run:

/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/scoreUtils.py:62: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  fts = np.vstack((i for i in fts))
scoring matrix chr2
num candidates 2607852
Traceback (most recent call last):
  File "/home/ss45w/miniconda3/envs/peakachu_env2/bin/peakachu", line 4, in <module>
    __import__('pkg_resources').run_script('peakachu==1.1.3', 'peakachu')
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1462, in run_script
    exec(code, namespace, namespace)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/EGG-INFO/scripts/peakachu", line 76, in <module>
    run()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/EGG-INFO/scripts/peakachu", line 72, in run
    args.func(args)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/score_chromosome.py", line 53, in main
    result,R = X.score()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/peakachu-1.1.3-py3.6.egg/peakachu/scoreUtils.py", line 80, in score
    self.M = sparse.csr_matrix((data, (ri, ci)), shape=self.M.shape)
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 57, in __init__
    other = self.__class__(coo_matrix(arg1, shape=shape))
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 198, in __init__
    self._check()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 283, in _check
    if self.nnz > 0:
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/base.py", line 250, in nnz
    return self.getnnz()
  File "/home/ss45w/miniconda3/envs/peakachu_env2/lib/python3.6/site-packages/scipy/sparse/coo.py", line 245, in getnnz
    raise ValueError('row, column, and data array must all be the '
ValueError: row, column, and data array must all be the same length

tariks commented 4 years ago

A row/column mismatch can happen if you train with one window size (say w=4) and try to predict with another (default is 5). If that isn't the case here, then could you point us to the data files you are using? If the parameters are correct and the files work on my build, then the problem is in the installation somewhere. What kind of machine are you using and could you provide the command you used for training?
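To make the window-size point concrete, here is a minimal sketch. It assumes, purely for illustration, that each candidate pixel is described by the flattened square window of half-width w around it; peakachu's real feature vector may contain additional terms, and the function names below are hypothetical, not part of peakachu.

```python
def window_feature_count(w: int) -> int:
    """Pixels in a square window reaching w bins to each side of the center.

    Assumption for illustration only: one feature per window pixel.
    """
    if w < 0:
        raise ValueError("w must be non-negative")
    side = 2 * w + 1
    return side * side


def windows_compatible(train_w: int, predict_w: int) -> bool:
    """True when training and prediction windows give equally long vectors."""
    return window_feature_count(train_w) == window_feature_count(predict_w)


if __name__ == "__main__":
    for w in (4, 5):
        print(f"w={w}: {window_feature_count(w)} window pixels")
    print("w=4 model usable with w=5 input:", windows_compatible(4, 5))
```

Under that assumption, a model trained with w=4 expects 81 values per candidate while the default w=5 produces 121, so the same window size has to be passed to both the training and the scoring command.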

XiaoTaoWang commented 4 years ago

Hi, did you re-train the model yourself on Rao2014-GM12878-MboI-allreps-filtered.10kb.cool and then run the predictions on your own cool files? If so, the error makes sense: Rao2014-GM12878-MboI-allreps-filtered.10kb.cool was generated by an old version of cooler, so its ICE-normalized values probably have a different range from those in your current cool files.

I recommend using the pre-trained models we released for predictions. Or, if you have your own positive training set in GM12878, you can first re-run ICE and overwrite the 'weight' column with the current cooler version before training: cooler balance -f Rao2014-GM12878-MboI-allreps-filtered.10kb.cool

sabrsyed commented 4 years ago

Thanks for getting back to me. I'm trying to analyze a mouse HiC dataset (GSE95533). For the positive training set, I wasn't sure what to use so I used a bedpe file from CHiC data (same study) - is this appropriate?

I used distiller-nf to create .cool files for training and ran this on our institution's High Performance Computing Cluster: "peakachu train -p D0-Mandrupn1n2__mm10.1000.cool --balance -O models -b Mandrup_CHiC_mm10_interactions.bedpe"

I can run the 'cooler balance -f' option you suggested on my .cool files and train again. Another question: should I be training on a .multires.cool file instead of the .cool file?

I'm just a beginner when it comes to Hi-C analysis, so I really appreciate the help.

XiaoTaoWang commented 4 years ago

Hey, currently we only recommend 10 kb resolution matrices, on which peakachu has been thoroughly tested.

It seems your input matrix is at 1 kb resolution (judging by your file name)? If so, that inconsistency could be the cause of your earlier error, because the default resolution parameter (-r) of Peakachu is 10000.
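A quick pre-flight check along these lines could catch the mismatch before scoring. This is only a sketch: it guesses the bin size from a file name of the form name.RESOLUTION.cool, as in the two file names mentioned in this thread, and the helper names are hypothetical. The bin size stored in the file itself is the authoritative value (in cooler's Python API it is exposed as the binsize attribute of a Cooler object).

```python
import re
from typing import Optional

# Multipliers for an optional unit suffix such as "10kb".
_UNITS = {"": 1, "kb": 1_000, "mb": 1_000_000}
_PATTERN = re.compile(r"\.(\d+)(kb|mb)?\.cool$", re.IGNORECASE)


def resolution_from_filename(filename: str) -> Optional[int]:
    """Guess the bin size in bp from a name like 'sample.1000.cool'.

    Returns None when the name carries no resolution token.
    """
    match = _PATTERN.search(filename)
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNITS[(unit or "").lower()]


def matches_resolution(filename: str, expected: int = 10_000) -> bool:
    """True only if a resolution was found and equals the expected -r value."""
    return resolution_from_filename(filename) == expected


if __name__ == "__main__":
    for name in ("D0-Mandrupn1n2__mm10.1000.cool",
                 "Rao2014-GM12878-MboI-allreps-filtered.10kb.cool"):
        print(name, resolution_from_filename(name), matches_resolution(name))
```

Here the first file name resolves to 1000 bp, which does not match the 10000 default, while the second resolves to 10000 bp.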

sabrsyed commented 4 years ago

I re-ran my training with a 10 kb (10,000 bp) input matrix and have now successfully run 'peakachu score_chromosome'. I still need to visualize the results, but I really appreciate the help. Thanks!