nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Installation in conda environment, bx.python error, PyUnicodeUCS2_FromStringAndSize #361

Closed mdozmorov closed 4 years ago

mdozmorov commented 4 years ago

I am trying to install HiC-Pro on our cluster, without root access. The only way to work with Python 2.7 is to use a conda environment.

conda create -n HiC-Pro python=2.7
source activate HiC-Pro
pip install bx-python
python
import bx.intervals

Importing bx.intervals fails with the error:

ImportError: .../lib/python2.7/site-packages/bx/intervals/intersection.so: undefined symbol: PyUnicodeUCS2_FromStringAndSize 

Googling turns up a Google Groups post and an SO post. That advice is of no help: I don't know how to recompile Python within the environment, or whether it is even possible.
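For context, a PyUnicodeUCS2_* symbol in the error usually suggests that the extension module was compiled against a narrow (UCS2) Python 2 build, while the interpreter importing it is a wide (UCS4) build. The minimal sketch below reports which kind of build the active interpreter is; the function name unicode_build is only illustrative, and this check alone does not confirm the cause here.

```python
import sys


def unicode_build(maxunicode=None):
    """Return 'UCS2' for a narrow build and 'UCS4' for a wide build.

    On Python 2, sys.maxunicode is 0xFFFF for narrow builds and 0x10FFFF
    for wide builds. On Python 3.3 and later it is always 0x10FFFF.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


if __name__ == "__main__":
    # Run this inside the activated conda environment.
    print("This interpreter is a %s build" % unicode_build())
```

If the interpreter reports UCS4 while the error names a UCS2 symbol, the installed bx-python binary was probably built for a different Python, and rebuilding it against the active interpreter is the usual remedy. Whether that is enough in this environment is not established in this thread.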

I tried the devel branch; it installs into the provided conda environment after some tweaks (#359). But, perhaps because it is a development version, it doesn't produce results: .allValidPairs is empty. It may be a data or settings issue, but chances are slim.

How to overcome the bx-python installation error in the conda environment?

nservant commented 4 years ago

Hi,

I have a py3 version of HiC-Pro which is almost ready to be released. See https://github.com/nservant/HiC-Pro/tree/devel In addition, I'm now providing a yaml file for conda installation which might fix your issue. Best

mdozmorov commented 4 years ago

The devel version can be installed; I'm just unable to get any .allValidPairs. All 43 GB of data go into .DumpPairs. That's why I wanted to test the 2.11 version.

The 3.0.0 version gives the following error at the second step:

Run quality checks for all samples ...
Logs: logs/Sample/make_Rplots.log
make: *** [/home/mdozmorov/.local/HiC-Pro_3.0.0/scripts/Makefile:181: hic_qc] Error 1

But nothing alarming is in the log, and all the output seems to be there. Thanks, @nservant. I'll continue playing with v3.0.0 and close this issue. If you have any suggestions about the above error and the empty data, please comment.

nservant commented 4 years ago

A QC failure means that an R script crashed while generating the QC plots. In the log folder, you should have some Rout files. Can you check whether they contain an error message?
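One way to do that check across many samples is to scan the Rout files for error lines. This is a minimal Python 3 sketch, assuming the logs live under a logs/ directory as in the output above; the function name find_r_errors and the marker strings are illustrative choices, not part of HiC-Pro.

```python
import os
import sys

# Strings that typically appear when an R batch script stops on an error.
ERROR_MARKERS = ("Error", "Execution halted")


def find_r_errors(log_dir):
    """Yield (path, line_number, line) for suspicious lines in *.Rout files."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            if not name.endswith(".Rout"):
                continue
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if any(marker in line for marker in ERROR_MARKERS):
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_r_errors.py logs
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "logs"
    for path, number, line in find_r_errors(log_dir):
        print("%s:%d: %s" % (path, number, line))
```

An empty result only means that none of the markers were found; the R script may still have stopped for a reason that is reported differently.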

It is very unusual for all pairs to go into the DumpPairs class. It means that HiC-Pro is not able to reconstruct the ligation product. One typical cause is the annotation files. For instance, are you using the same chromosome names in the bowtie indexes and in the list of restriction fragments?
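The chromosome-name comparison can be sketched as a set difference. The example below assumes the index names have been saved to a text file, one per line (for instance from the output of bowtie2-inspect -n), and that the restriction fragments are in a BED file; the helper names are hypothetical and only illustrate the idea.

```python
def names_from_bed(lines):
    """Collect chromosome names from the first column of BED-like lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def names_from_index_listing(lines):
    """Collect sequence names, keeping only the text before the first whitespace."""
    return {line.split()[0] for line in lines if line.strip()}


def compare_names(index_names, fragment_names):
    """Report names that appear on one side only."""
    return {
        "only_in_index": sorted(index_names - fragment_names),
        "only_in_fragments": sorted(fragment_names - index_names),
    }


if __name__ == "__main__":
    # Tiny inline example: the index uses "chr1" while the BED file uses "1".
    index = names_from_index_listing(["chr1 some description", "chr2"])
    fragments = names_from_bed(["1\t0\t100\tHIC_1_1\t0\t+", "chr2\t0\t50"])
    print(compare_names(index, fragments))
```

Any name listed on one side only (such as "chr1" versus "1") is worth a closer look, since pairs on such a chromosome cannot be assigned to restriction fragments.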

mdozmorov commented 4 years ago

Well, I did double-check the chromosomes, and everything seems fine. Moreover, one sample has successfully completed both steps, and the data is there. But with the others, I get the same Makefile:181: hic_qc] Error 1. I've been rerunning the jobs, but the issue persists. I understand that the QC plotting fails because .allValidPairs is empty, so the subsequent data is also empty. I looked through all the log files, but they look OK: just information, no errors. The alignment step seems OK:

(HiCExplorer) -bash-4.1$ ls -lah bowtie_results/bwt2/Sample/
-rw-r--r-- 1 mdozmorov hic  46G Sep 11 21:38 Sample_hg38.bwt2pairs.bam
-rw-r--r-- 1 mdozmorov hic  326 Sep 11 21:38 Sample_hg38.bwt2pairs.pairstat
-rw-r--r-- 1 mdozmorov hic  43G Sep 11 17:31 Sample_R1_hg38.bwt2merged.bam
-rw-r--r-- 1 mdozmorov hic  145 Sep 11 17:59 Sample_R1_hg38.mapstat
-rw-r--r-- 1 mdozmorov hic  43G Sep 11 17:30 Sample_R2_hg38.bwt2merged.bam
-rw-r--r-- 1 mdozmorov hic  145 Sep 11 18:00 Sample_R2_hg38.mapstat

But something is failing and produces:

(HiCExplorer) -bash-4.1$ ls -lah hic_results/data/Sample/
-rw-r--r-- 1 mdozmorov hic    0 Sep 12  2020 Sample.allValidPairs
-rw-r--r-- 1 mdozmorov hic    0 Sep 11 21:38 Sample_hg38.bwt2pairs.DEPairs
-rw-r--r-- 1 mdozmorov hic  43G Sep 12 08:37 Sample_hg38.bwt2pairs.DumpPairs
-rw-r--r-- 1 mdozmorov hic 5.6G Sep 12 08:37 Sample_hg38.bwt2pairs.FiltPairs
-rw-r--r-- 1 mdozmorov hic  46G Sep 12 08:38 Sample_hg38.bwt2pairs_interaction.bam
-rw-r--r-- 1 mdozmorov hic    0 Sep 11 21:38 Sample_hg38.bwt2pairs.REPairs
-rw-r--r-- 1 mdozmorov hic  286 Sep 12 08:37 Sample_hg38.bwt2pairs.RSstat
-rw-r--r-- 1 mdozmorov hic    0 Sep 11 21:38 Sample_hg38.bwt2pairs.SCPairs
-rw-r--r-- 1 mdozmorov hic    0 Sep 11 21:38 Sample_hg38.bwt2pairs.SinglePairs
-rw-r--r-- 1 mdozmorov hic    0 Sep 12 08:38 Sample_hg38.bwt2pairs.validPairs

Any suggestions?

nservant commented 4 years ago

That's really strange, and difficult to understand without looking at the data. Would you mind sharing with me (by email) the first 1000 reads, for instance, as well as your config file? N

zhanwen-cheng commented 4 years ago

> Hi,
>
> I have a py3 version of HiC-Pro which is almost ready to be released. See https://github.com/nservant/HiC-Pro/tree/devel In addition, I'm now providing a yaml file for conda installation which might fix your issue. Best

Hi, I have built the conda environment with your newly updated environment.yml file, and it was set up with python3. But I couldn't find the corresponding python3 HiC-Pro to download. Could you tell me how to download it? Thanks~

nservant commented 4 years ago

Hi, it is on the devel branch for now. I'll try to release it as soon as possible. N

zhanwen-cheng commented 4 years ago

> Hi, It is on the devel branch so far. I'll try to release it as soon as possible N

OK~looking forward to that~