Error during pairing after alignment

divyanandu commented 4 years ago

Hi, I seem to be running into an error after the alignment step when the reads have to be paired. This is the error that I get

Run HiC-Pro 2.11.4
--------------------------------------------
Mon Jul 20 11:43:55 PDT 2020
Bowtie2 alignment step1 ...
Logs: logs/cHA/mapping_step1.log

--------------------------------------------
Mon Jul 20 13:18:15 PDT 2020
Bowtie2 alignment step2 ...
Logs: logs/cHA/mapping_step2.log

--------------------------------------------
Mon Jul 20 14:34:28 PDT 2020
Combine R1/R2 alignment files ...
Logs: logs/cHA/mapping_combine.log

--------------------------------------------
Mon Jul 20 15:10:00 PDT 2020
Mapping statistics for R1 and R2 tags ...
Logs: logs/cHA/mapping_stats.log

--------------------------------------------
Mon Jul 20 15:17:59 PDT 2020
Pairing of R1 and R2 tags ...
Logs: logs/cHA/mergeSAM.log
make: *** [bowtie_pairing] Error 1

When I look at the mergeSAM.log file, this is what I see

2.7/bin/python /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_R1_BAC16.bwt2merged.bam -r bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_R2_BAC16.bwt2merged.bam -o bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_BAC16.bwt2pairs.bam
/global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/hic.inc.sh: line 86: 2.7/bin/python: No such file or directory

But when I type 2.7/bin/python, I do get a Python prompt, so it does seem to exist. The problem seems to be how it is reading the input BAM files: when I type in the full path to the input BAM files like this

2.7/bin/python /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/mergeSAM.py -q 10 -t -v -f /clusterfs/vector/scratch/dnandaku/HiChIP/HiCProOP/bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_R1_BAC16.bwt2merged.bam -r /clusterfs/vector/scratch/dnandaku/HiChIP/HiCProOP/bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_R2_BAC16.bwt2merged.bam -o /clusterfs/vector/scratch/dnandaku/HiChIP/HiCProOP/bowtie_results/bwt2/cHA/cHA_HiChIP_trimmed_BAC16.bwt2pairs.bam

it seems to run fine. Should I be in a specific folder when running HiC-Pro? I don't know which steps I need to run next!

nservant commented 4 years ago

Hi, the error occurs because your Python at "2.7/bin/python" is not found. Are you using HiC-Pro in cluster mode? If so, are you sure that Python is available on all nodes? Best
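
Editor's note: one possible reason the command works at an interactive prompt but fails inside the pipeline is that "2.7/bin/python" is a relative path, so it only resolves from a directory that happens to contain a "2.7/bin" subtree. The thread does not confirm that this is what happened here; the sketch below only illustrates the general shell behaviour, using a throwaway directory and a stand-in script (both hypothetical, not part of HiC-Pro).

```shell
# Build a throwaway directory containing a stand-in "2.7/bin/python".
demo=$(mktemp -d)
mkdir -p "$demo/2.7/bin"
printf '#!/bin/sh\necho "stand-in python"\n' > "$demo/2.7/bin/python"
chmod +x "$demo/2.7/bin/python"

# From inside $demo the relative path resolves and the stand-in runs.
from_demo=$(cd "$demo" && 2.7/bin/python)

# From another directory (here /) the same relative path is not found,
# assuming no /2.7/bin/python exists on the machine.
status_elsewhere=0
(cd / && 2.7/bin/python) 2>/dev/null || status_elsewhere=$?

echo "inside demo dir: $from_demo"
echo "exit status elsewhere: $status_elsewhere"
rm -rf "$demo"
```

An absolute interpreter path avoids this dependence on the working directory.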

divyanandu commented 4 years ago

Hi Nick,

Thank you for getting back to me! I am running it on a cluster. I think Python is available on the nodes because I am able to run a simple Python script (like the one below) without error:

#!/bin/sh -I

###################################

#SBATCH --job-name=python
#SBATCH --partition=vector
#SBATCH --account=vector_glaunsinger
#SBATCH --qos=vector_batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=5:00:00
#SBATCH --output=pytest_%j.out
#SBATCH --error=pytest_%j.err
#SBATCH --mail-user=divya.nandakumar@berkeley.edu
#SBATCH --mail-type=ALL

##################################

cd $SLURM_SUBMIT_DIR

source ~/.conda/envs/python=2.7/bin/activate 

2.7/bin/python /clusterfs/vector/scratch/dnandaku/pythonTest.py > ~/test_072120_new.log

source deactivate

I did notice that the config-system.txt file says

CLUSTER_SCRIPT = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/make_torque_script.sh

while I use SLURM on the cluster. Would that affect this step in any way? I figured it might not, given that the alignment step is probably what uses the parallel nodes.

Given that the alignment is done, would I be able to run the subsequent steps independently? When I tried to run the steps beyond mapping, there seemed to be a problem with the input folder. I used this command as a guide:

HiC-Pro -i ${RES_PREFIX}_3/bowtie_results/bwt2 -o ${RES_PREFIX}_3.1 -c config_test.txt -s proc_hic -s quality_checks

but it appears to look for the input bam files in a 'rawdata' folder that I am unable to override.

divyanandu commented 4 years ago

This is the content of the config-system.txt file. It does list the right path to Python:

R_PATH = /global/software/sl-7.x86_64/modules/langs/r/3.4.2/bin
BOWTIE2_PATH = /global/home/groups/consultsw/sl-7.x86_64/modules/bowtie2/2.3.4.1
SAMTOOLS_PATH = /global/home/groups/consultsw/sl-7.x86_64/modules/samtools/1.8/bin
PYTHON_PATH = /global/home/users/dnandaku/.conda/envs/python=2.7/bin
INSTALL_PATH = /global/home/users/dnandaku/HiC-Pro_2.11.4
SCRIPTS = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts
SOURCES = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/src
ANNOT_DIR = /global/home/users/dnandaku/HiC-Pro_2.11.4/annotation
CLUSTER_SCRIPT = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/make_torque_script.sh

nservant commented 4 years ago

Hi, two points:

First, I'm surprised that the Python script is called as:

2.7/bin/python /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/mergeSAM.py

because your PYTHON_PATH in the HiC-Pro installation is

PYTHON_PATH = /global/home/users/dnandaku/.conda/envs/python=2.7/bin

So in theory, it should call:

/global/home/users/dnandaku/.conda/envs/python=2.7/bin/python  /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/mergeSAM.py

I'm wondering whether the "=" in your path could explain that. Could you try changing it, with a symlink for instance?

Then, indeed, if you are using SLURM you should have

CLUSTER_SCRIPT = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/make_slurm_script.sh

This means that you didn't run HiC-Pro in parallel mode the first time; otherwise, you would have hit an error at the first step, as the jobs would not have been submitted to your cluster. Best, N
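
Editor's note: the suspicion about the "=" can be illustrated with plain shell parameter expansion. This is a sketch of an assumed mechanism, not HiC-Pro's actual parsing code, which is not shown in this thread. If a KEY = VALUE line is split at the last "=" instead of the first, the value from the config above shrinks to exactly the "2.7/bin" seen in the log.

```shell
# Config line copied from the thread; spaces are stripped before splitting
# (the stripping step is an assumption made for this illustration).
line='PYTHON_PATH = /global/home/users/dnandaku/.conda/envs/python=2.7/bin'
line=$(printf '%s' "$line" | tr -d ' ')

# Greedy removal: drop everything up to the LAST "=" (value is truncated).
greedy=${line##*=}

# Non-greedy removal: drop everything up to the FIRST "=" (value is intact).
first=${line#*=}

echo "split at last '=':  $greedy/python"
echo "split at first '=': $first/python"
```

Either way, keeping "=" out of directory names sidesteps the ambiguity.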

divyanandu commented 4 years ago

I wondered the same thing about 2.7/bin/python and had created a symlink called 2.7/bin/python pointing to the correct path. It still throws the error. It looks like there is no easy way to rename a conda environment. I could clone the existing one into a new environment, remove the old one with the = sign, and see if that helps.

Thanks for the heads up about the slurm script. It is true I didn't specify parallel mode when running the first time. I am trying to use HiC-Pro on a small genome and thought it should run fast enough even without it. Will make the change!
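
Editor's note: the CLUSTER_SCRIPT change discussed above could be made with a one-line sed substitution. The sketch below runs on a temporary demo file rather than the real config-system.txt, and assumes the line has the same form as the one quoted in the thread.

```shell
# Demo copy containing only the relevant line from the thread.
cfg=$(mktemp)
printf '%s\n' 'CLUSTER_SCRIPT = /global/home/users/dnandaku/HiC-Pro_2.11.4/scripts/make_torque_script.sh' > "$cfg"

# Swap the Torque helper for the SLURM one, on the CLUSTER_SCRIPT line only.
# "-i.bak" edits in place and keeps a backup copy next to the file.
sed -i.bak '/^CLUSTER_SCRIPT/ s/make_torque_script\.sh/make_slurm_script.sh/' "$cfg"

cluster_script=$(grep '^CLUSTER_SCRIPT' "$cfg")
echo "$cluster_script"
rm -f "$cfg" "$cfg.bak"
```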

divyanandu commented 4 years ago

Just wanted to give you an update. The problem was solved once I cloned a new environment without the = sign and ran the script. Thanks for your help!
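
Editor's note: the exact commands used for the fix are not given in the thread. A minimal sketch of cloning a conda environment under a name without "=" might look like the following. The new name "py27" is hypothetical, the old environment is addressed by its path as listed in the config above, and the small `run` helper (also hypothetical) only prints the commands unless DRY_RUN=0 is set.

```shell
# Old environment path from the thread; new name is a placeholder.
OLD_ENV="$HOME/.conda/envs/python=2.7"
NEW_ENV="py27"

# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Clone the old environment under the new name, then remove the old one.
run conda create --name "$NEW_ENV" --clone "$OLD_ENV"
run conda env remove --prefix "$OLD_ENV"
```

PYTHON_PATH in the HiC-Pro configuration would then presumably need to point at the bin directory of the new environment.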

nservant commented 4 years ago

Great! I'll close the issue then.

nservant / HiC-Pro

Error during pairing after alignment #347