nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Different resolution from validpairs #322

Closed Rayko87 closed 4 years ago

Rayko87 commented 4 years ago

Hello Servant,

More than an issue, this is a question about HiC-Pro. Since generating the valid pairs is the most time-consuming step, is there a way to build the contact matrix at a different resolution (bin size, in kb) directly from the validPairs file?

Thanks, Robert

nservant commented 4 years ago

Hi Robert, Yes, there are two options. Either use the -s option, starting from the allValidPairs file, or simply use the build_matrix tool in scripts/. Best

Rayko87 commented 4 years ago

Thanks for your quick answer.

I am trying to do so, but I got an error:

/mnt/data/robert/ANALISIS/Virtual_4C_DND41$ HiC-Pro -i /mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/sample1/sample1.allValidPairs -o DND41_5KB_chr8 -c config-hicpro.txt -s build_contact_maps

Exit: Error: Directory Hierarchy of rawdata '/mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/sample1/sample1.allValidPairs' is not correct. No '.allValidPairs' files detected

This is my configuration file. With the -s option, should the data be provided differently?

# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = log
BOWTIE2_OUTPUT_DIR =
MAPC_OUTPUT =
RAW_DIR =

#######################################################################
## SYSTEM AND SCHEDULER - Start Editing Here !!
#######################################################################
N_CPU = 11
LOGFILE = hicpro.log

JOB_NAME =
JOB_MEM =
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1_001
PAIR2_EXT = _R2_001

#######################################################################

REFERENCE_GENOME = GRChg37-hg19
GENOME_SIZE = chrom_hg19.sizes

#######################################################################
## Allele specific analysis
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Capture Hi-C analysis
#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /mnt/data/robert/ANALISIS/HiC-Pro-DND41/GATC_hg19
LIGATION_SITE = GATCGATC
MIN_FRAG_SIZE =
MAX_FRAG_SIZE =
MIN_INSERT_SIZE =
MAX_INSERT_SIZE =

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 5000
MATRIX_FORMAT = upper

#######################################################################
## Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1

Thanks!

#######################################################################
## Alignment options
#######################################################################

MIN_MAPQ = 10

BOWTIE2_IDX_PATH =/mnt/data/robert/index_and_genome_files
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

nservant commented 4 years ago

Even in stepwise mode, HiC-Pro expects a folder as input, with one subfolder per sample. So in your case: -i /mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/ Best
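
The layout requirement described above can be checked before launching the pipeline. The following is only a sketch: check_input_dir is a hypothetical helper, not part of HiC-Pro, and it assumes that each sample subfolder should hold at least one file ending in .allValidPairs, which is what the error message in this thread suggests for the build_contact_maps step.

```shell
#!/bin/sh
# Hypothetical helper, not part of HiC-Pro: verify that the directory passed
# to -i contains one subfolder per sample, each holding an .allValidPairs file.
check_input_dir() {
    input_dir=$1
    status=0
    found=0
    for sample_dir in "$input_dir"/*/; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -d "$sample_dir" ] || continue
        found=1
        sample=$(basename "$sample_dir")
        # Expand the pattern inside the sample folder; $1 exists only on a match.
        set -- "$sample_dir"*.allValidPairs
        if [ -e "$1" ]; then
            echo "OK: $sample"
        else
            echo "MISSING: $sample has no .allValidPairs file"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "ERROR: no sample subfolders in $input_dir"
        status=1
    fi
    return $status
}
```

Called on the data/ folder (the parent of sample1), the function returns 0 when every sample folder has its file. Called on the .allValidPairs file itself, it reports that no sample subfolders were found, which mirrors the first error in this thread.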

Rayko87 commented 4 years ago

Hello Servant,

Thanks again for your answers. Changing this generates a different error. I now run it as you suggested:

HiC-Pro -i /mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/ -o DND41_5KB_chr8 -c config-hicpro.txt -s build_contact_maps

Run HiC-Pro 2.11.1

Tue Mar 24 20:07:52 EDT 2020
Generate binned matrix files ...
Exit: Error in input type.'.fastq|.bam|.validPairs|.allValidPairs|.matrix' files are expected.
/opt/HiC-Pro/bin/../scripts//Makefile:171: recipe for target 'build_raw_maps' failed
make: *** [build_raw_maps] Error 1

However, the directory /mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/ is where I have the folder called "sample1", which contains all these files:

-rw-rw-r-- 1 robert robert 270M Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.DEPairs
-rw-rw-r-- 1 robert robert 405K Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.DumpPairs
-rw-rw-r-- 1 robert robert 0 Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.FiltPairs
-rw-rw-r-- 1 robert robert 129M Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.REPairs
-rw-rw-r-- 1 robert robert 328 Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.RSstat
-rw-rw-r-- 1 robert robert 239M Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.SCPairs
-rw-rw-r-- 1 robert robert 0 Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.SinglePairs
-rw-rw-r-- 1 robert robert 23G Mar 24 19:53 HiChip27-DND41_GRChg37-hg19.bwt2pairs.validPairs
-rw-rw-r-- 1 robert robert 20G Mar 24 19:55 sample1.allValidPairs

Thanks again!

Robert

nservant commented 4 years ago

The first lines of the config file should not be edited !!

# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

Rayko87 commented 4 years ago

Hello,

Thanks, I changed it, but it still does not work. I am now running this:

HiC-Pro -i /mnt/data/robert/ANALISIS/HiC-Pro-DND41/Second_analysis_DND41_HiCpro/hic_results/data/ -o DND41_5Kb_chr8 -c config-hicpro.txt -s build_contact_maps

Run HiC-Pro 2.11.1
mkdir: missing operand
Try 'mkdir --help' for more information.
/opt/HiC-Pro/bin/../scripts//Makefile:75: recipe for target 'configure' failed
make: *** [configure] Error 1

The program stops after creating a folder named DND41_5Kb_chr8 that contains a copy of the config file and the data folder, with the sample1 subfolder holding all the HiC-Pro processed files (.allValidPairs and so on). No matter where I run it, it stops after generating these folders again.

nservant commented 4 years ago

This is again linked to RAW_DIR. In theory, it should do a mkdir $RAW_DIR, and if the variable is not set, it crashes. Are you sure that your config is correct?
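
An empty path variable of this kind can be caught before the pipeline runs. The sketch below is a hypothetical pre-flight check, not something HiC-Pro ships: check_config_paths is an invented name, and the list of variables is simply the five entries from the "Do not edit" header quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of HiC-Pro: report path variables
# from the "Do not edit" header that are missing or empty in a config file.
check_config_paths() {
    config=$1
    status=0
    for var in TMP_DIR LOGS_DIR BOWTIE2_OUTPUT_DIR MAPC_OUTPUT RAW_DIR; do
        # Keep the text after "=" on the line defining $var, trimming
        # surrounding whitespace; commented-out lines do not match.
        value=$(sed -n "s/^[[:space:]]*${var}[[:space:]]*=[[:space:]]*//p" "$config" \
            | sed 's/[[:space:]]*$//' | head -n 1)
        if [ -z "$value" ]; then
            echo "EMPTY: $var"
            status=1
        fi
    done
    return $status
}
```

Running check_config_paths config-hicpro.txt on a config where RAW_DIR has no value prints EMPTY: RAW_DIR and returns a non-zero status, so it can guard the HiC-Pro call in a wrapper script.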

Rayko87 commented 4 years ago

Hello again,

Sorry to bother you. You were right: my config file didn't have the correct RAW_DIR line. However, running it now generates a new error:

Run HiC-Pro 2.11.1

Wed Mar 25 14:47:02 EDT 2020
Generate binned matrix files ...
Logs: logs/sample1/build_raw_maps.log
sed: -e expression #1, char 8: unknown option to `s'

When I go to the log, it is empty with only one line saying:

Generate contact maps at 5000 resolution ...

nservant commented 4 years ago

I think the error comes from the build_raw_maps.sh script in the scripts folder, but I do not understand what is going wrong. The error comes from line 103. Could you try editing the script to add a trace at line 102:

echo ${r}

N

Rayko87 commented 4 years ago

Hello Servant,

I changed the script, putting the echo in that part, but the error still happens and I don't see the r parameter:

Run HiC-Pro 2.11.1

Thu Mar 26 14:16:03 EDT 2020
Generate binned matrix files ...
Logs: logs/sample1/build_raw_maps.log
sed: -e expression #1, char 8: unknown option to `s'
sed: -e expression #1, char 8: unknown option to `s'

In fact, I also changed the build_raw_maps.sh script in the scripts folder like this:

## Logs
ldir=${LOGS_DIR}/${RES_FILE_NAME}
mkdir -p ${ldir}
echo "Logs: ${ldir}/build_raw_maps.test.log"
echo "${BIN_SIZE} hELLO"
if [ -d ${DATA_DIR}/${RES_FILE_NAME} ]; then
    MATRIX_DIR=${MAPC_OUTPUT}/matrix/${RES_FILE_NAME}/raw
    for bsize in ${BIN_SIZE}
    do

Even after changing the name of the log file (adding a "test" suffix), the result is still the same:

Run HiC-Pro 2.11.1

Thu Mar 26 14:19:18 EDT 2020
Generate binned matrix files ...
Logs: logs/sample1/build_raw_maps.log
sed: -e expression #1, char 8: unknown option to `s'
sed: -e expression #1, char 8: unknown option to `s'

Without the "test" word or anything like that....

Rayko87 commented 4 years ago

Moreover, there is a file called build_matrix in the scripts folder that is not a .sh file. Is this file OK?

nservant commented 4 years ago

It's a bit difficult to help you like this. As a last option, you can directly use the build_matrix tool (without .sh). This is the tool that generates the maps.

./build_matrix 

./build_matrix: missing --binsize or --binfile option

usage: ./build_matrix --binsize BINSIZE|--binfile --chrsizes FILE --ifile FILE
       --oprefix PREFIX [--binadjust] [--step STEP] [--binoffset OFFSET]
       [--matrix-format asis|upper|lower|complete][--chrA CHR... --chrB CHR...] [--quiet] [--progress] [--detail-progress]

The input file is your allValidPairs file. Specify the BIN_SIZE, the CHROMOSOME file (with the size of the chromosome), --matrix-format upper and here you go !
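
As a rough sketch of that advice, the loop below calls build_matrix once per resolution, using only options that appear in the usage message above. The build_maps function, the BUILD_MATRIX variable, the example file names, and the PREFIX_BINSIZE output naming are illustrative assumptions, not HiC-Pro conventions.

```shell
#!/bin/sh
# Sketch: build raw contact maps at several resolutions from one
# allValidPairs file, using only options shown in the usage message.
# BUILD_MATRIX and the output prefix pattern are assumptions for illustration.
BUILD_MATRIX=${BUILD_MATRIX:-./build_matrix}

build_maps() {
    valid_pairs=$1   # e.g. sample1.allValidPairs
    chrom_sizes=$2   # chromosome sizes file, e.g. chrom_hg19.sizes
    prefix=$3        # output prefix, e.g. sample1
    shift 3          # remaining arguments are bin sizes in bp
    for bin_size in "$@"; do
        "$BUILD_MATRIX" --binsize "$bin_size" --chrsizes "$chrom_sizes" \
            --ifile "$valid_pairs" --oprefix "${prefix}_${bin_size}" \
            --matrix-format upper || return 1
    done
}
```

For example, build_maps sample1.allValidPairs chrom_hg19.sizes sample1 5000 20000 would run the tool twice, once per bin size. Setting BUILD_MATRIX=echo first prints the commands instead of running them, which is a cheap way to review them beforehand.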

Rayko87 commented 4 years ago

Thanks Servant!

I don't know what is happening with the first script, but your solution worked perfectly. With the build_matrix and the ice scripts I got the results I needed!

Drosophilid commented 2 years ago

Hi @nservant, I am trying to use the /apps/hicpro/2.10.0/scripts/build_matrix command to get a different resolution. Unfortunately, I am getting this error:

terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Aborted (core dumped)

Command:

/apps/hicpro/2.10.0/scripts/build_matrix --binsize 5000 --chrsizes chr.sizes --ifile sample1_allValidPairs.gz --matrix-format upper --oprefix sample1

Could you please help me sort out this issue? Thanks