nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Error Plotting HiC Contacts #500

Open AlMuCan opened 2 years ago

AlMuCan commented 2 years ago

Hi,

I was hoping to get some help. I have been able to process most of my samples (3 of 4) from a recent Hi-C run using HiC-Pro, but for one sample I am receiving the following error, detailed in the plot_hic_contacts.Rout logfile:

Histogram of insert size

allvalidpairs <- list.files(path=hicDir, pattern=paste0("^[[:print:]]*\.validPairs$"), full.names=TRUE)
stats_per_validpairs<- lapply(allvalidpairs, read.csv, sep="\t", as.is=TRUE, header=FALSE, row.names=1, nrow=100000)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  duplicate 'row.names' are not allowed
Calls: lapply -> FUN -> read.table
Execution halted

Looking through the first 100,000 rows of the validPairs file I do indeed only have 89,153 unique entries in the first column. I have not used any tools other than HiC-Pro thus far in the analysis stage. Could this be a samtools/alignment issue given that it appears to be assigning non-unique names? Is there any information I can provide that would be helpful here?
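For anyone wanting to reproduce this check, here is a minimal Python sketch that counts how many first-column values repeat within the first 100,000 rows of a validPairs file. It assumes the file is tab-separated with the read name in the first column. The function name `count_duplicate_names` and the example file name are made up for illustration and are not part of HiC-Pro.

```python
from collections import Counter
from itertools import islice


def count_duplicate_names(path, max_rows=100_000):
    """Inspect the first `max_rows` lines of a tab-separated file.

    Returns (rows_read, unique_names, duplicated), where `duplicated`
    maps each first-column value seen more than once to its count.
    """
    with open(path) as handle:
        # Keep only the first column (assumed to hold the read name).
        names = [
            line.split("\t", 1)[0].rstrip("\n")
            for line in islice(handle, max_rows)
        ]
    counts = Counter(names)
    duplicated = {name: n for name, n in counts.items() if n > 1}
    return len(names), len(counts), duplicated


# Hypothetical usage (the file name is an assumption):
# rows, unique, dups = count_duplicate_names("sample.validPairs")
# print(rows, unique, list(dups.items())[:10])
```

If `unique` is smaller than `rows`, the `row.names=1` argument in the R call above will fail in the same way, since R does not accept repeated row names.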

Thanks in advance, Alex

nservant commented 2 years ago

Hi Alex, that's weird. Could you please show me your config file? Are you sure that you did not put the same data in the input twice? Best
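One way to check whether the same data was supplied twice is to hash every file under the input folder and look for identical digests. Below is a minimal Python sketch of that idea. The function name `find_identical_files` is made up for illustration, and the check only catches byte-identical files: the same reads compressed separately, or concatenated into a larger file, would not be detected.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(root, chunk_size=1 << 20):
    """Group files under `root` by the SHA-256 of their contents.

    Returns a list of groups (lists of paths); each group holds two or
    more files whose bytes are identical.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in chunks so large FASTQ files do not fill memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Hypothetical usage (the folder name is an assumption):
# for group in find_identical_files("rawdata"):
#     print("identical files:", [str(p) for p in group])
```

An empty result does not rule out duplicated input, but a non-empty one points directly at the files to inspect.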

AAAAAbetter commented 1 year ago

Hello, I got the same error during my analysis. This is my config file:

`# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR =rawdata

#######################################################################
## SYSTEM AND SCHEDULER - Start Editing Here !!
#######################################################################
N_CPU = 8
SORT_RAM = 10000M
LOGFILE = hicpro.log

JOB_NAME =
JOB_MEM =
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

#########################################################################
## Data
#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################
## Alignment options
#######################################################################

MIN_MAPQ = 8

BOWTIE2_IDX_PATH = /home/data/yaoyanxin/reference/ref-for-ATAC/GRCh38_bowtie2
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = GRCh38
GENOME_SIZE = /home/data/yaoyanxin/reference/GENCODE/GRCh38.p13/GRCh38.chrom.sizes

#######################################################################
## Allele specific analysis
#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################
## Capture Hi-C analysis
#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = /home/data/yaoyanxin/hichip/analysis/digest/dpnii_hg38.bed
LIGATION_SITE = AGATCGATCT
MIN_FRAG_SIZE = 100
MAX_FRAG_SIZE =100000
MIN_INSERT_SIZE =100
MAX_INSERT_SIZE =1000

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE =1000 5000 10000 20000 150000 400000 500000 1000000
MATRIX_FORMAT = complete

#######################################################################
## Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1`