nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

error in global alignment step #381

Closed HangweiXi closed 4 years ago

HangweiXi commented 4 years ago

Hi,

I am using HiC-Pro to process some Dovetail Hi-C data on a Slurm cluster.

The error in the output is:


Wed Nov 18 21:59:31 ACDT 2020
Bowtie2 alignment step2 ...
Logs: logs/sample1/mapping_step2.log
Exit: Error in reads alignment - Exit
make: *** [bowtie_local] Error 1


I then checked the log file, and it looks like something went wrong in the global mapping step.


DTG_HIC_1096_S0_L001_R2_vetch_draft_v0.1.bwt2glob.unmap_bowtie2.log:

HiC-Pro mapping

Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:20273:1450 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:20273:1450 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:24424:1467 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:24424:1467 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:23764:1520 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:23764:1520 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:21592:1555 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:21592:1555 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:3173:1784 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:3173:1784 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:25296:1819 2:N:0:NAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:25296:1819 2:N:0:NAGGTCT' because it was < 2 characters long
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:12733:1924 2:N:0:AAGGTCT' because length (1) <= # seed mismatches (0)
Warning: skipping read 'MG01HX07:637:H5J7FCCX2:7:1101:12733:1924 2:N:0:AAGGTCT' because it was < 2 characters long
Error: Read MG01HX07:637:H5J7FCCX2:7:1101:31527:1924 2:N:0:AAGGTCT has more read characters than quality values.
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT)


So I checked the raw data with grep -A 3, and it looks normal: the read and its quality string are both 151 characters long.


@MG01HX07:637:H5J7FCCX2:7:1101:31527:1924 2:N:0:AAGGTCT
CTCATCGAACATTAAATTAAAGTAAACGAGTTGTTCATTTAATTCTAAGAACAAAATTTTTTGTAATTTATTTAGTTATTTTATTGATTTTTGTTTTCTTTGAACTCTTTTAAATATACTTAAATTTGTAACTTTTTTTTTTTATATTTCA
+
<A<<---AA-----7AF-A<<J--A---<-<<--<7--<---<F--<<--A7-----<J----<7AJ-----<77-A<-<-7--7---7A-7--<7-77---7F7A77F77-7-A<7--7FF-F<<A--FF--7--77-7F-----7--77
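The same check can be scripted instead of counting characters by eye. The following is only a minimal sketch, assuming an uncompressed FASTQ file with exactly four lines per record; the function names `read_fastq`, `find_record` and `lengths_match` are hypothetical helpers written for this illustration and are not part of HiC-Pro or Bowtie2.

```python
from typing import Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (header, sequence, quality) from an uncompressed 4-line FASTQ file."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        while True:
            header = handle.readline()
            if header == "":  # end of file
                return
            seq = handle.readline()
            handle.readline()  # skip the '+' separator line
            qual = handle.readline()
            yield header.rstrip("\r\n"), seq.rstrip("\r\n"), qual.rstrip("\r\n")


def find_record(path: str, read_id: str) -> Optional[Record]:
    """Return the first record whose ID (header text before the first space) equals read_id."""
    for header, seq, qual in read_fastq(path):
        if header[1:].split(" ", 1)[0] == read_id:
            return header, seq, qual
    return None


def lengths_match(record: Record) -> bool:
    """True when the sequence and quality strings have the same length."""
    _, seq, qual = record
    return len(seq) == len(qual)
```

For example, `find_record(path, "MG01HX07:637:H5J7FCCX2:7:1101:31527:1924")` would return the record for the read named in the Bowtie2 error, and `lengths_match` on that record reports whether the two strings agree in length. Matching on the full ID avoids accidental hits on reads that merely share a prefix.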


However, I found something wrong in the intermediate file bowtie_results/bwt_global1/sample1/DTG_HIC_1096_S0_L001_R2_vetch_draft_v0.1.bwt2glob.unmap.fastq. This time I used grep -A 7:


@MG01HX07:637:H5J7FCCX2:7:1101:31527:1924 2:N:0:AAGGTCT
CTCATCGAACATTAAATTAAAGTAAACGAGTTGTTCATTTAATTCTAAGAACAAAATTTTTTGTAATTTATTTAGTTATTTTATTGATTTTTGTTTTCTTTGAACTCTTTTAAATATACTTAAATTTGTAACTTTTTTTTTTTATATTTCA
+
<A<<---AA-----7AF-A<<J--A---<-<<--<7--<---<F--<<--A7JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJFFJJJJJFFFJJ-FFJJJJFJFAFJJFJJJJJ
@MG01HX07:637:H5J7FCCX2:7:1107:18487:5739 2:N:0:AAGGTCT
TTCTTGCATAGTTTGTGCTAACACTCCGATCGATCATGGAGCCACTCCAAATACAGAGGATTCAACTCATGTTCGTCGAAAAAGTCCCAAATGGACCACTGAACAAAATTTGGTCCTAATTAGTGGGTGGATTAAATATGGAACAGACAGT
+
AAFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJA


It seems that this read not only lost one quality value (150 instead of 151), which caused the mapping error, but that the quality values themselves were also changed and now look abnormal. I am not sure what happened during the global mapping.
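To pin down where two versions of a quality string start to disagree, a small comparison helper can be used. This is only a sketch under the assumption that both quality strings have already been extracted as plain Python strings; `first_difference` and `compare_quality` are hypothetical names chosen for this illustration.

```python
from typing import Dict, Optional


def first_difference(a: str, b: str) -> Optional[int]:
    """Return the 0-based index where a and b first differ, or None if they are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def compare_quality(raw_qual: str, out_qual: str) -> Dict[str, Optional[int]]:
    """Summarise the lengths of both quality strings and where they diverge."""
    return {
        "raw_length": len(raw_qual),
        "out_length": len(out_qual),
        "first_difference": first_difference(raw_qual, out_qual),
    }
```

A result whose `first_difference` falls in the middle of the string, combined with a shorter `out_length`, would suggest that the record was corrupted when the intermediate file was written rather than simply truncated at the end.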

Config file:

Please change the variable settings below if necessary

#########################################################################

Paths and Settings - Do not edit !

#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################

SYSTEM AND SCHEDULER - Start Editing Here !!

#######################################################################

N_CPU = 16
SORT_RAM = 24000M
LOGFILE = hicpro.log

JOB_NAME =
JOB_MEM =
JOB_WALLTIME =
JOB_QUEUE =
JOB_MAIL =

#########################################################################

Data

#########################################################################

PAIR1_EXT = _R1
PAIR2_EXT = _R2

#######################################################################

Alignment options

#######################################################################

MIN_MAPQ = 10

BOWTIE2_IDX_PATH = /hpcfs/users/a1737558/vetch_polish_round2
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS = --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################

Annotation files

#######################################################################

REFERENCE_GENOME = vetch_draft_v0.1
GENOME_SIZE = /hpcfs/users/a1737558/vetch_polish_round2/length.txt

#######################################################################

Allele specific analysis

#######################################################################

ALLELE_SPECIFIC_SNP =

#######################################################################

Capture Hi-C analysis

#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################

Digestion Hi-C

#######################################################################

GENOME_FRAGMENT = /hpcfs/users/a1737558/vetch_polish_round2/vetch_digest.bed
LIGATION_SITE = GATCGATC
MIN_FRAG_SIZE =
MAX_FRAG_SIZE =
MIN_INSERT_SIZE =
MAX_INSERT_SIZE =

#######################################################################

Hi-C processing

#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################

Contact Maps

#######################################################################

BIN_SIZE = 20000 40000 150000 500000 1000000
MATRIX_FORMAT = upper

#######################################################################

Normalization

#######################################################################

MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1

Cheers,
Hangwei

HangweiXi commented 4 years ago

I reran the pipeline and the error did not occur again, so it seems to have been a one-off error.