nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing
Other
382 stars 183 forks source link

make: *** [bowtie_local] Error 1 #399

Closed YiweiNiu closed 3 years ago

YiweiNiu commented 3 years ago

Hi,

I am using HiC-Pro 3.0.0. And one of the samples encountered mapping error. I guess it was the due to corrupted Fastqs, which were generated by HiC-Pro, not the input Fastqs.

The following are the logs.

# HiCpro log
$ cat HiCpro_s1_hicpro_210117.e326226-3 
--------------------------------------------
Mon Jan 18 13:47:58 CST 2021
Bowtie2 alignment step1 ...
Logs: logs/M0_2_HIC_201222/mapping_step1.log

--------------------------------------------
Mon Jan 18 15:45:00 CST 2021
Bowtie2 alignment step2 ...
Logs: logs/M0_2_HIC_201222/mapping_step2.log
[W::sam_read1] Parse error at line 1259221
[main_samview] truncated file.
Exit: Error in reads alignment - Exit
make: *** [bowtie_local] Error 1

# bowtie log
$ tail M0_2_HIC_201222_R2_mm10.bwt2glob.unmap_bowtie2.log 
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:18487:20788 2:N:0:TGACCA+AGATCTCG' because length (0) <= # seed mismatches (0)
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:18487:20788 2:N:0:TGACCA+AGATCTCG' because it was < 2 characters long
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:29934:20823 2:N:0:TGACCA+AGATCTCG' because length (1) <= # seed mismatches (0)
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:29934:20823 2:N:0:TGACCA+AGATCTCG' because it was < 2 characters long
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:28087:21034 2:N:0:TGACCA+AGATCTCG' because length (1) <= # seed mismatches (0)
Warning: skipping read 'E00513:215:HFFTTCCX2:1:1123:28087:21034 2:N:0:TGACCA+AGATCTCG' because it was < 2 characters long
Saw ASCII character 0 but expected 33-based Phred qual.
terminate called after throwing an instance of 'int'
Error while flushing and closing output
Error while flushing and closing output(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

# bw2_global
$ ll bowtie_results/bwt2_global/M0_2_HIC_201222/
total 8872956
drwxr-xr-x  2 niuyw hesm       4096 Jan 18 13:48 ./
drwxr-xr-x 14 niuyw hesm       4096 Jan 18 13:48 ../
-rw-r--r--  1 niuyw hesm 2657988384 Jan 18 15:45 M0_2_HIC_201222_R1_mm10.bwt2glob.bam
-rw-r--r--  1 niuyw hesm 1587585147 Jan 18 15:45 M0_2_HIC_201222_R1_mm10.bwt2glob.unmap.fastq
-rw-r--r--  1 niuyw hesm 2911391811 Jan 18 15:44 M0_2_HIC_201222_R2_mm10.bwt2glob.bam
-rw-r--r--  1 niuyw hesm 1929929907 Jan 18 15:44 M0_2_HIC_201222_R2_mm10.bwt2glob.unmap.fastq

# one read of R2 has different quality and sequence length
$ cat M0_2_HIC_201222_R2_mm10.bwt2glob.unmap.fastq | paste - - - - | awk -F"\t" '{ if (length($2) != length($4)) print $0 }' |  tr '\t' '\n'
@E00513:215:HFFTTCCX2:1:1123:29762:21403 2:N:0:TGACCA+AGATCTCG
TTTTGGAAACTGTGAACTATTCAAATGCTAATTTTTAAGATGATCTCCATCTGACTTAGGCAAAGAAGGGCTCAGAATGGGGACATGGTGTTAAGATGTGTAAAGAACAGATGTCTTGCAAATCAACTTCTAAAATATTTCTTAGATATC
+
AAFF7JFFJJAJFJJJJFFJAJF<FFJFJJJJJJJJFJJJJFFFJJJJAJJJ

# bwt2_local
$ ll bowtie_results/bwt2_local/M0_2_HIC_201222/
total 3193208
drwxr-xr-x  2 niuyw hesm       4096 Jan 18 15:45 ./
drwxr-xr-x 14 niuyw hesm       4096 Jan 18 16:15 ../
-rw-r--r--  1 niuyw hesm  314811000 Jan 18 15:58 M0_2_HIC_201222_R1_mm10.bwt2glob.unmap_bwt2loc.bam
-rw-r--r--  1 niuyw hesm 1243538219 Jan 18 15:45 M0_2_HIC_201222_R1_mm10.bwt2glob.unmap_trimmed.fastq
-rw-r--r--  1 niuyw hesm  117696784 Jan 18 15:49 M0_2_HIC_201222_R2_mm10.bwt2glob.unmap_bwt2loc.bam
-rw-r--r--  1 niuyw hesm 1593759119 Jan 18 15:45 M0_2_HIC_201222_R2_mm10.bwt2glob.unmap_trimmed.fastq

# empty line of R1 Fastq
$ sed -n '1113,1118p' M0_2_HIC_201222_R1_mm10.bwt2glob.unmap_trimmed.fastq
@E00513:215:HFFTTCCX2:1:1101:14072:2452 1:N:0:TGACCA+AGATCTCG

+

@E00513:215:HFFTTCCX2:1:1101:15453:2452 1:N:0:TGACAA+AGATCTCG
GAGTGACTGGGTGTCTCCTGTTAGACATATACATGGGGGGTAAGTGGTACCCATGAATCTCAACAACATGGTTGTTTAAACAAGACCTGAAAAATAACACCAGGTTCATCTGTGGATGGCGGCGCCATCCCTTCAAGGAGGGCTATCTGC

# empty line of R2 Fastq
$ sed -n '4217,4225p' M0_2_HIC_201222_R2_mm10.bwt2glob.unmap_trimmed.fastq
@E00513:215:HFFTTCCX2:1:1101:15574:3436 2:N:0:TGACCT+AGATCTCG

+

@E00513:215:HFFTTCCX2:1:1101:15716:3436 2:N:0:TGACCT+AGATCTCG
TATTAATGCGTTCAAATTCTAAGGACAGATTGCCATATTACAACCCGCAGACAGAAATATTTTAGATGTAGCACATGTGTGGCCTGGTTATTCCGCCCCGTCGGATTGATCTGGTCTTACAAAGCATGGCTTCCTGGAATTCTTTGGCTG
+
AA<-<F-----77<7-F-<-<J<<-FFJF-F-<<-F-<<F---F-7-<J-7F------<A<7FJ7J<--<7A-<A--FA-F-----------A7--AJ7F-7-----<-7<----7AF7A---77<-7--A----7--7-7AF<AF<-77

The config file looked like this:

# Please change the variable settings below if necessary

#########################################################################
## Paths and Settings  - Do not edit !
#########################################################################

TMP_DIR = tmp
LOGS_DIR = logs
BOWTIE2_OUTPUT_DIR = bowtie_results
MAPC_OUTPUT = hic_results
RAW_DIR = rawdata

#######################################################################
## SYSTEM AND SCHEDULER - Start Editing Here !!
#######################################################################
N_CPU = 20
LOGFILE = hicpro.log

JOB_NAME = hicpro_210117
JOB_MEM = 
JOB_WALLTIME = 
JOB_QUEUE = Fat
JOB_MAIL = xiaohuwangwang@qq.com

#########################################################################
## Data
#########################################################################

PAIR1_EXT = R1
PAIR2_EXT = R2

#######################################################################
## Alignment options
#######################################################################

MIN_MAPQ = 10

BOWTIE2_IDX_PATH = /home/niuyw/RefData/Mus_musculus/Bowtie2Index
BOWTIE2_GLOBAL_OPTIONS = --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder
BOWTIE2_LOCAL_OPTIONS =  --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder

#######################################################################
## Annotation files
#######################################################################

REFERENCE_GENOME = mm10
GENOME_SIZE = chrom_mm10.sizes

#######################################################################
## Allele specific analysis
#######################################################################

ALLELE_SPECIFIC_SNP = 

#######################################################################
## Capture Hi-C analysis
#######################################################################

CAPTURE_TARGET =
REPORT_CAPTURE_REPORTER = 1

#######################################################################
## Digestion Hi-C
#######################################################################

GENOME_FRAGMENT = mboi_mm10.bed
LIGATION_SITE = GATCGATC
MIN_FRAG_SIZE = 
MAX_FRAG_SIZE = 
MIN_INSERT_SIZE = 
MAX_INSERT_SIZE = 

#######################################################################
## Hi-C processing
#######################################################################

MIN_CIS_DIST =
GET_ALL_INTERACTION_CLASSES = 1
GET_PROCESS_SAM = 0
RM_SINGLETON = 1
RM_MULTI = 1
RM_DUP = 1

#######################################################################
## Contact Maps
#######################################################################

BIN_SIZE = 40000 100000 250000
MATRIX_FORMAT = upper

#######################################################################
## Normalization
#######################################################################
MAX_ITER = 100
FILTER_LOW_COUNT_PERC = 0.02
FILTER_HIGH_COUNT_PERC = 0
EPS = 0.1

I checked the input Fastqs and found no errors.

I do not know why the *R2_mm10.bwt2glob.unmap.fastq and *_unmap_trimmed.fastq got corrupted.

The issue was also reported before issue 381, but it was not solved.

Hope the log information above is useful. Any comments or suggestions would be highly appreciated.

nservant commented 3 years ago

Hi,

Thanks for reporting this issue. This is the first time I'm seeing that. So here, you are presenting two different things ;

If you manage to generate a small fastq files, I can try to reproduce your issue. Thanks Nicolas

YiweiNiu commented 3 years ago

Hi Nicolas,

Thank you for your reply.

Could you check if this read is well formated in the original input fastq file ? if this is not the last read in the file for instance, to be sure that bowtie did not simply stop writting at some point ?

The original input fastq files were complete, as validated by validatefastq.

I re-ran this sample without changing any parameters, and it worked now.

I guess there was something wrong when running bowtie2, which may be induced by unknown system error.

Thank you again for your time and effort.