Closed mmilevskiy closed 3 years ago
Hi Michael Nothing special in your config. Could you show me the first lines of a paired fastq files, just to check the read names. And please, which version are you using ? and what is the exact error message ? Thanks
Hi Nicolas,
Here are the first few lines or the R1 and R2 files of a sample:
head -12 file1_R1.fastq @NB500916:806:HHY2KAFX2:1:11101:14965:1020 1:N:0:CGATGT TGCAGNCTCACATTCATTGGTTACCCAAGGAGGTCGCTGTGCTGTGGACAGCAGCTTAACTGCAGGGGACCAAATGAAACTGCAAGGTTCGTCCATCGATCGATGGACGAACCTAAGTTGAATTTTCTCATAAATTATAGTTGAATATAAA + AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEAEAEEEE<AEEEEEEAEEEAEEA/EEEAEEEEEEEEEEA<AEEE/AEEEAEAEAEAE/EAE/E/ @NB500916:806:HHY2KAFX2:1:11101:10793:1020 1:N:0:CGATGT GAACTNCAAGTCTCTGAAAAAAGAAATTTAAGAAGATCTCAGAAGATAGAAAGATTTCCCATGCTCACGAATCAGGTTCGTCCATCGATCGATGGACGCACCTAGGCTCAGCGTTCTTTACTTCCTGTGGGTGAAAAGACATAAACTTAGT + AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEE6EEEEEEEEEEEE/EEEEEEEEEEE<EEE</<AE/E/A//AE/<E//EA///E//</EA//A///E///E/<//6EAA/A//AEA/A</<AEE<E//A @NB500916:806:HHY2KAFX2:1:11101:6444:1020 1:N:0:CGATGT AACATNCGTTTTGTACTCAATGCCCATAAATCTGACACAGGTTCGTCCATCGATCGATGGACGAACCTATGGTCAGTCTCCTCCATTGTAGGGAGTCAGGGATGGTTCAATGTATATAAATTTTTAGATACAATAAAGCACATAACTAGAA + A6AAA#EEA<EEEAEEEEEEEEE/EEEEEEEEAEEEE/AEEAA<AE6<<EE<AE//<E</EE/EAE//E/E/E</EE</E/<E<E/E/A/E/E/E/E</EEEEEE<//<EAA//EEEAEEE<E/AA/EE<<<E<<EEE/AAE//<AA<EE<
head -12 file1_R2.fastq @NB500916:806:HHY2KAFX2:1:11101:14965:1020 2:N:0:CGATGT NNNAATGCTATTATAAATATGTTTCATCTCTGGAGGCAGGATATTTAGACCATATGATGGGTATGGGCTCAGTCATATAAGGAGTTCTTGTATAAGCATCTGCTCTATTTTATATTCAACTATAATTTATGAGAAAATTCAACTTAGGTTN +
@NB500916:806:HHY2KAFX2:1:11101:10793:1020 2:N:0:CGATGT NNNCAAAATAAAAGTAGATTTTTGCTGAGGGCCAATTATCTCCCCAGTTGTGTTCTTTCTTACACTAAGTTTATGTCTTTTCACCCACAGGAAGTAAAGAAAGCTGAGCCTAGGTTCGTCCATCGATCGATGGACGAACCTGATTCGTGA +
@NB500916:806:HHY2KAFX2:1:11101:6444:1020 2:N:0:CGATGT NNNAAAAGTAACTGGAAGATGAATGAGCATGTCCAATGACATACATGTATAACCATGTCAAAATGTAACCTATTATTTTATATGCTAACAAAGAACTGATTGTTTAAAAAACTAACTACGAGCTTTTCTCCATTTATGGAGATGACTGTAT +
Also, I don't see an error message I don't think. It is just the alarming rate of unmapped reads and that 2 bam files come from the mapping step instead of one merged bam for both the left and right read. The log says that 100% of the reads are unmapped which I think is the error, see below.
file1_R1_bowtie2.log file1_R2_bowtie2.log
HiC-Pro v 2.11.4
ok I see ! Usually, with digestion Hi-C, we are doing two rounds of mapping to deal with chimeric reads (spanning the ligation junction et the two interacting loci). But in your case, as you are doing Dnase Hi-C, we can't do that as there is no ligation motif.
My feeling is that you have a lot of chimeric reads in your data. This can be due to a fragment size distribution which is smaller than expected or by the fact that here you have very long reads (151bp).
My advice would be to :
--very-sensitive --end-to-end --reorder
could help you to rescue some alignments.To conclude, at least as a first trial, I would go for option 1 (reads trimming) Let me know if it works better Nicolas
Hi Nicolas,
Thanks for your suggestion. I am doing Omni-C which does indeed have a ligation motif:
AGGTTCGTCCATCGATCGATGGACGAACCT
You can see it in the above sequences, since the ligation motif is so larger, I think this is why they recommended doing 150bp sequencing instead of 75bp. I can re-try with relaxing bowtie2 parameters. The company originally recommended HiC-Pro, but now has their own pipeline, that I haven't been able to get up and running yet either.
https://github.com/dovetail-genomics/dovetail_tools
Thanks, Michael
Hi Michael
Thank you for the link. Indeed, they are using bwa now which is the only mapper able to do local mapping from the 5' end of reads. This is not something I will add to HiC-Pro. However, we can think about adding it in the the nf-core-hic pipeline (https://github.com/nf-core/hic) which is more flexible. Feel free to add an issue there.
But still, even if the ligation motif is very large, I think you should try to trim the reads to see if it improves the results. In practice, you do not necessarily want to read this ligation motif. This is more a control to see if the ligation works well. The only thing that you want is having the R1/R2 on different restriction fragments (or sufficiently distant in the case the DNAse Hi-C). I already analysed very nice samples with just 1% of reads spanning the junction.
Best
Hi Nicolas,
I tried two of the steps: : Reducing the bowtie2 stringency and : trimming the reads to 80bp
This increased mapping, see attached. After going through a few errors I used singularity to run HiC-Pro and everything appears to be ok I believe. I can generate the 2 plots with 53.3% reported pairs from the read trimmed fastq. Does the output all look ok now then? I might go with 75bp PE reads next time then.
plotHiCFragment_sample1.pdf plotMappingPairing_sample1.pdf
file1_R1_val_1_bowtie2.log file1_R2_val_2_bowtie2.log
Michael
Hi Michael, Yes, it looks ok. I think you are using an old HiC-Pro version (or I hope so actually) as there a small typo in the R plots. Anyway, one last suggestion, would be to decrease the min mapping quality. Here it seems that you are filtering 30% of reads because of low mapQ. Best Nicolas
Hi Nicolas,
Ok great. I will test out a few variations of sequence trimming length and mapping quality. Yes, noticed the singularity was version 3, I should get the latest version installed on our servers.
Thanks again, think that resolved this!
Michael
Resolved.
Hi HiC-Pro Team,
I am consistently getting an error where the read1 and read2 files map separately. Bowtie then says that 100% of my reads are not paired and this leads to >40% unmapped. Is there an error somewhere in my config file or something that is preventing Hi-Pro from recognising the two files are paired sequencing?
Thanks,
Michael
DNase_config_motif_step1.txt
HiC-Pro -i /Fastq/Run1/ -o /pro_out/motif_trimming -c DNase_config_motif_step1.txt -s mapping