nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing
Other
372 stars 181 forks source link

gensub: third argument treated as 1 #615

Open KarimKoleilat opened 5 months ago

KarimKoleilat commented 5 months ago

Hello Nicholas,

I'm running HiC-Pro on a cluster (seadragon/redhat) and it seems I'm encountering an error as listed in the title for all steps starting with mapping step 1 until mapped_2hic_fragments in the output file after submitting step 1 script.

I'm using the complete IMR90_rep1 dataset.

my Bowtie2 bwt2pairs.pairstat file: Total_pairs_processed 65090522 100.0 Unmapped_pairs 2548522 3.915 Low_qual_pairs 16455856 25.281 Unique_paired_alignments 37175090 57.113 Multiple_pairs_alignments 0 0.0 Pairs_with_singleton 8911054 13.69 Low_qual_singleton 0 0.0 Unique_singleton_alignments 0 0.0 Multiple_singleton_alignments 0 0.0 Reported_pairs 37175090 57.113

and my R1 mapstat:

HiC-Pro Mapping Statistics

SRR400267_GSM862724_IMR90_rep1/SRR400267_GSM862724_IMR90_rep1_1_hg19.mapstat

total_R1 65090522 mapped_R1 59333874 global_R1 56736642 local_R1 2597232

It seem that something is occurring that preventing aligned reads from being processed into valid pairs -> hic-results/data/SRR.....RSstat file:

Hi-C processing

Valid_interaction_pairs 0 Valid_interaction_pairs_FF 0 Valid_interaction_pairs_RR 0 Valid_interaction_pairs_RF 0 Valid_interaction_pairs_FR 0 Dangling_end_pairs 0 Religation_pairs 0 Self_Cycle_pairs 0 Single-end_pairs 0 Filtered_pairs 0 Dumped_pairs 37175090

my rawdata file structure is the way it's supposed to be (each sample in its own directory - I have 1 sample).

I do not see anything wrong/different with my config-hicpro.txt file.

This issue has been mentioned previously by user jfass and he mentioned something regarding symlincs but he was using version 2.11 (3.10 here) and my run did not exit like his did.

Any help is much appreciated.

If you need more information or clarification please let me know.

Thank you,

Karim

nservant commented 5 months ago

Hi Karim, I think the data structure looks good, otherwise, you will not have the mapping. In your example, all read pairs are flagged as "Dumped_pairs", meaning that it was not able to map the pairs on the restriction fragments. Could you check your reference names ? for instance chromosome names in your bam, and in your restriction fragment file ?

KarimKoleilat commented 5 months ago

Thank you Nicholas. I think this makes sense.

Restriction Fragment bed file has "chr1" whereas reference files has "1" annotation instead.

Snippets below.

Do you recommend changing the restriction file to match the "1" notation or the hg19 reference file to match the bed file with "chr1" annotation? i.e. does it matter which?

Best,

Karim

cat hg19.fasta | grep ">"

1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 6 dna:chromosome chromosome:GRCh37:6:1:171115067:1 7 dna:chromosome chromosome:GRCh37:7:1:159138663:1 8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 9 dna:chromosome chromosome:GRCh37:9:1:141213431:1 10 dna:chromosome chromosome:GRCh37:10:1:135534747:1 11 dna:chromosome chromosome:GRCh37:11:1:135006516:1 12 dna:chromosome chromosome:GRCh37:12:1:133851895:1 13 dna:chromosome chromosome:GRCh37:13:1:115169878:1 14 dna:chromosome chromosome:GRCh37:14:1:107349540:1 15 dna:chromosome chromosome:GRCh37:15:1:102531392:1 16 dna:chromosome chromosome:GRCh37:16:1:90354753:1 17 dna:chromosome chromosome:GRCh37:17:1:81195210:1 18 dna:chromosome chromosome:GRCh37:18:1:78077248:1 19 dna:chromosome chromosome:GRCh37:19:1:59128983:1 20 dna:chromosome chromosome:GRCh37:20:1:63025520:1 21 dna:chromosome chromosome:GRCh37:21:1:48129895:1 22 dna:chromosome chromosome:GRCh37:22:1:51304566:1 X dna:chromosome chromosome:GRCh37:X:1:155270560:1 Y dna:chromosome chromosome:GRCh37:Y:2649521:59034049:1 MT gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome GL000207.1 dna:supercontig supercontig::GL000207.1:1:4262:1 GL000226.1 dna:supercontig supercontig::GL000226.1:1:15008:1 etc... NC_007605 NC_007605.1 "EBV type 1"

head annotation/HindIII_resfrag_hg19.bed chr1 0 16007 HIC_chr1_1 0 + chr1 16007 24571 HIC_chr1_2 0 + chr1 24571 27981 HIC_chr1_3 0 + chr1 27981 30429 HIC_chr1_4 0 + chr1 30429 32153 HIC_chr1_5 0 + chr1 32153 32774 HIC_chr1_6 0 + chr1 32774 37752 HIC_chr1_7 0 + chr1 37752 38369 HIC_chr1_8 0 + chr1 38369 38791 HIC_chr1_9 0 + chr1 38791 39255 HIC_chr1_10 0 +

nservant commented 5 months ago

This is really up to you. It doesn't matter as soon as all your annotation files have the same nomenclature. Do not forget also the chromose size file ;)