nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Merging interactions fails in FitHiChIP #533

Open pna059 opened 2 years ago

pna059 commented 2 years ago

Hi, I analyzed Hi-C data using HiC-Pro followed by FitHiChIP with the recommended settings. A large number of interactions were found, but at the step of merging adjacent loops I get the following error message:

******* applying merge filtering on the FitHiChIP significant interactions ******

****** Merge filtering of adjacent loops is enabled *****
***** within function of merged filtering - printing the parameters ***
*** bin_size:  20000
*** headerInp:  1
*** connectivity_rule:  8
*** TopPctElem:  100
*** NeighborHoodBinThr:  40000
*** QValCol:  26
*** PValCol:  25
*** SortOrder:  0
OutDir:  FitHiChIP_leaf1_20k/FitHiChIP_ALL2ALL_b20000_L10000_U2000000/ICE_Bias/FitHiC_BiasCorr/Merge_Nearby_Interactions
list of chromosomes:  ['chr1H', 'chr2H', 'chr3H', 'chr4H', 'chr5H', 'chr6H', 'chr7H']
Processing the chromosome:  chr1H
Traceback (most recent call last):
  File "./src/CombineNearbyInteraction.py", line 638, in <module>
    main()
  File "./src/CombineNearbyInteraction.py", line 245, in main
    CurrChrDict.setdefault(curr_key, Interaction(int(linecontents[CCCol-1]), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))
ValueError: invalid literal for int() with base 10: '11.757594'
----- Applied merged filtering (connected component model) on the adjacent loops of FitHiChIP
SORRY !!!!!!!! FitHiChIP could not find any statistically significant interactions after applying merge filtering on the generated set of loops !!
Option 1: use significant loops without merge filtering

What could be the problem? Are chromosome names with an "H" suffix (e.g., chr1H) supported in this step?

Thank you,
Pavla

pna059 commented 2 years ago

I solved the issue by editing line 245 of ./src/CombineNearbyInteraction.py, changing

CurrChrDict.setdefault(curr_key, Interaction(int(linecontents[CCCol-1]), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))

to

CurrChrDict.setdefault(curr_key, Interaction(int(float(linecontents[CCCol-1])), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))

so that the contact count is parsed as a float before being converted to an integer.
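The conversion order matters here. The traceback shows `int()` receiving the string '11.757594', and Python's `int()` rejects decimal strings outright, so the field has to pass through `float()` first. The sketch below illustrates this with two hypothetical helpers (`parse_contact_count`, `parse_contact_count_as_int`) that are not part of FitHiChIP. Which variant is appropriate depends on whether the script's `Interaction` class needs an integer, which is an assumption this sketch does not settle.

```python
def parse_contact_count(field: str) -> float:
    """Parse a contact-count field that may be an integer ('12') or a
    normalized decimal value ('11.757594'), keeping the fractional part.

    Hypothetical helper for illustration only.
    """
    return float(field)


def parse_contact_count_as_int(field: str) -> int:
    """Hypothetical variant that keeps an int type: parse as float, then
    truncate toward zero (the fractional part is lost)."""
    return int(float(field))


if __name__ == "__main__":
    # int() alone rejects decimal strings; this is the reported ValueError.
    try:
        int("11.757594")
    except ValueError as err:
        print("int() fails:", err)

    print(parse_contact_count("11.757594"))         # 11.757594
    print(parse_contact_count_as_int("11.757594"))  # 11
```

Truncating with `int(float(...))` is the smallest change to the original line, while keeping the float avoids discarding part of a normalized count.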

Now I get another error, hopefully the last one, regarding the tabix (tbx) index (my genome is a large plant genome):

[E::hts_idx_check_range] Region 537189999..537190001 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
tbx_index_build failed: /auto/budejovice1/home/pavlan/FitHiChIP_leaf1_20k/FitHiChIP_ALL2ALL_b20000_L10000_U2000000/ICE_Bias/FitHiC_BiasCorr/Merge_Nearby_Interactions/FitHiChIP_leaf1_20k.interactions_FitHiC_Q0.05_MergeNearContacts_WashU.bed.gz

It would be good to support users working with such large genomes, for example by building a CSI index (tabix -C) instead of a TBI index whenever a chromosome exceeds the TBI coordinate limit.
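Background: a TBI index uses min_shift 14 with 5 bin levels, so it can only address coordinates up to 2^(14 + 5*3) = 2^29 = 536,870,912. The region in the error ends at 537,190,001, just past that limit. The sketch below shows how a pipeline could check the largest end coordinate in a bgzipped BED file and request a CSI index only when needed. `max_bed_end`, `choose_tabix_args` and `tabix_args_for_file` are hypothetical helpers, not FitHiChIP code. The sketch assumes column 3 holds the end coordinate (as in BED). The tabix options used (`-f`, `-p bed`, `-C`, `-m`) exist in htslib's tabix, but check them against your installed version.

```python
import gzip
from typing import Iterable, List

# TBI (like BAI) uses min_shift=14 and 5 levels: 14 + 5 * 3 = 29 bits.
TBI_MAX_POS = 1 << 29  # 536,870,912


def max_bed_end(lines: Iterable[str]) -> int:
    """Return the largest end coordinate (column 3) among BED-like lines,
    skipping blank, comment, track and browser lines."""
    max_end = 0
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        max_end = max(max_end, int(fields[2]))
    return max_end


def choose_tabix_args(bgzipped_bed: str, max_end: int) -> List[str]:
    """Build a tabix command line, switching to a CSI index when the
    coordinates exceed what a TBI index can store."""
    args = ["tabix", "-f", "-p", "bed"]
    if max_end > TBI_MAX_POS:
        # CSI handles larger coordinates; min_shift 14 matches htslib's hint.
        args += ["-C", "-m", "14"]
    args.append(bgzipped_bed)
    return args


def tabix_args_for_file(bgzipped_bed: str) -> List[str]:
    """Scan a bgzipped BED file and return the tabix arguments to use."""
    # BGZF output is gzip-compatible, so the stdlib gzip module can read it.
    with gzip.open(bgzipped_bed, "rt") as handle:
        return choose_tabix_args(bgzipped_bed, max_bed_end(handle))
```

A caller could then run `subprocess.run(tabix_args_for_file(path), check=True)`. Note that tools reading the file afterwards must then look for a `.csi` index instead of `.tbi`.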