js2264 / HiCExperiment

Importing and manipulating Hi-C data in R
http://js2264.github.io/HiCExperiment/

Error in .normarg_seqnames2(seqnames, seqinfo) #7

Open julianeweller opened 7 months ago

julianeweller commented 7 months ago

Hello,

I've been trying to use HiCExperiment to analyze HiC data from the nf-core HIC pipeline and pairtools output. In both cases, I'm running into the same error when trying to import the cool or mcool files:

mcoolpath <- paste0(home, '/files/HIC/nfcore_mS/hicpro/mapping/D2.mcool')
coolpath <- paste0(home, '/files/HIC/nfcore_mS/hicpro/mapping/D2.1000000.cool')

cf <- CoolFile(coolpath)
hic <- import(cf, focus = 'chr13')

Do you know where this error comes from and how this could be fixed?

Error in .normarg_seqnames2(seqnames, seqinfo) : 
  'seqnames' contains sequence names with no entries in 'seqinfo'

Thank you!
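
One way this message can arise, shown as a minimal sketch rather than a diagnosis of this particular file: GenomicRanges raises it when a GRanges object is built with a sequence name that the supplied seqinfo does not list. The seqinfo below is hypothetical (names without the 'chr' prefix) and is only an assumption about what a mismatching file might contain.

```r
library(GenomicRanges)

# Hypothetical seqinfo with un-prefixed names ("13"), standing in for
# whatever chromosome names a .cool file might actually store.
si <- Seqinfo(seqnames = c("12", "13"))

# Build a range on "chr13", a name that is absent from `si`,
# and capture the resulting error message instead of stopping.
msg <- tryCatch(
    {
        GRanges(seqnames = "chr13", ranges = IRanges(1, 1000), seqinfo = si)
        "no error"
    },
    error = function(e) conditionMessage(e)
)
print(msg)
```

If the names in the file and the name passed to focus differ in this way, using the file's own naming in focus would be the first thing to try.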

julianeweller commented 7 months ago

If I import the pairs file directly, I get a step further, but it also fails:

pairpath <- paste0(home, '/files/HIC/nfcore_mS/hicpro/mapping/D2.chr13.sorted.pairs.gz')
pairs_file <- PairsFile(pairpath)

Sys.setenv(VROOM_CONNECTION_SIZE = 1562144) 
hic <- import(pairs_file)

hic

GInteractions object with 13759077 interactions and 3 metadata columns:
             seqnames1   ranges1     seqnames2   ranges2 | frag1 frag2 distance
         [1]     chr13  16000276 ---     chr13  16017594 |    UU    39    17318
         [2]     chr13  16000337 ---     chr13  18968185 |    UU    35  2967848
         [3]     chr13  16000350 ---     chr13  16006475 |    UU    35     6125
         [4]     chr13  16000350 ---     chr13  16006475 |    UU    34     6125
         [5]     chr13  16000350 ---     chr13  19362605 |    UU    35  3362255
         ...       ...       ... ...       ...       ... .   ...   ...      ...
  [13759073]     chr13 114356534 ---     chr13 114287587 |    UU    32    68947
  [13759074]     chr13 114356566 ---     chr13 113471315 |    UU    32   885251
  [13759075]     chr13 114356813 ---     chr13 114356349 |    UU    30      464
  [13759076]     chr13 114356813 ---     chr13 114356349 |    UU    30      464
  [13759077]     chr13 114356813 ---     chr13 114356349 |    UU    30      464
  -------
  regions: 13326249 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

plotMatrix(hic)

Error: subscript contains invalid names

miachom commented 5 months ago

@julianeweller Hi, did you manage to solve this issue? I am facing the same problem, but only with merged cool files that I generated by pooling samples together. If I don't merge the samples, the individual files all load fine and I can analyze them. I am not sure whether this is a GRanges problem caused by incompatible "seqnames" while loading, or something else. Thanks

julianeweller commented 5 months ago

Hey @miachom, no, I was not able to solve the issue. Asking this question on the hic nextflow chat also didn't help. But thanks for letting me know that running the samples separately worked (I ran them in batch but still had one sample per cool file).

js2264 commented 5 months ago

@julianeweller was your D2.1000000.cool file also a merged file? Can you try importing a specific chromosome region, e.g. import(x, focus = 'chr13:1-10000000')?
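
Before calling import() with a focus, it may help to check that the chromosome part of the focus string matches a name stored in the file. The following is a minimal base R sketch: focus_chr_known() is a hypothetical helper, not part of HiCExperiment, and the chromosome names are made up. In practice the names might be obtained with something like seqnames(availableChromosomes(cf)), assuming that accessor is available in the installed HiCExperiment version.

```r
# Hypothetical helper (not part of HiCExperiment): TRUE if the chromosome
# part of a focus string such as 'chr13:1-10000000' is among `chroms`.
focus_chr_known <- function(focus, chroms) {
    chr <- sub(":.*$", "", focus)  # drop the ':start-end' part, if any
    chr %in% chroms
}

# Made-up chromosome names standing in for those stored in a .cool file
chroms <- c("12", "13", "14")

focus_chr_known("chr13:1-10000000", chroms)  # FALSE: 'chr13' is not listed
focus_chr_known("13:1-10000000", chroms)     # TRUE: '13' is listed
```

A FALSE result here would point to a naming mismatch between the focus argument and the file, rather than a problem with the contact data itself.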