unpack requires a buffer of 4 bytes

LMH0066 commented 1 year ago

Hi there, when I use FAN-C to read public Hi-C data, loading works but extracting the matrix fails with the error below.

>>> c = fanc.load('GSM4383608_hippocampus-p001-cb_068.contacts.hic@10kb')
>>> c.matrix(norm=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/matrix.py", line 1023, in matrix
    row_regions, col_regions, matrix_entries = self.regions_and_matrix_entries(key,
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/matrix.py", line 963, in regions_and_matrix_entries
    row_regions, col_regions, edges_iter = self.regions_and_edges(key, *args, **kwargs)
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/matrix.py", line 869, in regions_and_edges
    row_regions = list(row_regions)
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 868, in _region_iter
    norm = self.normalisation_vector(chromosome)
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 757, in normalisation_vector
    JuicerHic._skip_to_normalisation_vectors(req)
  File "/root/miniconda3/envs/see/lib/python3.8/site-packages/fanc/compatibility/juicer.py", line 581, in _skip_to_normalisation_vectors
    n_vectors = struct.unpack('<i', req.read(4))[0]
struct.error: unpack requires a buffer of 4 bytes

Also, does FAN-C have a method to convert Juicer Hi-C files into something similar to Cooler's pixel table? This would help me process the data further. Thank you so much in advance!

kaukrise commented 1 year ago

Hi, which version of FAN-C are you using?

LMH0066 commented 1 year ago

I installed FAN-C version 0.9.25 using Poetry.

kaukrise commented 1 year ago

Could you send me a link to the public Hi-C file that is causing the issue? Then I can try to reproduce the error on my machine. Thanks!

Regarding your other question: there is a roundabout way to convert Juicer files into Cooler files with FAN-C, if that is what you are asking, but it is really inefficient. You're probably better off asking in the Juicer or Cooler forums! For completeness, though:

1. Convert the Juicer Hi-C file to FAN-C format: fanc hic --deepcopy juicer.hic@<resolution> converted.fanc
2. Convert the FAN-C file to Cooler: fanc to-cooler converted.fanc hic.cool

LMH0066 commented 1 year ago

Thanks for your reply! I got the Hi-C data from the NCBI GEO website (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE162511). The dataset authors combined all Hi-C files into one compressed archive, so I can't give you a URL for a single Hi-C file, but you can download the data as shown below.

Regarding the second question: because Hi-C data is very sparse, processing it as a full matrix is inefficient. For me, the Cooler pixel representation is the fastest way to process the data.
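
To illustrate the point about sparsity, here is a minimal sketch of what a pixel-style representation looks like compared with a full matrix. It uses plain Python and no Hi-C library; the helper name dense_to_pixels and the toy matrix are made up for illustration. The column order (bin1_id, bin2_id, count) follows Cooler's pixel table, and only the upper triangle is kept because a contact matrix is symmetric.

```python
def dense_to_pixels(matrix):
    """Return (bin1_id, bin2_id, count) triples for non-zero upper-triangle entries.

    Hypothetical helper for illustration only; not part of FAN-C or Cooler.
    """
    pixels = []
    for i, row in enumerate(matrix):
        # Start at the diagonal: the lower triangle mirrors the upper one.
        for j in range(i, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


# Toy symmetric contact matrix with 4 bins (made-up counts).
dense = [
    [4, 0, 0, 1],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 3],
]

pixels = dense_to_pixels(dense)
for bin1_id, bin2_id, count in pixels:
    print(bin1_id, bin2_id, count)
```

The full matrix stores 16 values, while the pixel list stores only the 4 non-zero upper-triangle entries. The gap widens quickly as the number of bins grows and the contacts stay few.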

kaukrise commented 1 year ago

Hi, sorry for the delay, but I was on holiday last week.

The file did not contain any normalisation information, which I had not encountered before. Here is a beta version of FAN-C that works with the file you provided:

fanc-0.9.26b4.tar.gz
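
For readers wondering where the original error message comes from: struct.unpack('<i', ...) needs exactly 4 bytes, and a read(4) call that hits the end of the data returns fewer. The sketch below reproduces that with an in-memory stream. It is not the actual change made in the beta; read_int32 is a hypothetical helper that only shows one way a reader could treat a missing count as "no vectors" instead of raising.

```python
import io
import struct


def read_int32(stream):
    """Read a little-endian int32, or return None if the stream ends first.

    Hypothetical helper for illustration; not FAN-C's implementation.
    """
    raw = stream.read(4)
    if len(raw) < 4:
        # Nothing (or not enough) left to read, e.g. a section that is absent.
        return None
    return struct.unpack('<i', raw)[0]


# A stream that holds exactly one int32.
stream = io.BytesIO(struct.pack('<i', 7))
first = read_int32(stream)   # the stored value
second = read_int32(stream)  # stream exhausted, so None

# Unpacking a short buffer directly raises struct.error, as in the traceback.
try:
    struct.unpack('<i', b'')
    error_message = None
except struct.error as exc:
    error_message = str(exc)

print(first, second, error_message)
```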

Also, your command above will try to create a whole-genome 10kb full matrix, which will take a very long time and consume a lot of memory. Since there are only a few contacts in the file, I would recommend using a much lower resolution:

import fanc

# use NONE here to ignore the default KR norm
hic = fanc.load("GSM4382149_cortex-p001-cb_001.contacts.hic@2.5mb@NONE")

# extract 2.5Mb matrix
m = hic.matrix()
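
As a rough back-of-the-envelope check on the memory point above, the sketch below estimates the size of a dense whole-genome matrix at the two resolutions. The genome length of 2.7 Gb and the 8 bytes per value are assumptions chosen for illustration, not values read from the file or from FAN-C, and the estimate ignores the extra partial bin at the end of each chromosome.

```python
def dense_matrix_size(genome_length_bp, resolution_bp, bytes_per_value=8):
    """Return (number of bins, bytes for a dense n_bins x n_bins matrix)."""
    # Integer ceiling division: a partial final bin still counts as a bin.
    n_bins = -(-genome_length_bp // resolution_bp)
    return n_bins, n_bins * n_bins * bytes_per_value


GENOME_LENGTH_BP = 2_700_000_000  # assumed genome length, for illustration only

bins_10kb, bytes_10kb = dense_matrix_size(GENOME_LENGTH_BP, 10_000)
bins_2_5mb, bytes_2_5mb = dense_matrix_size(GENOME_LENGTH_BP, 2_500_000)

print(f"10kb:  {bins_10kb} bins, {bytes_10kb / 1e9:.1f} GB")
print(f"2.5Mb: {bins_2_5mb} bins, {bytes_2_5mb / 1e6:.1f} MB")
```

Under these assumptions the 10kb matrix would need several hundred gigabytes, while the 2.5Mb matrix fits in a few megabytes.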

LMH0066 commented 1 year ago

OK, I tested it, and it works well! Thanks for your reply!

vaquerizaslab / fanc

unpack requires a buffer of 4 bytes #151