open2c / cooltools

The tools for your .cool's
MIT License

Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe' #345

Closed: cgirardot closed this issue 2 years ago

cgirardot commented 2 years ago

I am trying to run:

cooltools random-sample -c 228878289 -p 6 HiC_2-4h.5Kb-bin-matrix.transRemoved.cool HiC_2-4h.5Kb-bin-matrix.transRemoved.downsampled.cool

but I am getting:

Traceback (most recent call last):
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/bin/cooltools", line 8, in <module>
    sys.exit(cli())
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/cooltools/cli/sample.py", line 72, in random_sample
    api.sample.sample(
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/cooltools/api/sample.py", line 116, in sample
    cooler.create_cooler(out_clr_path, clr.bins()[:], iter(pipeline), ordered=True)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/cooler/tools.py", line 148, in __iter__
    return iter(self.run())
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/cooler/tools.py", line 208, in run
    return self.map(pipeline, self.keys)
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/multiprocess/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/g/funcgen/gbcs/public/software/conda/envs/cooltools-0.5.1/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

I am not sure why my .cool file contains floats (it is raw data, so these numbers should be integers).
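For readers unfamiliar with the wording of the error: 'safe' is one of NumPy's casting rules, and it only permits conversions that cannot lose information. A minimal sketch of the rule itself, independent of cooltools (the array values are made up for illustration):

```python
import numpy as np

# Under the 'safe' rule, float64 -> int64 is refused because a float
# may carry a fractional part that an integer cannot represent.
float_to_int_safe = np.can_cast(np.float64, np.int64, casting="safe")
int_to_wider_int_safe = np.can_cast(np.int32, np.int64, casting="safe")
print(float_to_int_safe, int_to_wider_int_safe)

# Illustrative counts: integer-valued, but stored as float64.
counts = np.array([3.0, 1.0, 12.0])

# An explicit astype uses the 'unsafe' rule by default, so it succeeds.
as_int = counts.astype(np.int64)

# Requesting the 'safe' rule raises a TypeError, even though every
# value here happens to be a whole number.
message = None
try:
    counts.astype(np.int64, casting="safe")
except TypeError as err:
    message = str(err)
print(message)
```

The check is on the dtype, not on the values, which is why whole-number counts stored as float64 are still rejected.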

Thank you

gfudenberg commented 2 years ago

I am unable to replicate this error with the test.mcool in open2c_examples in a conda environment built from the open2c environment.yml. Can you provide some more details as to how your cooler was created, etc.?

cgirardot commented 2 years ago

Hi, I used HiCExplorer (3.6) to build the Hi-C matrices for multiple technical replicates (i.e. lanes) and 2 biological replicates. Here, all replicates (biological and technical) are merged into a single matrix (summed up). I then kept only the main chromosomes and removed the inter-chromosomal contacts using hicAdjustMatrix. Finally, the conversion to cool uses hicConvertFormat.

I can share a matrix if it helps

thx

gfudenberg commented 2 years ago

can you double check the dtypes of the converted matrix and print what cooler.info() gives? I'm wondering if hicConvertFormat casts to floats.

cgirardot commented 2 years ago

I am not sure how to check the dtypes. How can I check?

Also, I think I was wrong about the steps: due to this bug I had to prune the matrix myself using this piece of code:

hicConvertFormat MY_H5 -> MY_COOL
cooler dump --join MY_COOL | awk '$1==$4 {print $0}' | gzip > MY_BEDPE
cooler load --assembly dm6 -f bg2 chr_file:5000 MY_BEDPE MY_TRIMMED_COOL

cooler info:

{
    "bin-size": null,
    "bin-type": "variable",
    "creation-date": "2022-03-24T12:14:39.597384",
    "format": "HDF5::Cooler",
    "format-url": "https://github.com/mirnylab/cooler",
    "format-version": 3,
    "generated-by": "HiCMatrix-16",
    "generated-by-cooler-lib": "cooler-0.8.11",
    "genome-assembly": "unknown",
    "nbins": 25148,
    "nchroms": 6,
    "nnz": 26246396,
    "storage-mode": "symmetric-upper",
    "sum": 377180269.0,
    "tool-url": "https://github.com/deeptools/HiCMatrix"
}
gfudenberg commented 2 years ago

You can inspect the individual pixels in the cooler if you load it interactively:

clr = cooler.Cooler('mycooler.cool')
(clr.pixels()[:3]).dtypes
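A slightly fuller sketch of the same check. The helper names `pixel_dtypes` and `load_first_pixels` are hypothetical, as is the stand-in table; only `cooler.Cooler` and `clr.pixels()` come from the comment above. `cooler` is imported lazily so the dtype helper can be tried on a plain pandas DataFrame first.

```python
import pandas as pd


def pixel_dtypes(pixels: pd.DataFrame) -> dict:
    """Return a {column name: dtype name} mapping for a pixel table."""
    return {col: str(dtype) for col, dtype in pixels.dtypes.items()}


def load_first_pixels(path: str, n: int = 3) -> pd.DataFrame:
    """Load the first n pixels of a cooler file (requires cooler)."""
    import cooler  # imported here so the helper above works without it

    clr = cooler.Cooler(path)
    return clr.pixels()[:n]


# Stand-in for load_first_pixels('mycooler.cool'): integer-valued counts
# stored as floats, which is the situation discussed in this thread.
example = pd.DataFrame(
    {"bin1_id": [0, 0, 1], "bin2_id": [0, 1, 1], "count": [12.0, 3.0, 7.0]}
)
dtypes = pixel_dtypes(example)
print(dtypes)
```

If the `count` entry reports `float64` rather than an integer type, the file stores its counts as floats.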

cgirardot commented 2 years ago

Thanks for the piece of code. The count column is indeed float64, which I guess comes from the previous manipulation steps. I am pretty sure these numbers are integers. Is there a way I can check that my numbers are all integers and then re-save this cool file with the right type?

Thank you

gfudenberg commented 2 years ago

This is a typical way with pandas dataframes: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
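A minimal sketch of how that could look for a pixel table, not a tested recipe. The helper names are hypothetical, and the re-save step assumes that the whole pixel table fits in memory and that `cooler.create_cooler` keeps the dtype of the DataFrame's `count` column. Check the dtypes of the output file afterwards.

```python
import numpy as np
import pandas as pd


def counts_are_integral(pixels: pd.DataFrame, column: str = "count") -> bool:
    """True if every value in the column is finite and a whole number."""
    values = pixels[column].to_numpy()
    return bool(np.all(np.isfinite(values)) and np.all(values == np.round(values)))


def cast_counts_to_int(pixels, column="count", dtype="int64"):
    """Cast the count column with DataFrame.astype, refusing lossy casts."""
    if not counts_are_integral(pixels, column):
        raise ValueError(f"column {column!r} contains non-integer values")
    return pixels.astype({column: dtype})


def resave_with_int_counts(in_path: str, out_path: str) -> None:
    """Assumption-laden sketch: rewrite a cooler with integer counts."""
    import cooler  # lazy import; the helpers above only need pandas/numpy

    clr = cooler.Cooler(in_path)
    pixels = cast_counts_to_int(clr.pixels()[:])  # loads all pixels
    cooler.create_cooler(out_path, clr.bins()[:], pixels, ordered=True)


# Small stand-in tables to try the check and the cast.
good = pd.DataFrame({"bin1_id": [0, 0], "bin2_id": [0, 1], "count": [2.0, 5.0]})
bad = pd.DataFrame({"bin1_id": [0], "bin2_id": [1], "count": [2.5]})
fixed = cast_counts_to_int(good)
print(fixed.dtypes)
```

Checking before casting matters because `astype` would otherwise silently truncate any fractional counts.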

Closing this for now, but feel free to re-open (though this may be better addressed in the HiCExplorer issues, as this looks like an issue with hicConvertFormat).