open2c / cooltools

The tools for your .cool's
MIT License

Error when running cooltools random-sample #541

Open: GMFranceschini opened this issue 1 month ago

GMFranceschini commented 1 month ago

Hi, thank you for the amazing tool. I am encountering this error when downsampling a .cool file and was wondering if you could help me debug it.

I am running:

```shell
cooltools random-sample ${file}::resolutions/50000 -c 250000000 ${cools_path}${name}_subsampled_250m.cool
```

And I get:

```
INFO:root:fallback to serial implementation.
Traceback (most recent call last):
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/bin/cooltools", line 11, in <module>
    sys.exit(cli())
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/cooltools/cli/sample.py", line 71, in random_sample
    api.sample.sample(
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/cooltools/lib/common.py", line 541, in wrapper
    result = func(*args, **kwargs, map_functor=mymap)
  File "/mnt/ndata/gianmarco/miniforge3/envs/cools/lib/python3.10/site-packages/cooltools/api/sample.py", line 107, in sample
    frac = count / clr.info["sum"]
KeyError: 'sum'
```

I checked the output of `cooler info`, and indeed there is no `sum` field. Is that the problem? Should I generate one to make it work?

Phlya commented 1 month ago

Yes indeed, I think that's the problem. If you can add it yourself, that should solve it.

GMFranceschini commented 1 month ago

Thank you! So one could do something like this, right?

```python
sum_contacts = clr.pixels()[:]["count"].sum()
clr.info["sum"] = sum_contacts
```

I can submit a PR to handle this corner case if that would be useful. Our .mcool files were generated with hic2cool, so that might be the culprit.
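The snippet above loads the whole pixel table with `clr.pixels()[:]`, which can use a lot of memory on large, high-resolution matrices. A minimal sketch of a chunked alternative, assuming `clr` behaves like `cooler.Cooler` (slicing `clr.pixels()` returns that range of pixel rows, and slicing past the end returns an empty chunk). `chunked_contact_sum` and its `chunksize` default are hypothetical, not part of cooler or cooltools.

```python
def chunked_contact_sum(clr, chunksize=5_000_000):
    """Sum the raw 'count' column of a cooler without loading all pixels at once.

    Hypothetical helper. `clr` is assumed to behave like cooler.Cooler:
    clr.pixels()[lo:hi] returns pixel rows lo..hi-1 (a DataFrame in cooler).
    """
    total = 0
    lo = 0
    while True:
        chunk = clr.pixels()[lo:lo + chunksize]["count"]
        if len(chunk) == 0:  # ran past the last pixel
            break
        # accumulate as a plain Python int rather than a numpy scalar
        total += int(chunk.sum())
        lo += chunksize
    return total
```

Only one chunk of pixels is held in memory at a time, at the cost of several smaller reads from the HDF5 file.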

Phlya commented 1 month ago

Indeed this could be the reason!

I am not sure off the top of my head whether this would actually store the value in the file... @nvictus?
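One way to make sure the value ends up in the file is to write it as an HDF5 attribute directly. A minimal sketch, assuming cooler reads `clr.info` from the attributes of the cooler's HDF5 group (the root `/` for a single .cool, `resolutions/<binsize>` inside an .mcool). `split_cooler_uri` and `store_contact_sum` are hypothetical helpers, not cooler or cooltools API, and the write modifies the file in place, so try it on a copy first.

```python
def split_cooler_uri(uri):
    """Split 'file.mcool::resolutions/50000' into (file path, HDF5 group path)."""
    path, sep, group = uri.partition("::")
    if not sep or not group:
        group = "/"  # a plain .cool stores its cooler at the root group
    return path, group


def store_contact_sum(uri, total):
    """Write `total` as the 'sum' attribute of the cooler group, in place.

    Hypothetical helper. Assumes clr.info is read from this group's attributes,
    so cooler.Cooler(uri).info["sum"] should return the value afterwards.
    """
    import h5py  # only needed for the actual write

    path, group = split_cooler_uri(uri)
    with h5py.File(path, "r+") as f:
        # store a plain Python int rather than a numpy scalar
        f[group].attrs["sum"] = int(total)
```

For example, with `clr = cooler.Cooler(uri)`, calling `store_contact_sum(uri, clr.pixels()[:]["count"].sum())` should make `info["sum"]` available for the resolution that `cooltools random-sample` reads in the traceback. Each resolution of an .mcool is a separate group and would need its own attribute.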

GMFranceschini commented 1 month ago

In the end, my solution was a bit more complicated, because np.int64 values were causing me trouble when rewriting the metadata (I am not sure that adding "sum" to clr.info directly actually worked). Feel free to close the issue, and let me know if I can contribute: I think it would make sense to compute the sum on the spot and add it after downsampling.

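```python
import cooler as cl
import numpy as np


def addSum_mcool(mcool_file, out_file):
    # mcool_file is a cooler URI, e.g. "file.mcool::resolutions/50000"
    clr = cl.Cooler(mcool_file)

    # load the full pixel table once and total the raw contacts
    pixel_mat = clr.pixels()[:]
    c_sum = pixel_mat["count"].sum()
    metadata = dict()
    metadata["sum"] = int(c_sum)

    # create_cooler requires `metadata` to be JSON-compatible, and numpy
    # integers such as np.int64 are not, so convert them to Python ints
    for key, value in clr.info.items():
        if isinstance(value, np.integer):
            metadata[key] = int(value)
        else:
            metadata[key] = value

    print("Sum of the matrix: ", metadata["sum"])
    bins = clr.bins()[:]
    # write a new single-resolution .cool with the same bins and pixels
    cl.create_cooler(out_file, bins=bins, pixels=pixel_mat, metadata=metadata)
```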
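A minimal sketch of the fallback proposed in this comment: use `info["sum"]` when it exists, otherwise compute it on the spot, then form the fraction that fails in the traceback (`frac = count / clr.info["sum"]`). `sampling_fraction` is a hypothetical helper, not the cooltools implementation, and raising an error when more contacts are requested than the matrix holds is an added assumption.

```python
def sampling_fraction(info, count, compute_sum):
    """Fraction of contacts to keep when downsampling to `count` contacts.

    Hypothetical helper. `info` is a cooler-style info dict; `compute_sum` is a
    zero-argument callable used only when 'sum' is missing (e.g. hic2cool output).
    """
    total = info.get("sum")
    if total is None:
        total = compute_sum()  # only pay for a full pass when needed
    total = int(total)
    if total <= 0:
        raise ValueError("the matrix has no contacts to sample from")
    if count > total:
        raise ValueError(f"requested {count} contacts, but the matrix has only {total}")
    return count / total
```

Inside cooltools the callable might be something like `lambda: clr.pixels()[:]["count"].sum()`; passing it as a callable keeps files that already store `sum` from paying for an extra pass over the pixels.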