open2c / cooler

A cool place to store your Hi-C
https://open2c.github.io/cooler
BSD 3-Clause "New" or "Revised" License

TypeError '<' when using zoomify #198

Closed astrovsky01 closed 4 years ago

astrovsky01 commented 4 years ago

I'm using

cooler zoomify -r 10000 -o data/Li_et_al_2015.mcool data/Li_et_al_2015.cool

and I'm getting

Traceback (most recent call last):
  File "/opt/conda/bin/cooler", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/cooler/cli/zoomify.py", line 146, in zoomify
    agg=agg)
  File "/opt/conda/lib/python3.6/site-packages/cooler/reduce.py", line 679, in zoomify_cooler
    resn, pred, mult = get_multiplier_sequence(resolutions, base_resolutions)
  File "/opt/conda/lib/python3.6/site-packages/cooler/reduce.py", line 366, in get_multiplier_sequence
    resn = np.array(sorted(bases.union(resolutions)))
TypeError: '<' not supported between instances of 'NoneType' and 'int'

The command works with other cool files I'm using, and I'm currently testing this with data I found online, so I don't know its structure too well. What is going wrong here?

sergpolly commented 4 years ago

What's the resolution of the input cooler data/Li_et_al_2015.cool? Could you run cooler info data/Li_et_al_2015.cool on it?

sergpolly commented 4 years ago

Also, it looks like you're requesting a single resolution from zoomify with -r 10000. Maybe cooler coarsen is a more appropriate tool for what you're trying to achieve?

astrovsky01 commented 4 years ago

Well I think the first issue is this...

"bin-type": "variable",
"bin-size": null,
"storage-mode": "symmetric-upper",
"nchroms": 1,
"nbins": 103,
"sum": 24239652.493847895,
"nnz": 402,
"genome-assembly": "unknown",
"creation-date": "2019-10-25T13:50:07.876548",
"format-version": 3,
"format": Traceback (most recent call last):
  File "/opt/conda/bin/cooler", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/cooler/cli/_util.py", line 197, in decorated
    func(*args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/cooler/cli/info.py", line 62, in info
    json.dump(attrs_to_jsonable(dct), f, indent=4)
  File "/opt/conda/lib/python3.6/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/opt/conda/lib/python3.6/json/encoder.py", line 430, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/opt/conda/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/opt/conda/lib/python3.6/json/encoder.py", line 437, in _iterencode
    o = _default(o)
  File "/opt/conda/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable

sergpolly commented 4 years ago

"bin-type": "variable", "bin-size": null,

Yup, that looks very special: variable bin sizes. @nvictus would be the best one to comment, of course.
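The null bin size is a plausible cause of the original traceback: the failing line sorts the union of the base resolutions and the requested ones, and a None cannot be ordered against an integer. A minimal sketch in plain Python, assuming the base resolution set ends up holding the file's bin-size value (the variable names mirror the traceback, but the values are illustrative and this is not cooler's actual code path):

```python
# Illustrative stand-ins: the file reports "bin-size": null, so the base
# resolution is None; the user requested -r 10000.
bases = {None}
resolutions = {10000}

try:
    # Same shape as the failing line: sorted(bases.union(resolutions))
    sorted(bases.union(resolutions))
    message = ""
except TypeError as exc:
    # Sorting needs '<' between every pair, which None and int do not support.
    message = str(exc)

print(message)
```

The exact operand order in the message can differ between runs, since it depends on the set's iteration order.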

astrovsky01 commented 4 years ago

I pulled it from a group of test files, since I need a way to automatically massage data based on data types and user inputs. I'm glad to know the issue is the dataset and not the command, because the command I'm trying to build is essentially

for <user file>:
    cooler zoomify -r <user input resolution> -o <modified user file name> <user file name>

Then upload that to HiGlass.
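The loop above could be sketched in Python roughly as follows. This is only an illustration: the helper names zoomify_command and zoomify_all are hypothetical, the .mcool output naming is an assumption, and only the -r and -o flags already shown in this thread are used.

```python
import subprocess
from pathlib import Path


def zoomify_command(cool_path, resolution):
    """Build the argument list for one `cooler zoomify` call."""
    src = Path(cool_path)
    dst = src.with_suffix(".mcool")  # assumed output naming convention
    return ["cooler", "zoomify", "-r", str(resolution), "-o", str(dst), str(src)]


def zoomify_all(cool_paths, resolution):
    """Run zoomify on each file; raises CalledProcessError if a call fails."""
    for path in cool_paths:
        subprocess.run(zoomify_command(path, resolution), check=True)


# Building the command does not require cooler to be installed.
cmd = zoomify_command("data/Li_et_al_2015.cool", 10000)
print(" ".join(cmd))
```

Using check=True means a file like the one in this issue stops the batch with an exception instead of failing silently.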

sergpolly commented 4 years ago

Yeah, if you're not necessarily interested in that particular cooler file/dataset, i.e. you are just trying to use something for demonstration purposes, then I would move on and never touch that file again.

If you are still interested in it, I would start exploring the data by dumping it into a text file.

"nnz": 402,

suggests that it has only 402 nonzero values, so it is very small and can easily be explored as a whole. Here is the link to the dump command: https://cooler.readthedocs.io/en/latest/cli.html#cooler-dump

PS: If I recall correctly, zoomifying (and thus browsing in HiGlass) coolers with variable-size bins is not really supported, because coarsening of variable bins isn't well defined. But again, @nvictus would know for sure.

astrovsky01 commented 4 years ago

I saw a PR saying it worked better now, but I think this should be ok for the moment. I appreciate the help!

nvictus commented 4 years ago

Well I think the first issue is this...

That first issue is a string encoding issue. The format attribute was stored as bytes instead of unicode, probably because the file was created manually (or by an old cooler version) in Python 2.

This was fixed in #180 so the info should print if you upgrade to 0.8.7.
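For illustration, the bytes problem can be reproduced with the standard json module alone. The sketch below uses a hypothetical attribute dict and a hypothetical to_jsonable helper; it shows one way to decode bytes before serializing and is not a description of what #180 actually changed.

```python
import json

# Hypothetical attributes: "format" stored as bytes, as a file written
# under Python 2 might have it. The value is illustrative.
attrs = {"format-version": 3, "format": b"HDF5::Cooler"}

try:
    json.dumps(attrs)
    error = ""
except TypeError as exc:
    # The json module refuses bytes values.
    error = str(exc)


def to_jsonable(value):
    """Decode bytes to str so the json module can serialize the value."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


cleaned = {key: to_jsonable(value) for key, value in attrs.items()}
encoded = json.dumps(cleaned)
print(encoded)
```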

nvictus commented 4 years ago

if I recall correctly zoomifying (and thus browsing in higlass) of coolers with variable size bins is not really supported

Coarsening should work (zoomify may not), but "resolution" is interpreted differently: the base resolution of a variable-sized bin map is 1, and the coarsened resolutions are groupings of adjacent bins. So k = 10,000 would try to merge every 10,000 consecutive bins into one, rather than produce bins of 10,000 bp.
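To make that interpretation concrete, here is a small pure-Python sketch of grouping every k adjacent bins. The coarsen_bins function and the bin coordinates are made up for illustration and are not part of cooler.

```python
def coarsen_bins(bin_edges, k):
    """Group every k adjacent bins; return the (start, end) of each group."""
    groups = []
    for i in range(0, len(bin_edges), k):
        chunk = bin_edges[i:i + k]
        # A merged bin spans from the first start to the last end in the chunk.
        groups.append((chunk[0][0], chunk[-1][1]))
    return groups


# Made-up variable-width bins on one chromosome, as (start, end) in bp.
bins = [(0, 500), (500, 2300), (2300, 2400), (2400, 9000), (9000, 9100)]

by_two = coarsen_bins(bins, 2)        # pairs of neighbouring bins
by_10000 = coarsen_bins(bins, 10000)  # k exceeds the bin count

print(by_two)
print(by_10000)
```

Under this reading, a file with only 103 bins coarsened with k = 10,000 would collapse into a single bin, regardless of how many base pairs each bin covers.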

nvictus commented 4 years ago

Closing as the original issue is fixed in #180