higlass / higlass-python

Python bindings to and Jupyter Notebook+Lab integration for the HiGlass viewer
http://docs-python.higlass.io/
MIT License

Refactor tileset design / plugins #129

Open manzt opened 1 year ago

manzt commented 1 year ago

Motivation

The existing approach of providing multiple tileset helpers (e.g., hg.cooler, hg.bigwig) that add tilesets to hg.server makes it easy to create and register tilesets. Although this approach works "fine", it has some potential drawbacks, and I'm not thrilled with the inconsistency of the API.

Issues

Confusing Usage / Naming Around Tilesets

Our present system exports all tileset helpers from a flat top-level namespace for convenience. However, this does not distinguish "tilesets" from the other API components used for view configuration, such as hg.view or hg.track. For example, a code snippet I shared at the ISMB conference caused confusion because it was unclear that hg.remote and hg.cooler serve similar roles.

A potential solution could involve grouping tilesets under a separate namespace like hg.tilesets.cooler, but I also see drawbacks in this approach (beyond being more verbose – see next).

Limited Scalability and Extensibility

We aim to make custom tilesets simple to implement. However, because builtin and custom tilesets are added through different APIs, the current system further obscures the fact that users actually have control over the server. The tileset helpers hide the server, but if you want a custom tileset, you need to find the server. For example:

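```python
import higlass as hg
from my_tileset import Tileset

# A custom tileset must be added to the server explicitly...
custom_ts = hg.server.add(Tileset(...))
# ...while a builtin helper hides the server entirely.
builtin_ts = hg.cooler("./tmp/data.mcool")
```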

Additionally, adding convenient tileset helpers requires changes to higlass-python itself. I think the best user experience would be to let someone `pip install custom-tileset` and have it work just like our builtin tilesets.

Ideas

I've been mulling over what a plugin system / registry for tilesets could look like. Ideally, I think the end-user API could look something like this:

```python
import higlass as hg

tileset = hg.server.add("./data.mcool", type="cooler")
tileset = hg.server.add(df, type="my-custom-pandas-tileset")  # auto-detected from a pip install

class CustomTileset: ...

hg.server.add(CustomTileset(...))  # no type needed because it implements the tileset protocol
```

You could imagine each server instance holding a registry of tileset plugins that know how to handle particular kinds of objects:

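```python
# Provider, TilesetResource, and is_tileset are the same names used in the
# original sketch; they are assumed to be defined elsewhere.
class HiGlassServer:
    # ... (__init__ sets self._provider, self._tilesets, and self.registry,
    # a dict mapping type names to plugins)

    def add(self, obj, type=None, port=None):
        if is_tileset(obj):
            # Already implements the tileset protocol; use it as-is.
            tileset = obj
        else:
            if type is None:
                # Pick the first registered plugin that can handle `obj`.
                plugin = next(
                    (p for p in self.registry.values() if p.handles(obj)), None
                )
            else:
                plugin = self.registry.get(type)
            if plugin is None:
                raise ValueError(f"No tileset plugin for {obj!r} (type={type!r})")
            tileset = plugin.create(obj)

        # Start the server lazily, restarting it if a different port is requested.
        if self._provider is None:
            self._provider = Provider().start(port=port)

        if port is not None and port != self._provider.port:
            self._provider.stop().start(port=port)

        # Register each tileset with the provider only once, keyed by uid.
        if tileset.uid not in self._tilesets:
            server_resource = self._provider.create(tileset)
            self._tilesets[tileset.uid] = TilesetResource(server_resource)

        return self._tilesets[tileset.uid]
```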
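The dispatch step in `add` (use the object directly if it is already a tileset, otherwise find a plugin by `type` or by `handles`) can be tried in isolation. This is a minimal runnable sketch under stated assumptions: the `Tileset` protocol members (`uid`, `info`, `tiles`) and the toy `ListTileset`/`ListPlugin` classes are hypothetical stand-ins, not higlass-python's actual tileset protocol.

```python
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class Tileset(Protocol):
    """Hypothetical minimal tileset protocol (member names are assumptions)."""

    uid: str

    def info(self) -> Any: ...

    def tiles(self, tile_ids: list) -> list: ...


def is_tileset(obj: Any) -> bool:
    # isinstance() against a runtime_checkable Protocol only checks that the
    # members exist; it does not validate their signatures or return types.
    return isinstance(obj, Tileset)


def resolve_tileset(obj: Any, registry: Dict[str, Any], type: Optional[str] = None) -> Any:
    """Turn `obj` into a tileset, mirroring the dispatch in HiGlassServer.add."""
    if is_tileset(obj):
        return obj  # already speaks the protocol; no plugin needed
    if type is not None:
        plugin = registry.get(type)
    else:
        # First plugin (in registration order) whose handles() accepts obj.
        plugin = next((p for p in registry.values() if p.handles(obj)), None)
    if plugin is None:
        raise ValueError(f"no tileset plugin for {obj!r} (type={type!r})")
    return plugin.create(obj)


# Toy example: an in-memory tileset and a plugin that wraps plain lists.
class ListTileset:
    def __init__(self, data: list) -> None:
        self.uid = f"list-{id(data)}"
        self.data = data

    def info(self) -> dict:
        return {"n": len(self.data)}

    def tiles(self, tile_ids: list) -> list:
        return [self.data for _ in tile_ids]


class ListPlugin:
    @staticmethod
    def handles(obj: Any) -> bool:
        return isinstance(obj, list)

    @staticmethod
    def create(obj: Any) -> ListTileset:
        return ListTileset(obj)


registry = {"list": ListPlugin}
ts = resolve_tileset([1, 2, 3], registry)  # no type given: auto-detected
print(ts.info())  # {'n': 3}
```

Returning `None` from the plugin lookup and raising a `ValueError` gives a clearer error than letting `next()` raise `StopIteration` when no plugin matches.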

And then plugins could register themselves with hg.server:

```python
class TilesetPlugin: ...

hg.server.register("my-tileset", TilesetPlugin)
```
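A minimal sketch of a per-server registry that `hg.server.register` could delegate to. `PluginRegistry`, the `overwrite` flag, and the `TilesetPluginProtocol` interface are hypothetical names for illustration, not existing higlass-python API. The registry checks plugins when they are registered and refuses to silently replace an existing type name.

```python
from typing import Any, Dict, Iterable, Optional, Protocol, runtime_checkable


@runtime_checkable
class TilesetPluginProtocol(Protocol):
    """Hypothetical plugin interface: can it handle obj, and how to wrap it."""

    def handles(self, obj: Any) -> bool: ...

    def create(self, obj: Any) -> Any: ...


class PluginRegistry:
    """Per-server mapping from a type name (e.g. "cooler") to a plugin."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Any] = {}

    def register(self, name: str, plugin: Any, *, overwrite: bool = False) -> None:
        # Fail early if the plugin lacks handles()/create(), rather than
        # erroring later when the server tries to use it.
        if not isinstance(plugin, TilesetPluginProtocol):
            raise TypeError(f"{plugin!r} must define handles() and create()")
        if name in self._plugins and not overwrite:
            raise KeyError(f"a tileset plugin named {name!r} is already registered")
        self._plugins[name] = plugin

    def get(self, name: str) -> Optional[Any]:
        return self._plugins.get(name)

    def values(self) -> Iterable[Any]:
        # Registration order is preserved, so auto-detection is deterministic.
        return self._plugins.values()
```

Because the issue registers the plugin class itself rather than an instance, `handles` and `create` would need to be static or class methods for `plugin.handles(obj)` to work.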
manzt commented 1 year ago

Another option would be to create a top-level hg.tileset (akin to hg.view and hg.track) that deals with the server behind the scenes:

```python
import higlass as hg

tileset = hg.tileset("./data.mcool", type="cooler")
track = tileset.track("heatmap")
hg.view(track)
```
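To make the trade-off concrete, here is a toy sketch of how a top-level `tileset` function could delegate to a module-level default server. `_ToyServer`, its `running` flag, and the uid format are hypothetical stand-ins for hg.server. The point it illustrates is that a call which looks like a plain constructor (like hg.view or hg.track) actually changes shared server state.

```python
from typing import Any, Dict, Optional


class _ToyServer:
    """Hypothetical stand-in for hg.server that only records what it is asked to do."""

    def __init__(self) -> None:
        self.running = False
        self.tilesets: Dict[str, Any] = {}

    def add(self, obj: Any, type: Optional[str] = None) -> str:
        if not self.running:
            self.running = True  # a real server would start and bind a port here
        uid = f"{type or 'auto'}:{obj}"
        self.tilesets.setdefault(uid, obj)
        return uid


server = _ToyServer()  # module-level default, like hg.server


def tileset(obj: Any, type: Optional[str] = None) -> str:
    # Looks side-effect free, but it starts the shared server and
    # registers the tileset on it.
    return server.add(obj, type=type)
```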
nvictus commented 1 year ago

The server is not a detail we can really hide, so I'd prefer a more explicit API like the first proposal. With the alternative, most users would be surprised that simply creating a tileset manipulates the server behind the scenes.