janfreyberg / superintendent

Practical active learning in python
https://superintendent.readthedocs.io

Combining superintendent with Plotly Dash #75

Open zndr27 opened 2 years ago

zndr27 commented 2 years ago

This is a really cool package. I'm curious if you have considered making superintendent a community component for Dash, or incorporating it into a similar tool.

Based on your talk at PyData, I feel like Dash would solve a lot of the problems you were trying to address.

I am a bit biased because I use Dash in my work and am currently developing active learning solutions for my research. I was going to implement a labeling GUI from scratch, but a Google search for existing solutions led me here. I really like your work.

Let me know what you think. I would love to help implement something like this if you're interested and would like assistance.

Thank you!

Zander

janfreyberg commented 2 years ago

Hi Zander,

Thanks for getting in touch, and I'm glad you're finding the package useful! I did consider adding Plotly Dash support, but have since decided to move away from this. For me, the reason to add it was serving labelling interfaces outside the notebook. However, the voila library has since addressed this: it serves notebooks as web pages (with a Python backend running, similar to Dash).

I can see that there is still a case for supporting Dash especially if you're working in an environment in which this is common and well supported - what would the benefit be for you? Would you want to place a superintendent labelling widget into a larger Dash application?

For context, I am actually in the process of splitting most UI elements into a separate library (ipyannotations). This might actually make it easier to extend superintendent to support Dash, but would require someone to implement the annotation UI elements in Dash. I can see an example of something like this here: https://dash.gallery/dash-image-annotation/

But there doesn't seem to be a suite of tools designed for different annotation tasks. It would be cool to see that, especially if it implements a standard interface (similar to ipyannotations). However, I've not worked with Dash in many years now, so I am definitely not the person to do this.

Sorry for the long response but let me know what you think.

Jan

zndr27 commented 2 years ago

Sorry for the late reply! I was traveling this weekend.

I looked a bit into your background and saw that we have similar research interests. I'm an MD/PhD student interested in neuroradiology and neuroimaging. I'm doing my PhD in a lab that does MRSI neuroimaging research. Our main clinical applications are for brain tumors but we are also studying applications for epilepsy and depression. Two of my major research topics are (1) transfer learning / domain adaptation techniques for neural-network-based MRSI quantification and (2) active learning and unsupervised learning techniques for neural-network-based MRSI quality control.

To assist with my research I am developing an application for visualizing neuroimaging data and comparing analysis results. Currently it can display structural MRI and MRSI data. The application is built on top of dash, dash-slicer, and dash-bootstrap-components. I've attached a screenshot of part of the app here.

[screenshot: onix_viz drawio]

Currently I’m working on evaluating different state-of-the-art deep active learning methods on a pre-labeled MRSI dataset. But eventually I want to build a human-in-the-loop active learning system for MRSI quality control. This would include a frontend, based on my dash application, for generating quality labels for MRSI spectra. It would also include a distributed labeling backend so that labeling can be done by MRSI experts across different institutions. I feel like superintendent would work well for this project.

Emmanuelle Gouillart and others have developed tools for interactive image visualization/annotation in Dash, such as dash-slicer and dash-canvas. I think that if they were to collaborate on superintendent, Dash could be used as another frontend for labeling in addition to ipyannotations.

I feel like ipyannotations would be great for people developing active learning techniques who want to experiment quickly in their Jupyter notebooks, while Dash would be ideal for building GUIs for expert labelers that are customized to the specific domain.

It seems like existing active learning solutions must be built entirely from the ground up. I think it would be great for the community if there was a de facto open source framework for active learning that would provide:

  1. Interface for models developed in different frameworks (e.g. tensorflow, pytorch, jax).
  2. Interface for custom acquisition functions.
  3. Frontend with options for labeling different types of data.
  4. Backend for distributed labeling.
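As a sketch of what (1) and (2) might look like together, here is a minimal framework-agnostic interface expressed as a Python protocol. All names here are illustrative assumptions, not superintendent's actual API:

```python
from typing import Protocol

import numpy as np


class ActiveLearner(Protocol):
    """Hypothetical contract any backend (tensorflow, pytorch, jax)
    could satisfy, e.g. via a thin wrapper class."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> None: ...
    def predict_proba(self, X: np.ndarray) -> np.ndarray: ...


def rank_by_uncertainty(model: ActiveLearner, X: np.ndarray) -> np.ndarray:
    """Least-confidence acquisition: order unlabelled samples so the
    most uncertain (lowest top-class probability) come first."""
    proba = model.predict_proba(X)
    confidence = proba.max(axis=1)
    return np.argsort(confidence)
```

Any callable with this shape could then be swapped in as the acquisition function, which is the kind of pluggability point (2) asks for.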

I feel like superintendent could become the de facto framework for building human-in-the-loop active learning systems if improvements are made in each of these areas. Considering (2), many state-of-the-art active learning acquisition methods are not based on output probabilities, so superintendent would need to provide more flexibility for custom acquisition functions. Considering (3), I think it would be useful to have Dash as another frontend, for the reasons stated above. Considering (4), support for Kubernetes in addition to docker-compose could be added for improved scalability. Dash apps have been demonstrated to work well with Kubernetes, so the web-app container could be built with Dash, and the model-training container could expose an API with a framework like FastAPI to talk to the web-app containers.
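To make the point about probability-free acquisition concrete, here is a rough sketch of greedy core-set selection, which scores unlabelled points by their distance to already-selected points in an embedding space rather than by output probabilities. The function name and signature are made up for illustration:

```python
import numpy as np


def coreset_greedy(embeddings: np.ndarray, labelled_idx: list, k: int) -> list:
    """Greedy farthest-point (core-set style) selection: repeatedly pick
    the unlabelled point whose nearest selected neighbour is farthest
    away. Operates on model embeddings, not output probabilities."""
    selected = list(labelled_idx)
    picks = []
    for _ in range(k):
        # distance of every point to its nearest already-selected point
        d = np.min(
            np.linalg.norm(
                embeddings[:, None, :] - embeddings[selected][None, :, :],
                axis=-1,
            ),
            axis=1,
        )
        d[selected] = -np.inf  # never re-pick a selected point
        best = int(np.argmax(d))
        picks.append(best)
        selected.append(best)
    return picks
```

An acquisition API flexible enough to accept a callable like this (taking embeddings or raw model internals instead of probabilities) would cover most of the recent deep active learning literature.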

Clearly this would take a lot of work, but I’m sure there would be a lot of interest from the community on building a solution like this.

Let me know your thoughts!

zndr27 commented 2 years ago

Sorry for the long-winded message haha. Maybe it would be nice to talk on Zoom sometime?

Another thing to consider about (4) is how labels are combined across different oracles. For example, maybe we want 3 oracles to generate a "label" for each sample and write some logic for how the labels get combined. Depending on the nature of the label, this logic could be more or less complicated. For instance, if it is just a binary label, then maybe we would take the majority label amongst the oracles. However, if the label is a binary segmentation mask, then maybe the output would be the average of the oracles' masks, thereby generating a probability segmentation mask.
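The two reconciliation strategies described above could be sketched like this (a minimal illustration, not an existing superintendent feature):

```python
from collections import Counter

import numpy as np


def majority_label(labels):
    """Reconcile categorical labels from several oracles by majority vote."""
    return Counter(labels).most_common(1)[0][0]


def soft_mask(masks):
    """Average binary segmentation masks from several oracles into a
    per-pixel probability mask."""
    return np.mean(np.stack(masks), axis=0)
```

In practice the reconciliation rule is task-specific, which is why it probably belongs in user code rather than in the framework itself.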

janfreyberg commented 2 years ago

Oh nice! That's a really nice looking, fully featured interface. Also great to see you using dash bootstrap components as it's developed by a friend :)

I definitely agree with the vision of developing software that can act as an interface to several different options of handling the labelling. My main hesitation is introducing additional complexity to a library that at times in the past has gotten pretty out of hand :) However, especially since all labelling is now stored with a database backend, it may actually be easy to produce a separate library (e.g. superintendent-dash, or something like that), which simply reads from the same database.

That would mean that active learning (training models in the different frameworks you mentioned, acquisition functions, etc), could continue to be handled by this python library, but the labelling frontend can be served using dash. I don't think the active learning backend even needs to have a FastAPI web interface in that case, as it's all handled via the database.
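The database-as-interface idea could look something like the following sketch. The schema here is a deliberately simplified stand-in (superintendent's real schema is different); the point is only that the frontend and the training backend never talk to each other directly:

```python
import sqlite3

# Hypothetical minimal annotations table; superintendent's actual
# schema differs. In-memory SQLite keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations ("
    "id INTEGER PRIMARY KEY, "
    "datapoint TEXT, "
    "label TEXT, "
    "completed INTEGER DEFAULT 0)"
)

# A Dash (or notebook) frontend would write completed labels...
conn.execute(
    "INSERT INTO annotations (datapoint, label, completed) VALUES (?, ?, 1)",
    ("spectrum_001", "good"),
)
conn.commit()

# ...and the active-learning backend would poll for them, retrain,
# and queue up the next batch to label.
rows = conn.execute(
    "SELECT datapoint, label FROM annotations WHERE completed = 1"
).fetchall()
```

With this split, a hypothetical superintendent-dash package would only need to implement the write side against an agreed schema.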

I would be keen to define some sort of standardised API for dash labelling frontend components, as that makes it much easier to support third party components (something I've tried to encourage with ipyannotations).

What do you think? There would have to be quite a bit of planning (and probably a bit more thinking around things like the database schema), but this would be very cool :)

Regarding handling multiple labels per datapoint - yes, this is also something on my mind. My intuition is to simply return all labels as a list and let the user write the code to handle reconciliation, simply because there are so many idiosyncrasies when it comes to machine learning!

There are a few other things I am also thinking about - one is whether we can actually use model proposals for labels once a model is "good enough". For segmentation, you could imagine overlaying a model's proposed segmentation mask, and the user only needs to edit it. Would love to hear your thoughts on this.
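For the segmentation case, the "model proposes, human edits" idea might be sketched like this, where a proposal is only surfaced when the model is sufficiently confident. The function and thresholds are illustrative assumptions, not an existing API:

```python
import numpy as np


def propose_mask(proba_mask: np.ndarray,
                 threshold: float = 0.5,
                 min_confidence: float = 0.9):
    """Turn a per-pixel probability mask into a proposed binary mask for
    the annotator to edit, but only when the model is confident enough
    on average; otherwise return None, meaning label from scratch."""
    # confidence is 0 when every pixel sits at 0.5, 1 when all pixels
    # are at 0 or 1
    confidence = np.abs(proba_mask - 0.5).mean() * 2
    if confidence < min_confidence:
        return None
    return (proba_mask >= threshold).astype(np.uint8)
```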

If this approach makes sense to you it would be great to have a chat on zoom! Thanks again for your interest!