maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
https://github.com/ucscGenomeBrowser/cellBrowser/
GNU General Public License v3.0
104 stars 41 forks source link

Drawing labels in js #221

Closed mxposed closed 3 years ago

mxposed commented 3 years ago

Started implementing #187

Here's the first take:

TODO:

Algorithm for label coordinates: In python a mean coordinate is computed for each cluster, then bottom 70% of dots are selected based on distance to cluster and the mean is recomputed. It's possibly too heavy to do in JS, but I haven't tried. I implemented the mean coordinate, it didn't look good, so I switched to the median coordinate, which currently looks nice.

I'm open to any feedback, please advise how to continue

maximilianh commented 3 years ago

Wow, great, many thanks!

This should also make the Python code simpler.

Did you test this on a big dataset, like the ones with 1-2M cells? How long did the median calculation take in Javascript?

On Mon, Jun 21, 2021 at 5:56 AM Nikolay Markov @.***> wrote:

Started implementing #187 https://github.com/maximilianh/cellBrowser/issues/187

Here's the first take:

  • label coords are still loaded in JS, but not passed to the renderer
  • after coords are loaded, label coords are computed in JS and set to the renderer
  • currently label coords are computed as median coordinates of all dots (in python the logic is different, see below)
  • added UI combobox field to select which metadata field to draw labels for
  • no marker genes are drawn or attempted if current label field is not the config label field
  • support select by alt-click on cluster labels

TODO:

  • figure out what to do with heatmap (use config label field always or redraw based on UI label field)
  • save and load UI label field into URL parameter
  • stop loading label coords in JS
  • stop producing label coords in python

Algorithm for label coordinates: In python a mean coordinate is computed for each cluster, then bottom 70% of dots are selected based on distance to cluster and the mean is recomputed. It's possibly too heavy to do in JS, but I haven't tried. I implemented the mean coordinate, it didn't look good, so I switched to the median coordinate, which currently looks nice.

I'm open to any feedback, please advise how to continue

You can view, comment on, or merge this pull request online at:

https://github.com/maximilianh/cellBrowser/pull/221 Commit Summary

  • Draw labels in javascript, first take, #187

File Changes

Patch Links:

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/maximilianh/cellBrowser/pull/221, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACL4TOLIDM7TNNYB4QT4SLTT2Z7VANCNFSM47AYOGZQ .

mxposed commented 3 years ago

No, I did not (yet). How best to get such a test dataset?

maximilianh commented 3 years ago

You could use the mouse-organogenesis dataset: https://hgwdev.gi.ucsc.edu/~max/nikolayMarkov/

maybe @matthewspeir has another idea? We could also test it here, so maybe don't mind for now.

Hey, it seems that you always compute the cluster midpoints in Javascript, is this correct? You are not using the precalculated ones at all? Is this correct?

Wouldn't it be easier to keep using the precalculated ones and only calculate them in Javascript if it's the non-default field? In this way, we'll save a little time when loading the dataset?

On Wed, Jun 23, 2021 at 7:15 PM Nikolay Markov @.***> wrote:

No, I did not (yet). How best to get such a test dataset?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

mxposed commented 3 years ago

Thanks! I'll use that to test!

it seems that you always compute the cluster midpoints in Javascript, is this correct? You are not using the precalculated ones at all? Is this correct?

yes, this is how the current code works. I guess the idea was to move this part to Javascript altogether and remove it from python. The second option would be to precompute coordinates for all meta fields in python and just display them in Javascript.

Let me test the current code on a big dataset and report back, and we can decide.

maximilianh commented 3 years ago

I wonder if we shouldn't keep it a mix. This would mean minimal disruption for the default use case, while still covering the more advanced use case.

mxposed commented 3 years ago

Sounds good. I'll use precomputed coordinates for the main label and will compute them in JS for other labels

maximilianh commented 3 years ago

Yeah, that sounds safer to me, so people will not notice any change for the current labels but still have the new feature.

Do you think your median maybe is better than my old approach? Should I use the median from now on?

On Thu, Jun 24, 2021 at 4:33 PM Nikolay Markov @.***> wrote:

Sounds good. I'll use precomputed coordinates for the main label and will compute them in JS for other labels

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/maximilianh/cellBrowser/pull/221#issuecomment-867686263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACL4TL6E4TFNT2SPH2N223TUM63BANCNFSM47AYOGZQ .

mxposed commented 3 years ago

Great, we are agreed!

Do you think your median maybe is better than my old approach? Should I use the median from now on?

I completely don't know. Probably it's worse than your approach, but it's relatively cheap O(nlogn), and just the mean was obviously bad, so I had to do something.

Let me keep the current things as they are and just add js labels for other metadata fields

maximilianh commented 3 years ago

Hi @mxposed, any news from this? I tried to merge it but it says that it's a "draft PR"... ?

mxposed commented 3 years ago

Hi @maximilianh , I haven't updated it still and haven't tested it on big data, I'll try to do that in August

mxposed commented 3 years ago

I tested this on 2 million cells on my laptop: it's 2-3 seconds for labels and ~8 seconds for drawing dots, however altogether it's taking ~15-20 seconds to change a color or something, not the 8 seconds. Do you have any idea?

maximilianh commented 3 years ago

Hi Nikolay, I was about to ask you about this... :-)

That sounds good to me. I was worried about the time it takes to calculate the labels and it seems that that's only 5-6 seconds even for 2 million cells. You only calculate them once, right, and cache them?

Yes, the drawing is super slow, ideally I'd rewrite the dot drawing code with three.js as WebGL particles.

Also, we still have the server-calculated position, most users will never experience any lag, as it would default to static label coords, until the user changes the field.

So you think I can merge this?

On Fri, Sep 3, 2021 at 2:25 AM Nikolay Markov @.***> wrote:

I tested this on 2 million cells on my laptop: it's 2-3 seconds for labels and ~8 seconds for drawing dots, however altogether it's taking ~15-20 seconds to change a color or something, not the 8 seconds. Do you have any idea?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/maximilianh/cellBrowser/pull/221#issuecomment-912155316, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACL4TPOH6BCSMOFQG4XEM3UAAIWZANCNFSM47AYOGZQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

mxposed commented 3 years ago

Great point on caching—I don't cache them now.

Let me add caching code and use the default positions that are precalculated and then you can look at how it looks and I'll change it to a real pull request and you can merge. Will do that approximately over the weekend

mxposed commented 3 years ago

I sped up label coordinates computation, it's now ~800ms for 2 million cells on my laptop. I added caching and use the python precomputed coordinates for the default label field.

It's still unclear what to do about heatmap view.

Another thing is that I haven't added label field to URL.

I'll open pull request for review/merging, in case you want to finish it yourself. I can work on it too soon, please just let me know

maximilianh commented 3 years ago

I have to rethink the entire heatmap viewer. I am tempted to ditch it entirely and integrate Morpheus instead. Have you used Morpheus before? It's impressive. Let's ignore the heatmap viewer for now.

I won't be able to work on it too, seen it's grant submission time again this year, and the grant was rejected, so I'll have to go over the text and draft 4-5 support letters again. I'd love to work on the code!

On Sun, Sep 5, 2021 at 5:10 AM Nikolay Markov @.***> wrote:

I sped up label coordinates computation, it's now ~800ms for 2 million cells on my laptop. I added caching and use the python precomputed coordinates for the default label field.

It's still unclear what to do about heatmap view.

Another thing is that I haven't added label field to URL.

I'll open pull request for review/merging, in case you want to finish it yourself. I can work on it too soon, please just let me know

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/maximilianh/cellBrowser/pull/221#issuecomment-913077689, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACL4TODOANAH5D4VFNKLHDUALNRNANCNFSM47AYOGZQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

maximilianh commented 3 years ago

I'm merging this now into develop so Matt can look at it.

maximilianh commented 3 years ago

Hey @mxposed, feel free to commit the labelField onto this or I can add it later as soon as I'm done with the stupid grant.