openproblems-bio / openproblems

Formalizing and benchmarking open problems in single-cell genomics
MIT License
315 stars, 79 forks

make celltype clusters tighter in celltype random embedding/graph #709

Closed scottgigante-immunai closed 1 year ago

scottgigante-immunai commented 1 year ago

In immune_batch, the celltype random graph does not perform well on isolated labels F1, which is odd since it was designed for this task. The only explanation I can see is that Louvain is merging clusters: the random noise is too large, so clusters can be merged despite their clear separation. This change should make the clusters tighter and thus less likely to be merged.
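
To illustrate the noise argument, here is a minimal sketch, not the repository's implementation. It assumes the celltype random embedding is a one-hot encoding of the labels plus uniform jitter; the function names, jitter values, and label counts are hypothetical. It compares the largest same-label distance with the smallest cross-label distance for a loose and a tight jitter.

```python
import numpy as np


def random_celltype_embedding(labels, jitter, seed=0):
    """One-hot encode labels and add uniform noise in [-jitter, jitter].

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    _, codes = np.unique(labels, return_inverse=True)
    onehot = np.eye(codes.max() + 1)[codes]
    return onehot + rng.uniform(-jitter, jitter, size=onehot.shape)


def max_within_over_min_between(embedding, labels):
    """Largest same-label distance divided by smallest cross-label distance."""
    labels = np.asarray(labels)
    dist = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    return dist[same].max() / dist[~same].min()


# Two common cell types and one isolated (rare) label; counts are made up.
labels = np.repeat(["B cell", "T cell", "rare"], [50, 50, 5])
loose = random_celltype_embedding(labels, jitter=0.5)
tight = random_celltype_embedding(labels, jitter=0.01)
loose_ratio = max_within_over_min_between(loose, labels)
tight_ratio = max_within_over_min_between(tight, labels)
print(f"loose: {loose_ratio:.3f}, tight: {tight_ratio:.3f}")
```

A ratio well below 1 means every cell is far closer to its own label than to any other, which is the situation in which a graph clustering such as Louvain has the least reason to merge labels.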

scottgigante-immunai commented 1 year ago

Tests passing at https://tower.nf/orgs/openproblems-bio/workspaces/openproblems-bio/watch/5PBUZrwQk4cnI5

rcannood commented 1 year ago

LGTM. I'm not sure whether this test will achieve what we're trying to achieve :P

LuckyMD commented 1 year ago

The other option is to reduce the k in the knn graph that is computed. Isolated labels might be quite rare cells and thus be connected to the rest via the knn graph.
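
The concern about k can be sketched as follows. This is an illustrative toy example with made-up data and hypothetical helper names, using a brute-force kNN rather than whatever graph construction the benchmark uses. When a label has fewer cells than k, each of its cells must take neighbours from other labels, however well separated the label is.

```python
import numpy as np


def knn_indices(points, k):
    """Brute-force k nearest neighbours (excluding self) for each point."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def cross_label_edges(points, labels, k, target):
    """Count kNN edges from cells of `target` to cells of any other label."""
    labels = np.asarray(labels)
    neighbours = knn_indices(points, k)
    rows = np.flatnonzero(labels == target)
    return int((labels[neighbours[rows]] != target).sum())


rng = np.random.default_rng(0)
# Two well-separated 2-D blobs: 100 common cells and 5 rare cells.
common = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
rare = rng.normal(loc=10.0, scale=0.1, size=(5, 2))
points = np.vstack([common, rare])
labels = np.array(["common"] * 100 + ["rare"] * 5)

edges_k15 = cross_label_edges(points, labels, k=15, target="rare")
edges_k4 = cross_label_edges(points, labels, k=4, target="rare")
print(edges_k15, edges_k4)  # 55 0
```

With 5 rare cells and k=15, each rare cell has only 4 same-label neighbours available, so 11 of its edges cross to the common blob (5 × 11 = 55). With k=4 there are none.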

scottgigante-immunai commented 1 year ago

The reason I think this is not the cause is that other methods are doing much better, whereas if it were simply a problem with k, it would be unsolvable for every method. Let's give this a try and see what happens.

On Mon, 28 Nov 2022, 7:26 am MalteDLuecken, @.***> wrote:

@.**** approved this pull request.

Option 2:

  • reduce k as these might be very rare cells with n_cells < k?


codecov[bot] commented 1 year ago

Codecov Report

Base: 95.06% // Head: 95.06% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (cf947af) compared to base (a796e02). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #709   +/-   ##
=======================================
  Coverage   95.06%   95.06%
=======================================
  Files         154      154
  Lines        4072     4075    +3
  Branches      206      206
=======================================
+ Hits         3871     3874    +3
  Misses        131      131
  Partials       70       70
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `95.06% <100.00%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openproblems-bio#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Impacted Files](https://codecov.io/gh/openproblems-bio/openproblems/pull/709?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openproblems-bio) | Coverage Δ | |
|---|---|---|
| [...ems/tasks/denoising/datasets/tabula\_muris\_senis.py](https://codecov.io/gh/openproblems-bio/openproblems/pull/709/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openproblems-bio#diff-b3BlbnByb2JsZW1zL3Rhc2tzL2Rlbm9pc2luZy9kYXRhc2V0cy90YWJ1bGFfbXVyaXNfc2VuaXMucHk=) | `100.00% <ø> (ø)` | |
| [...ration/batch\_integration\_graph/methods/baseline.py](https://codecov.io/gh/openproblems-bio/openproblems/pull/709/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openproblems-bio#diff-b3BlbnByb2JsZW1zL3Rhc2tzL19iYXRjaF9pbnRlZ3JhdGlvbi9iYXRjaF9pbnRlZ3JhdGlvbl9ncmFwaC9tZXRob2RzL2Jhc2VsaW5lLnB5) | `100.00% <100.00%> (ø)` | |
| [...sks/spatial\_decomposition/methods/cell2location.py](https://codecov.io/gh/openproblems-bio/openproblems/pull/709/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openproblems-bio#diff-b3BlbnByb2JsZW1zL3Rhc2tzL3NwYXRpYWxfZGVjb21wb3NpdGlvbi9tZXRob2RzL2NlbGwybG9jYXRpb24ucHk=) | `96.77% <100.00%> (+0.16%)` | :arrow_up: |


scottgigante-immunai commented 1 year ago

@danielStrobl this won't fix isolated labels silhouette (pancreas). The celltype random baseline should score perfectly there, but ComBat scores 4x better. Can you please look into this?