Closed gjhuizing closed 2 months ago
Hi @gjhuizing,
I would create a new task with a similar name. We can then make the folder structure match the subtask definition.
A few considerations here:
Hi @LuckyMD, thanks for your answer. I'm creating a new task then!
The quality of the annotations is critical indeed. If I'm not mistaken, cell lines offer a perfect ground truth and are often used to benchmark integration methods. However, they make for a somewhat too easy problem, so if there are more realistic datasets out there with a reliable ground truth, those could be good additions.
This is exactly my thought as well ;). Cell lines are a good start for sure, though!
Indeed! Ideally we should be able to have more than two omics, and to tell the method which omics they are.
If you do this, you will require every method to be applicable to every omics layer. I would recommend making the task more specific first, and then thinking about sharing methods between further sibling tasks.
I'm not that familiar with methods that output graphs rather than embeddings, but I suppose we could have some common metrics. Maybe the silhouette score can be computed on graphs (through geodesic distance?). What I wanted to avoid is evaluating the clustering part, in order to focus on the quality of integration. But maybe that's not the best approach.
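To make the geodesic idea concrete, here is a minimal sketch of a silhouette score computed from shortest-path distances on a cell-cell graph. It is only an illustration under some assumptions: the graph is given as a dense, symmetric, weighted adjacency matrix, it is connected, and it is small enough for Floyd-Warshall. The function names `geodesic_distances` and `silhouette_from_distances` are hypothetical and not part of any existing task or library.

```python
import numpy as np


def geodesic_distances(adjacency):
    """All-pairs shortest-path lengths on a weighted, undirected graph.

    adjacency[i, j] is the edge weight between cells i and j; 0 means no edge.
    Uses Floyd-Warshall, which is O(n^3), so this is only meant for small graphs.
    """
    adj = np.asarray(adjacency, dtype=float)
    dist = np.where(adj > 0, adj, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        # Allow paths that pass through node k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def silhouette_from_distances(dist, labels):
    """Mean silhouette score from a precomputed distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    if not np.all(np.isfinite(dist)):
        # Assumption: a disconnected graph is treated as an error here.
        raise ValueError("graph is disconnected; some geodesic distances are infinite")
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        n_same = same.sum()
        if n_same == 1:
            continue  # usual convention: singleton groups score 0
        # Mean distance to the other cells with the same label (self-distance is 0).
        a = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the cells of any other label.
        b = min(dist[i, labels == other].mean() for other in classes if other != own)
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```

Since this only needs a distance matrix and the original labels, the same score could in principle be reported for embedding-based methods (with Euclidean distances) and graph-based methods (with geodesic distances), without running a clustering step.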
We have a preprint on data integration benchmarking dealing with a lot of these issues for a single modality (https://www.biorxiv.org/content/10.1101/2020.05.22.111161v2). These metrics are all being added in the PR I linked above. Maybe it would be good to chat about this in a voice channel in batch integration?
Definitely, sending you a message on Discord.
This issue has been automatically closed because it has not had recent activity.
Describe the problem concisely. The current multimodal integration task focuses on alignment of different datasets, profiled from different cells. It is tested by separating data coming from the same cell and evaluating whether the alignment corresponds to the true cell matches.
However, some new methods perform integration in a different way. They take both modalities into account and map the cells to a latent space, where they can be visualized and clustered.
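For reference, the matched-cell evaluation of the current task could be sketched roughly as follows. This is an illustration only, not necessarily the metric the task actually implements: it assumes both modalities have already been aligned into a shared space, that row i of each array comes from the same cell, and it uses a hypothetical helper name, `matching_accuracy`.

```python
import numpy as np


def matching_accuracy(emb_a, emb_b):
    """Fraction of cells whose nearest neighbour in the other modality is their true match.

    emb_a and emb_b are (n_cells, n_dims) arrays in a shared space;
    row i of both arrays is assumed to come from the same cell.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    if emb_a.shape != emb_b.shape:
        raise ValueError("both embeddings must have the same shape")
    # Pairwise squared Euclidean distances between modality A and modality B.
    diff = emb_a[:, None, :] - emb_b[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    nearest = sq_dist.argmin(axis=1)
    return float((nearest == np.arange(emb_a.shape[0])).mean())
```

A check like this depends on knowing the true cell-to-cell pairing, whereas the latent-space methods described above would instead be scored against cell type or cell line labels.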
Propose datasets Any multi-omics dataset with cell type / cell line annotations.
Propose methods
Propose metrics Given the original labels (cell types/cell lines):
Question If I want to implement this, should I create a new task with a different name (something like same-cell-multiomics-integration?), or create a subtask? I didn't find much documentation on subtasks, except the last 20s of the video tutorial on creating tasks, so I'm not sure how to do that.