Previous note to myself:
Current conclusion: without dataset-specific bounds, it seems to perform better.
- Coloc on technical replicates: leave out features / test robustness
- Leave out ions in datasets and test how well coloc within one dataset / between datasets is preserved
- E.g. for cosine, just the mean coloc (how is it influenced / how similar is it to the same coloc in other datasets?)
- In DL methods, check how much the latent space is changing
- Biological replicates: evaluate the influence of overlap
- Different datasets
Evaluate how well different approaches can deal with these scenarios, from simple cosine, through tf-idf and PLSA, to deep learning. Look at transitive closure. Compare to prior knowledge networks.
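A minimal sketch of the simplest baseline mentioned above: pairwise cosine colocalization on flattened ion images. The array name `ion_images` and its shape are assumptions, not something fixed in this repo.

```python
# Sketch, assuming ion_images is a (n_ions, height, width) array from one dataset.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_coloc(ion_images: np.ndarray) -> np.ndarray:
    """Flatten each ion image and return the n_ions x n_ions cosine coloc matrix."""
    flat = ion_images.reshape(ion_images.shape[0], -1)
    return cosine_similarity(flat)

# Random data standing in for real ion images
coloc = cosine_coloc(np.random.rand(50, 64, 64))
print(coloc.shape)  # (50, 50)
```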
Might be helpful for building a median filter in PyTorch: https://gist.github.com/rwightman/f2d3849281624be7c0f11c85c87c1598
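For reference, a hedged sketch of a 2D median filter in plain PyTorch (using `F.unfold`), independent of the gist above; function name and the single-channel assumption are mine.

```python
import torch
import torch.nn.functional as F

def median_filter2d(x: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Median-filter a batch of images with shape (B, C, H, W)."""
    b, c, h, w = x.shape
    pad = kernel_size // 2
    x = F.pad(x, (pad, pad, pad, pad), mode="reflect")
    # Sliding k x k patches: (B, C*k*k, H*W)
    patches = F.unfold(x, kernel_size=kernel_size)
    patches = patches.view(b, c, kernel_size * kernel_size, h * w)
    # Median over the patch dimension, then restore the spatial layout
    return patches.median(dim=2).values.view(b, c, h, w)

smoothed = median_filter2d(torch.rand(8, 1, 64, 64), kernel_size=3)
```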
As a baseline, I could compare fully trained models (without testing data) to check the stability and generalization of the models. It probably also makes sense to work with ranking-based metrics since, as already mentioned above, colocs will have different meanings in the different embedding spaces.
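One possible ranking-based comparison, sketched under assumptions: Spearman correlation of the per-ion coloc rankings between two coloc matrices (e.g. fully trained vs. leave-out model) computed over the same ion set. Matrix names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def mean_rank_agreement(coloc_a: np.ndarray, coloc_b: np.ndarray) -> float:
    """Mean per-ion Spearman correlation between two n_ions x n_ions coloc matrices."""
    n = coloc_a.shape[0]
    rhos = []
    for i in range(n):
        mask = np.arange(n) != i  # ignore self-colocalization
        rho, _ = spearmanr(coloc_a[i, mask], coloc_b[i, mask])
        rhos.append(rho)
    return float(np.nanmean(rhos))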
See issue #5 for all evaluation scenarios.
We can also compare with other large pretrained models, such as CLIP, DINOv2 or BiomedCLIP. A good guide can be found here: https://medium.com/aimonks/clip-vs-dinov2-in-image-similarity-6fa5aa7ed8c6
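A hedged sketch of what such a comparison could look like with DINOv2 (the `torch.hub` entrypoint is the one documented by facebookresearch/dinov2; preprocessing choices below, like skipping ImageNet normalization and tiling single-channel images to 3 channels, are my assumptions):

```python
import torch
import torch.nn.functional as F

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def dino_coloc(ion_images: torch.Tensor) -> torch.Tensor:
    """ion_images: (n_ions, H, W) scaled to [0, 1]; returns a cosine similarity matrix."""
    x = ion_images.unsqueeze(1).repeat(1, 3, 1, 1)          # fake RGB from single channel
    x = F.interpolate(x, size=(224, 224), mode="bilinear")  # 224 is divisible by the 14-pixel patch size
    emb = model(x)                                          # (n_ions, 384) CLS embeddings for vits14
    emb = F.normalize(emb, dim=1)
    return emb @ emb.T
```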
Approach the evaluation differently: just looking at the top coloc pairs per dataset is not what we are interested in.
Better to look at individual neighbors: Top-1 and Top-5 accuracy for each molecule, computed on the training set for the same ions, and approximated for the test data using mean/median aggregation when inferring on test data.
Implemented a new evaluation based on Top-n accuracy for the most colocalized ion.
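A minimal sketch of this metric as I read it (not the exact implementation in the repo): for every ion, check whether its most colocalized partner under a reference coloc matrix appears among the top-n neighbors of the predicted coloc matrix. Both matrices are assumed to be n_ions x n_ions over the same ions.

```python
import numpy as np

def topn_accuracy(reference: np.ndarray, predicted: np.ndarray, n: int = 5) -> float:
    """Fraction of ions whose most colocalized reference partner is in the predicted top-n."""
    ref = reference.astype(float).copy()
    pred = predicted.astype(float).copy()
    np.fill_diagonal(ref, -np.inf)   # exclude self-colocalization
    np.fill_diagonal(pred, -np.inf)
    hits = 0
    for i in range(ref.shape[0]):
        best_partner = np.argmax(ref[i])
        top_n = np.argsort(pred[i])[::-1][:n]
        hits += int(best_partner in top_n)
    return hits / ref.shape[0]
```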