Previous note to myself:
Current conclusion: without dataset-specific bounds, it seems to perform better.
- Coloc on technical replicates: leave out features / test robustness
- Leave out ions in datasets and test how well coloc within one dataset / between datasets is preserved
- E.g. for cosine, just the mean coloc (how is it influenced / how similar is it to the same coloc in other datasets?)
- In DL methods, check how much the latent space is changing
- Biological replicates: evaluate the influence of overlap
- Different datasets
Evaluate how well different approaches can deal with these scenarios, from simple cosine, through tf-idf and PLSA, to deep learning. Look at transitive closure. Compare to prior knowledge networks.
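A minimal sketch of the simplest baseline mentioned above: pairwise cosine colocalization on flattened ion images. The array name `ion_images` and its shape are assumptions, not something fixed in this repo.

```python
# Sketch, assuming ion_images is a (n_ions, height, width) array from one dataset.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine_coloc(ion_images: np.ndarray) -> np.ndarray:
    """Flatten each ion image and return the n_ions x n_ions cosine coloc matrix."""
    flat = ion_images.reshape(ion_images.shape[0], -1)
    return cosine_similarity(flat)

# Random data standing in for real ion images
coloc = cosine_coloc(np.random.rand(50, 64, 64))
print(coloc.shape)  # (50, 50)
```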
Might be helpful for building a median filter in PyTorch: https://gist.github.com/rwightman/f2d3849281624be7c0f11c85c87c1598
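For reference, a hedged sketch of a 2D median filter in plain PyTorch (using `F.unfold`), independent of the gist above; function name and the single-channel assumption are mine.

```python
import torch
import torch.nn.functional as F

def median_filter2d(x: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
    """Median-filter a batch of images with shape (B, C, H, W)."""
    b, c, h, w = x.shape
    pad = kernel_size // 2
    x = F.pad(x, (pad, pad, pad, pad), mode="reflect")
    # Sliding k x k patches: (B, C*k*k, H*W)
    patches = F.unfold(x, kernel_size=kernel_size)
    patches = patches.view(b, c, kernel_size * kernel_size, h * w)
    # Median over the patch dimension, then restore the spatial layout
    return patches.median(dim=2).values.view(b, c, h, w)

smoothed = median_filter2d(torch.rand(8, 1, 64, 64), kernel_size=3)
```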
As a baseline, I could compare fully trained models (without testing data) to check the stability and generalization of the models. It probably also makes sense to work with ranking-based metrics since, as already mentioned above, colocs will have different meanings in the different embedding spaces.
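One possible ranking-based comparison, sketched under assumptions: Spearman correlation of the per-ion coloc rankings between two coloc matrices (e.g. fully trained vs. leave-out model) computed over the same ion set. Matrix names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def mean_rank_agreement(coloc_a: np.ndarray, coloc_b: np.ndarray) -> float:
    """Mean per-ion Spearman correlation between two n_ions x n_ions coloc matrices."""
    n = coloc_a.shape[0]
    rhos = []
    for i in range(n):
        mask = np.arange(n) != i  # ignore self-colocalization
        rho, _ = spearmanr(coloc_a[i, mask], coloc_b[i, mask])
        rhos.append(rho)
    return float(np.nanmean(rhos))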
See issue #5 for all evaluation scenarios.
We can also compare with other large pretrained models, such as CLIP, DINOv2 or BiomedCLIP. A good guide can be found here: https://medium.com/aimonks/clip-vs-dinov2-in-image-similarity-6fa5aa7ed8c6
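A hedged sketch of what such a comparison could look like with DINOv2 (the `torch.hub` entrypoint is the one documented by facebookresearch/dinov2; preprocessing choices below, like skipping ImageNet normalization and tiling single-channel images to 3 channels, are my assumptions):

```python
import torch
import torch.nn.functional as F

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def dino_coloc(ion_images: torch.Tensor) -> torch.Tensor:
    """ion_images: (n_ions, H, W) scaled to [0, 1]; returns a cosine similarity matrix."""
    x = ion_images.unsqueeze(1).repeat(1, 3, 1, 1)          # fake RGB from single channel
    x = F.interpolate(x, size=(224, 224), mode="bilinear")  # 224 is divisible by the 14-pixel patch size
    emb = model(x)                                          # (n_ions, 384) CLS embeddings for vits14
    emb = F.normalize(emb, dim=1)
    return emb @ emb.T
```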
Approach the evaluation differently: just looking at the top coloc pairs per dataset is not what we are interested in.
Better to look at individual neighbors: Top-1 and Top-5 accuracy for each molecule, computed on the training set for the same ions, and approximated for the test data using mean/median aggregation when inferring on test data.
Implemented a new evaluation based on Top-n accuracy for the most colocalized ion.
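A minimal sketch of this metric as I read it (not the exact implementation in the repo): for every ion, check whether its most colocalized partner under a reference coloc matrix appears among the top-n neighbors of the predicted coloc matrix. Both matrices are assumed to be n_ions x n_ions over the same ions.

```python
import numpy as np

def topn_accuracy(reference: np.ndarray, predicted: np.ndarray, n: int = 5) -> float:
    """Fraction of ions whose most colocalized reference partner is in the predicted top-n."""
    ref = reference.astype(float).copy()
    pred = predicted.astype(float).copy()
    np.fill_diagonal(ref, -np.inf)   # exclude self-colocalization
    np.fill_diagonal(pred, -np.inf)
    hits = 0
    for i in range(ref.shape[0]):
        best_partner = np.argmax(ref[i])
        top_n = np.argsort(pred[i])[::-1][:n]
        hits += int(best_partner in top_n)
    return hits / ref.shape[0]
```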