niksart / som-fMRI


Binary Distance Measurement #2

Open affiqazrin opened 4 years ago

affiqazrin commented 4 years ago

Hi. I have a binary dataset consisting of 2k observations and 44 variables. Is there a good distance metric other than Euclidean for this kind of data? Jaccard and Tanimoto have been shown to measure distance between vectors. Can the Pearson distance in your som-fMRI be used in my case?

niksart commented 4 years ago

Hello @affiqazrin, what's the nature of this data? Are the 2k observations time series?

Edit: btw, note that the Pearson implementation for minisom is here, and not in this repo. Look at the last two commits. So maybe it is better to discuss this in an issue in that repo. Feel free to open one.

niksart commented 4 years ago

Sorry, I realized that there are no issues in forks... :)

affiqazrin commented 4 years ago

Thank you for reopening this issue @niksart :)

The 2k observations are not time series. A well-known data set of animals (Ritter and Kohonen 1989) is very similar to mine. I am considering using Pearson from your som-fMRI for this case, or is there any other possible distance measure?

Here I attach some references to give you an idea of suitable implementations of binary measures.

  1. https://www.semanticscholar.org/paper/Binary-based-similarity-measures-for-categorical-in-Louren%C3%A7o-Lobo/ab2d3ddd4f62104f716501d9b633b24b24b9f73a
  2. https://stackoverflow.com/questions/52326174/how-to-calculate-correlation-between-binary-variables-in-python
  3. https://stats.stackexchange.com/questions/103801/is-it-meaningful-to-calculate-pearson-or-spearman-correlation-between-two-boolea

niksart commented 4 years ago

Hello, here's a review of possible distances: link

I think that a Hamming distance would perform identically to a Euclidean distance on binary data; compare their formulas in the linked paper. For now I would try a simple Euclidean distance first, and then maybe try the Jaccard distance or something else.
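To make the comparison concrete, here is a minimal sketch, assuming plain NumPy and randomly generated 0/1 vectors (the helper names `hamming`, `euclidean` and `jaccard` and the toy data are illustrative, not part of som-fMRI or minisom). For two binary vectors the squared Euclidean distance equals the Hamming distance, so both pick the same nearest neighbour. Note that this is exact only when both vectors are 0/1; SOM codebook weights are usually real-valued, so there the two measures can differ.

```python
import numpy as np

def hamming(x, y):
    """Number of positions where two binary vectors differ."""
    return int(np.sum(np.asarray(x) != np.asarray(y)))

def euclidean(x, y):
    """Ordinary Euclidean distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff))

def jaccard(x, y):
    """1 - |intersection| / |union|, treating 1s as set membership."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.sum(x | y)
    if union == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0
    return float(1.0 - np.sum(x & y) / union)

# Toy data shaped like the question (44 binary variables); values are random.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 44))
query = rng.integers(0, 2, size=44)

ham = np.array([hamming(query, row) for row in data])
euc = np.array([euclidean(query, row) for row in data])
jac = np.array([jaccard(query, row) for row in data])

# Squared Euclidean equals Hamming on 0/1 vectors, so the nearest row matches.
same_neighbour = int(np.argmin(ham)) == int(np.argmin(euc))
```

Jaccard ignores positions where both vectors are 0, so it can rank neighbours differently from Hamming and Euclidean; whether that is desirable depends on whether shared absences carry information in the data.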

Edit: I think Pearson can of course also be an option, but I don't think you necessarily need to use it.
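For reference, a Pearson-based distance on binary vectors can be sketched as below. This is an illustrative NumPy version, not the implementation in the minisom fork mentioned earlier; the function name `pearson_distance` and the handling of constant vectors are assumptions. On 0/1 vectors the Pearson coefficient coincides with the phi coefficient.

```python
import numpy as np

def pearson_distance(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y.

    The result is 0 for perfectly correlated vectors and 2 for perfectly
    anti-correlated ones.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    if denom == 0:
        # Assumption: a constant vector (all 0s or all 1s) has undefined
        # correlation; treat it as uncorrelated (r = 0) instead of NaN.
        return 1.0
    return 1.0 - float(xc @ yc) / denom

# Usage on small hand-made binary vectors.
d_same = pearson_distance([1, 1, 0, 0], [1, 1, 0, 0])
d_opposite = pearson_distance([1, 1, 0, 0], [0, 0, 1, 1])
d_unrelated = pearson_distance([1, 1, 0, 0], [1, 0, 1, 0])
```

Unlike Hamming or Euclidean distance, this measure treats a vector and its complement as maximally distant and is undefined for constant vectors, which is worth checking for before using it on sparse binary data.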