rawmarshmellows / FacialClusteringPipeline

Speed? #1

Closed vivoutlaw closed 5 years ago

vivoutlaw commented 6 years ago

Hi Kevin,

I read your post (https://stackoverflow.com/questions/43462035/implementing-an-efficient-graph-data-structure-for-maintaining-cluster-distances), in which you describe why your rank-order distance has computational-speed issues. Were you able to fix it, or does the algorithm still suffer from this issue? Thanks, and looking forward to your reply.

best, Vivek

rawmarshmellows commented 6 years ago

Hey Vivek, I was unable to fix this, and I have stopped working on this project. I did have some notes on how to make it faster, though. I'm not sure they work, but they may help you:

  1. Using the cluster-normalized distance and the rank-order distance, create an edge between two clusters if the condition is true.
  2. When merging faces, give each face an ID that is the same as the ID of its cluster.
  3. Let each cluster have a dictionary that stores the nearest distance from itself to each other cluster, keyed by cluster ID. E.g., assuming this cluster's ID is cluster2_id:
    1. {cluster1_id : distance, cluster3_id : distance ... }
  4. Then, for each face in the cluster, go through its top 20 neighbours and update the distance dictionaries of their clusters.
  5. To optimize, if a face has all of its neighbours in the same cluster, flag it to be ignored in the next nearest-distance evaluation.
  6. Sort the dictionary by ascending value to see each cluster's nearest neighbours.
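
Steps 3 to 6 above could be sketched roughly as follows. This is only an illustration under assumptions, not the pipeline's actual code: the function names, the `neighbours` and `face_cluster` inputs, and the choice to update the dictionaries of both clusters on every cross-cluster neighbour are all hypothetical, and "same cluster" in step 5 is read as "the face's own cluster".

```python
from collections import defaultdict


def nearest_cluster_distances(neighbours, face_cluster):
    """Build {cluster_id: {other_cluster_id: nearest distance}} (steps 3-4).

    neighbours   : {face_id: [(neighbour_face_id, distance), ...]}, i.e. each
                   face's top-k list (hypothetical input; k = 20 in the notes).
    face_cluster : {face_id: cluster_id}, per step 2.
    Also returns the set of faces whose neighbours all lie in their own
    cluster, so they can be skipped in the next evaluation (step 5).
    """
    nearest = defaultdict(dict)
    interior_faces = set()
    for face, nbrs in neighbours.items():
        own = face_cluster[face]
        crosses_cluster = False
        for other_face, dist in nbrs:
            other = face_cluster[other_face]
            if other == own:
                continue
            crosses_cluster = True
            # Assumption: keep the smallest distance seen, in both directions.
            if dist < nearest[own].get(other, float("inf")):
                nearest[own][other] = dist
            if dist < nearest[other].get(own, float("inf")):
                nearest[other][own] = dist
        if not crosses_cluster:
            interior_faces.add(face)
    return dict(nearest), interior_faces


def ranked_neighbours(nearest, cluster_id):
    """Step 6: a cluster's neighbouring clusters, sorted by ascending distance."""
    return sorted(nearest.get(cluster_id, {}).items(), key=lambda kv: kv[1])


# Toy data: four faces in three clusters.
neighbours = {
    "a": [("b", 0.1), ("c", 0.5)],
    "b": [("a", 0.1)],
    "c": [("a", 0.5), ("d", 0.3)],
    "d": [("c", 0.3)],
}
face_cluster = {"a": 1, "b": 1, "c": 2, "d": 3}

nearest, interior_faces = nearest_cluster_distances(neighbours, face_cluster)
print(ranked_neighbours(nearest, 2))
```

Storing only the minimum per cluster pair keeps each dictionary small, and the sort in step 6 is then over the number of neighbouring clusters rather than over all faces.
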

vivoutlaw commented 6 years ago

Hey Kevin, thanks! I used openbr instead; they have an implementation of the Rank-Order algo. :) best, Vivek

vivoutlaw commented 6 years ago

If I implement it someday, I'll update you about that! Thanks anyway, again! :)

rawmarshmellows commented 6 years ago

@vivoutlaw did you end up using my pipeline with the Rank-Order algo? :) If so, could you share the results with me? I'm quite interested.

vivoutlaw commented 6 years ago

Hi @kevinlu1211, in the end I used the rank-order implementation available in the openbr library (https://github.com/biometrics/openbr/blob/c3ea310daaa7959b7be5cc9f127d5fd41728ae69/openbr/core/cluster.h), and for large-scale datasets I used Approx. RO, available here: https://github.com/gmy001/Clustering

For LFW (with 13233 samples), RO (aggressiveness=14) resulted in 1509 clusters with 87.63% accuracy and an F-score of 67.4822. Approx. RO (Threshold=1.1) resulted in 9920 clusters with 93.18% accuracy and an F-score of 88.31.

rawmarshmellows commented 5 years ago

This is great, thanks for the references!