jerinphilip / ocr-retrain


Clustering #14

Open jerinphilip opened 7 years ago

jerinphilip commented 7 years ago

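# x = img | prediction | (img, prediction)
# i = index of occurrence
# X = [(x, i)]

# Distance between two entries of X compares only their x parts;
# some_distance is a placeholder for the actual metric.
d = lambda a, b: some_distance(a[0], b[0])

# cluster : predictions, distance-metric -> [[predictions]]
def cluster(X, d, method):
    # Assumes `method` is one of the clustering functions below.
    return method(X, d)

def MSTCluster(X, d):
    # https://github.com/jakevdp/mst_clustering/blob/master/mst_clustering/_mst_clustering.py
    raise NotImplementedError

def LSHCluster(X, d):
    raise NotImplementedError
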
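A minimal sketch of how MSTCluster could be filled in, assuming the approach of the jakevdp/mst_clustering code linked above: build a minimum spanning tree over X and cut every edge longer than a cutoff. This version uses Kruskal's algorithm with a union-find in pure Python. The extra threshold parameter is hypothetical (the stub takes only X and d), and the full O(n^2) edge list means it only suits small books or a sample of words.

```python
from typing import Any, Callable, Dict, List, Tuple

Item = Tuple[Any, int]  # (x, i): x is the payload, i its occurrence index


def MSTCluster(X: List[Item], d: Callable[[Item, Item], float],
               threshold: float) -> List[List[Item]]:
    """Single-linkage clustering via Kruskal's MST, cutting edges > threshold.

    `threshold` is a hypothetical extra parameter, not part of the original stub.
    """
    n = len(X)
    parent = list(range(n))

    def find(a: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Every pairwise edge, shortest first (O(n^2) memory: fine for a sketch).
    edges = sorted((d(X[a], X[b]), a, b)
                   for a in range(n) for b in range(a + 1, n))
    for w, a, b in edges:
        if w > threshold:
            break  # all remaining edges are longer and would be cut anyway
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # edge joins two trees, so it is an MST edge

    # Items sharing a root form one cluster.
    groups: Dict[int, List[Item]] = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(X[k])
    return list(groups.values())
```

Stopping Kruskal at the first edge longer than threshold yields the same groups as building the whole MST and then deleting its long edges. For example, with X = [(0.0, 0), (0.1, 1), (5.0, 2), (5.2, 3)] and d = lambda p, q: abs(p[0] - q[0]), a threshold of 1.0 splits X into the two pairs of nearby values.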
jerinphilip commented 6 years ago

Start with Praveen sir's code and pipe its output into the cluster function, which should already be written by now.

Use the function here to get features and connect it to cluster with a Euclidean distance criterion. For now, tune the threshold and prune parameters by hand and see whether any setting gives reasonable clusters.
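One way to read the Euclidean criterion with threshold and prune, sketched with NumPy: link two feature vectors whenever their Euclidean distance is at most threshold, take the connected components of that graph (the same groups an MST cut at threshold gives), and treat prune as a minimum cluster size. That reading of prune, and the helper name euclidean_clusters, are assumptions for illustration only.

```python
import numpy as np


def euclidean_clusters(feats: np.ndarray, threshold: float, prune: int):
    """Group rows of `feats` linked by Euclidean distance <= threshold, then
    drop groups with fewer than `prune` members (assumed meaning of prune)."""
    # Pairwise Euclidean distances, shape (n, n).
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = dist <= threshold  # edge wherever two rows are close enough

    n = len(feats)
    label = np.full(n, -1)
    clusters = []
    for start in range(n):
        if label[start] != -1:
            continue  # already assigned to an earlier component
        # Depth-first search over the threshold graph collects one component.
        label[start] = len(clusters)
        stack, members = [start], []
        while stack:
            u = stack.pop()
            members.append(u)
            for v in np.flatnonzero(adj[u] & (label == -1)):
                label[v] = len(clusters)
                stack.append(int(v))
        clusters.append(sorted(members))
    return [c for c in clusters if len(c) >= prune]
```

The broadcasted difference array has shape (n, n, dim), so this is only practical for a few thousand words at a time. A manual sweep is then a loop over a handful of (threshold, prune) pairs, printing how many clusters survive for each.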

The features for each book are in /OCRData2/praveen-intermediate/<book-id>/feats.npy. To test the function, you can simply run

python3 -m doctools.hwnet.codes.deploy.feature -f /OCRData2/praveen-intermediate/0061/feats.npy
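To pass a book's features to cluster, each row of feats.npy has to be paired with its occurrence index to match the X = [(x, i)] layout. A minimal sketch, assuming feats.npy holds a 2-D array with one feature vector per word image in occurrence order (the file layout is not stated in this thread, so check it first); load_items is a hypothetical helper name.

```python
import numpy as np


def load_items(path: str):
    """Load a feats.npy file and pair each feature row with its index,
    giving the X = [(x, i)] layout used by cluster."""
    feats = np.load(path)
    # Assumption: one feature vector per word image, in occurrence order.
    if feats.ndim != 2:
        raise ValueError(f"expected a 2-D feature array, got shape {feats.shape}")
    return [(row, i) for i, row in enumerate(feats)]
```

For book 0061 this would be X = load_items("/OCRData2/praveen-intermediate/0061/feats.npy"), after which X and a Euclidean d can go straight into cluster.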

@Deepayan137

The glue code is what we'll reuse; we'll do hyperparameter tuning later.