serengil / tensorflow-101

TensorFlow 101: Introduction to Deep Learning
https://www.youtube.com/watch?v=YjYIMs5ZOfc&list=PLsS_1RYmYQQGxpKV44jsxXNgjEpRoW61w&index=2
MIT License

Voyager Face embedding storing #32

Closed Raghucharan16 closed 7 months ago

Raghucharan16 commented 7 months ago

What exactly is this piece of code doing?

for i in range(len(embeddings), target_size):
    embedding = np.random.uniform(-5, +5, num_dimensions)
    embeddings.append(embedding)
    img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')

And can I just add my own face embeddings without creating synthetic data? If so, how can I do that? Thank you.

serengil commented 7 months ago

Where did you get this?

serengil commented 7 months ago

It is for adding synthetic data. I wanted to test some ANN algorithms on very large data. Of course, you do not have to have that block. Working with just real data is better.
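The padding loop can be reproduced in isolation. This is a minimal sketch using only NumPy; the two "real" vectors are placeholders rather than DeepFace output, and the small `target_size` is an illustrative value chosen so the example runs instantly.

```python
import numpy as np

num_dimensions = 128   # Facenet embedding size
target_size = 10       # illustrative value, kept small on purpose

# Placeholder "real" embeddings (stand-ins for DeepFace.represent output)
embeddings = [np.zeros(num_dimensions), np.ones(num_dimensions)]
img_names = ['real_0.jpg', 'real_1.jpg']

# Pad both lists with random vectors until target_size entries exist
for i in range(len(embeddings), target_size):
    embeddings.append(np.random.uniform(-5, +5, num_dimensions))
    img_names.append(f'synthetic_{i}.jpg')

print(f'There are {len(embeddings)} embeddings available')
```

Dropping the loop leaves only the real embeddings in the lists, which is all the index needs.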

Raghucharan16 commented 7 months ago

From your blog. How can I do that, i.e. how can I add embeddings with the num_dimensions parameter? Also, when I run it in VS Code, it does not show the result picture. I have this code:

# built-in dependencies
import os
import time

# third-party dependencies
import numpy as np
import cv2
import matplotlib.pyplot as plt
from deepface import DeepFace
from voyager import Index,Space
model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128 # Facenet produces 128-dimensional vectors 

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:

                img_name = f'{dirpath}{filename}'

                embedding_objs = DeepFace.represent(
                    img_name, model_name=model_name, detector_backend=detector_backend
                )
                embedding = embedding_objs[0]['embedding']
                embedding=embedding,num_dimensions
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                pass
# target_size = 10000
# for i in range(len(embeddings), target_size):
#     embedding = np.random.uniform(-5, +5, num_dimensions)
#     embeddings.append(embedding)
#     img_names.append(f'synthetic_{i}.jpg')
print(f'There are {len(embeddings)} embeddings available')
index = Index(Space.Euclidean, num_dimensions=num_dimensions)
embeddings_np = np.array(embeddings)
tic = time.time()

index.add_items(embeddings_np)

toc = time.time()

print(
    f'{embeddings_np.shape[0]} embeddings are stored in voyager in '
    f'{round(toc-tic, 2)} seconds'
)
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(
    target_img, model_name=model_name, detector_backend=detector_backend
)
target_embedding = embedding_obj[0]['embedding']
tic = time.time()

neighbors, distances = index.query(target_embedding, k=3)

toc = time.time()

print(
    f'Index search completed in {toc-tic} seconds among '
    f'{embeddings_np.shape[0]} vectors'
)
target_img = cv2.imread('Madhursample.jpg')

for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = img_name.split('/')[-1]
    distance = distances[i]
    print(
        f'{i+1}. nearest neighbor is {label} with distance {round(distance)}'
    )

I'm getting this output and error:

There are 0 embeddings available
Traceback (most recent call last):
  File "/home/narravenkataraghucharan/Desktop/ufacedetection/face_voyager.py", line 87, in <module>
    index.add_items(embeddings_np)
ValueError: Input array was expected to have rank 2, but had rank 1.
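The error is consistent with the "There are 0 embeddings available" line: an empty list converts to a rank-1 array. One plausible cause, which the thread does not confirm, is that `f'{dirpath}{filename}'` drops the path separator, so every `DeepFace.represent` call fails and the bare `except ... pass` hides it. Separately, `embedding=embedding,num_dimensions` rebinds the vector to a tuple. The sketch below illustrates all three points with NumPy only; `stack_embeddings` is a hypothetical helper, not part of voyager or DeepFace.

```python
import os
import numpy as np

num_dimensions = 128

# An empty list converts to a rank-1 array of shape (0,)
empty = np.array([])
print(empty.ndim, empty.shape)

# An f-string join drops the separator; os.path.join keeps it
dirpath, filename = 'dbmod', 'face.jpg'
print(f'{dirpath}{filename}')
print(os.path.join(dirpath, filename))

# A trailing ", num_dimensions" builds a tuple instead of a vector
embedding = [0.0] * num_dimensions
wrapped = embedding, num_dimensions
print(type(wrapped).__name__, len(wrapped))


def stack_embeddings(embeddings, num_dimensions):
    """Hypothetical helper: return an (n, num_dimensions) float32 matrix or raise."""
    if len(embeddings) == 0:
        raise ValueError('no embeddings were collected; check paths and logged errors')
    matrix = np.asarray(embeddings, dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[1] != num_dimensions:
        raise ValueError(f'expected shape (n, {num_dimensions}), got {matrix.shape}')
    return matrix
```

Calling such a check before `index.add_items` would surface the empty list directly, and printing the exception inside the `except` block would show why each image failed.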
Raghucharan16 commented 7 months ago

Also, for me, storing 110 face embeddings in voyager takes more than a minute, though the search was fast. Could you check what went wrong? This is the code:

import os
import time
import logging
import numpy as np
import cv2
from deepface import DeepFace
from voyager import Index, Space

model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128  # Facenet produces 128-dimensional vectors 

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('dbmod'):
    for filename in filenames:
        if '.jpg' in filename:
            try:
                img_name = os.path.join(dirpath, filename)

                # Generate embedding
                embedding_objs = DeepFace.represent(img_name, model_name=model_name, detector_backend=detector_backend)
                embedding = embedding_objs[0]['embedding']
                logging.debug(f"Successfully generated embedding for {img_name}")

                # Append to lists
                embeddings.append(embedding)
                img_names.append(img_name)
            except Exception as e:
                logging.error(f"Error generating embedding for {img_name}: {e}")
                pass

# Print number of embeddings
print(f'There are {len(embeddings)} embeddings available')

# Initialize Voyager index
index = Index(Space.Euclidean, num_dimensions=num_dimensions)

# Add embeddings to index
embeddings_np = np.array(embeddings)
index.add_items(embeddings_np)

# Process target image
target_img = 'sample.jpg'
embedding_obj = DeepFace.represent(target_img, model_name=model_name, detector_backend=detector_backend)
target_embedding = embedding_obj[0]['embedding']

# Perform index search
neighbors, distances = index.query(target_embedding, k=1)

# Print results
print(f'Index search completed among {embeddings_np.shape[0]} vectors')

# Display nearest neighbors
for i, neighbor in enumerate(neighbors):
    img_name = img_names[neighbor]
    label = img_name.split('/')[-1]
    distance = distances[i]
    print(f'{i+1}. Nearest neighbor is {label} with distance {round(distance)}')

serengil commented 7 months ago

Nothing! Creating an index takes time, but it offers fast search.

Raghucharan16 commented 7 months ago

So it can't be faster than this? For a mere 100 images it takes 1 minute to store?

serengil commented 7 months ago

If you have 100 images, then you should not use an index method. deepface's find function performs better.

Index methods should be adopted if you have 1M+ samples.
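For a gallery of about a hundred vectors, an exact linear scan is already cheap. The following is a minimal sketch with NumPy, not DeepFace's `find` implementation; `nearest_neighbors` is a hypothetical helper and the random gallery stands in for real embeddings.

```python
import numpy as np


def nearest_neighbors(gallery, target, k=3):
    """Hypothetical helper: exact k-nearest search by Euclidean distance."""
    gallery = np.asarray(gallery, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    # Distance from the target to every gallery row
    distances = np.linalg.norm(gallery - target, axis=1)
    # Indices of the k smallest distances, closest first
    order = np.argsort(distances)[:k]
    return order, distances[order]


rng = np.random.default_rng(0)
gallery = rng.uniform(-5, 5, size=(110, 128))
target = gallery[42] + 0.01   # a slightly perturbed copy of row 42

neighbors, distances = nearest_neighbors(gallery, target, k=3)
print(neighbors, distances)
```

The scan touches every vector on each query, which is why an approximate index starts to pay off only at much larger scales.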

Raghucharan16 commented 7 months ago

Yes, deepface's find function is indeed much faster, but for my data it is not giving accurate results.

Raghucharan16 commented 6 months ago

Hey @serengil, I have one small task to do; could you give me a hand if possible? I have multiple folders containing faces, say folder1 has faces A, B, C, D and folder2 has faces A, D, E, F. My task is to iterate over the folders (there will be more than two) and save the unique faces in another folder, say unique_faces_folder. Before adding a face, I verify it with deepface's verify method. I also tried the find method on unique_faces_folder, but I'm getting false positives, and the verify method takes too much time. What would be the suggested way to improve this and solve the use case? I'm using YOLOv9 for face detection. I also tried voyager and annoy for the first nearest neighbour, but those give mixed results.

serengil commented 6 months ago

The best way to do that is to use the verify function. It will take some time.
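The pairwise check can be sketched as a greedy pass over embeddings that were extracted once per face. This is an illustration under assumptions, not the maintainer's method: `select_unique` is a hypothetical helper, the three-dimensional toy vectors stand in for face embeddings, and the threshold is an arbitrary value that would need tuning per model.

```python
import numpy as np


def cosine_distance(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_unique(embeddings, threshold):
    """Hypothetical helper: greedily keep indices of faces not seen so far."""
    unique_idx = []
    for i, emb in enumerate(embeddings):
        # A face is new only if it is far from every face already kept
        is_new = all(
            cosine_distance(emb, embeddings[j]) > threshold for j in unique_idx
        )
        if is_new:
            unique_idx.append(i)
    return unique_idx


# Toy vectors: person A, person B, a near-copy of A, person C
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
a2 = np.array([0.99, 0.05, 0.0])
c = np.array([0.0, 0.0, 1.0])

kept = select_unique([a, b, a2, c], threshold=0.4)
print(kept)
```

A lower threshold keeps more near-duplicates, while a higher one risks merging different people, so false positives like those described above usually point to a threshold or model choice worth revisiting.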

Raghucharan16 commented 6 months ago

Yeah, verify gave me better results, but it takes some time. Why can't we get the same results with the find function, since it is very fast compared to iterative checking?

serengil commented 6 months ago

We discussed this yesterday. verify and find do the same thing; find stores its outcomes in a pickle file to restore later.
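The idea of restoring outcomes from a pickle file can be sketched generically. This is not DeepFace's actual file layout or code; `load_or_build` and `fake_embed` are hypothetical names, and the fake embedding function stands in for an expensive model call.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, img_names, embed_fn):
    """Hypothetical helper: reuse cached embeddings, computing only missing ones."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            cache = pickle.load(f)
    missing = [name for name in img_names if name not in cache]
    for name in missing:
        cache[name] = embed_fn(name)
    if missing:
        # Persist the updated cache only when something new was computed
        with open(cache_path, 'wb') as f:
            pickle.dump(cache, f)
    return cache


calls = []


def fake_embed(name):
    # Stand-in for an expensive embedding call; records each invocation
    calls.append(name)
    return [float(len(name))] * 4


cache_path = os.path.join(tempfile.mkdtemp(), 'embeddings.pkl')
first = load_or_build(cache_path, ['a.jpg', 'b.jpg'], fake_embed)
second = load_or_build(cache_path, ['a.jpg', 'b.jpg', 'c.jpg'], fake_embed)
print(len(calls))
```

The second call computes only the image that was not cached, which is where the speed-up over repeated pairwise checks comes from.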

Raghucharan16 commented 6 months ago

Yeah, we discussed it, but for me the results are not the same. I'm hoping InsightFace's buffalo_l model will give better results. Thanks for your patience; we appreciate your work.

darkar18 commented 6 months ago

Hey, just my opinion: a vector store like Milvus can give you dynamic indexing and storage, so you need not rebuild the index every time. It also has search and indexing parameters you can configure. Check out Milvus.

Raghucharan16 commented 6 months ago

@darkar18 Thanks for the suggestion, I'll look into it.