Closed diyoyo closed 4 months ago
Are the background jobs still running?
> Are the background jobs still running?
I haven't checked from within the containers whether a process is hanging, all I can say is that I always wait for all jobs to be at 0 to start my "tagging sessions".
> Why is a face unassigned in the first place?
If it's unassigned, it means it a) has fewer similar faces than the min recognized faces setting and b) none of the faces it matched are associated with a person either.
> Why are some faces detected afterwards that were not detected the first time?
Facial recognition happens in two phases. The first phase is meant to be fast and the second phase runs at night to make it more complete. Running "missing" essentially ran the second phase. I can see why this would be confusing after you just set things up; there's room for improvement here.
Edit: this doesn't apply as much if you run All for facial recognition. It should be close to finalized in this case, with maybe a few more faces that could get recognized in a Missing job.
> Are there some thresholds that I meet only after I reduced the number of unknown people?
No, there's nothing like this.
> If it's unassigned, it means it a) has fewer similar faces than the min recognized faces setting and b) none of the faces it matched are associated with a person either.
Thanks for the explanation @mertalev. Then I don't get why these faces had been unassigned, since my min recognized faces is 1.
Let's take the example of yearly classroom pictures of my mom's childhood. The lighting is good and everyone shows up with kinda "equal opportunity" to get their face detected, since they're facing the camera and are all still. Well, I started hiding all the other classmates, since my mom had not been detected. Basically I was hoping that she would get detected during the "Missing" run. I don't know for how long she actually had been "unassigned", but her cluster is pretty well populated so I don't see any reason for the recognition to miss her...
I was starting to build the hypothesis that if too many people are hidden on a picture, then the leftovers would be "unassigned" as well, but your reply seems to rule out this hypothesis.
Clustering depends on a ton of variables. In this case it'd depend on how many photos of her child self are in the library, how many of those were associated with her, what the lighting, angles, colors and resolutions were for those faces, etc. All of that can affect how similar it thinks a face is, and if it's further than the distance threshold it won't be a match.
Regardless of this, given your previous reply, how can a face be unassigned if my min recognized faces is 1? It should just be considered as yet another unnamed person, shouldn't it?
Was it set to 1 from the start? Or did you change it to 1 afterwards? It should make a person for her as long as facial recognition ran with the new setting.
Looking at the code, I have a hunch for what might have happened in this case.
First, can you share the output of `docker exec immich_postgres psql -U postgres -c "SELECT * FROM pg_vector_index_stat;" immich`?
> Was it set to 1 from the start? Or did you change it to 1 afterwards? It should make a person for her as long as facial recognition ran with the new setting.
That might be an explanation. There have been so many changes in Immich over the past 10 months, and I have made so many more to the organisation of my library...
I am pretty sure the min=1 was never a default option, so you're right, the likelihood that I have changed it after some faces were unassigned already is high. Then the likelihood that the web UI for unassigned faces was not yet implemented is also high, and it would explain why I missed this icon in the first place, in the info pane.
I'll share with you the output of the sql query in a minute, my RPi is hanging pulling the latest docker image.
```
docker exec immich_postgres psql -U postgres -c "SELECT * FROM pg_vector_index_stat;" immich

 tablerelid | indexrelid | tablename | indexname | idx_status | idx_indexing | idx_tuples | idx_sealed | idx_growing | idx_write | idx_size | idx_options
------------+------------+--------------+------------+------------+--------------+------------+------------+-------------+-----------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 143885 | 1133821 | smart_search | clip_index | NORMAL | f | 200555 | {200555} | {} | 0 | 439014872 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
 16913 | 1420991 | asset_faces | face_index | NORMAL | t | 353293 | {353270} | {1,22} | 0 | 771715520 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
```
Also, is there a way to un-unassign all unassigned faces? I looked for a field in the `person` table, but could only find `isHidden`. And I couldn't find anything either in the `asset_faces` fields.
Thanks.
To clarify, it should still apply the new setting on unassigned faces when you run a Missing job and for new assets (and of course if you re-ran Facial Recognition on all assets).
My hypothesis is that the vector index made a boo-boo and didn't return any results for a search (i.e. it didn't match the face against itself, which should never happen unless the index misses it). The code doesn't handle this case: it checks whether the number of matches (here 0) is at least the minimum face setting (1), and since it isn't, it decides the face should be unassigned instead of creating a new person.
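To make the hunch concrete, the decision roughly looks like this (a simplified sketch of my hypothesis, not the actual Immich code; function and field names are made up):

```python
def recognize(face_matches, min_recognized_faces):
    """Sketch of the suspected recognition decision for one face.

    face_matches: similar faces returned by the vector index search,
    sorted most similar first, each like {"personId": str | None}.
    A face should always match itself, so an empty list means the
    index failed to return the face at all.
    """
    if len(face_matches) < min_recognized_faces:
        # With a healthy index and min = 1, this branch should never
        # trigger, since the face matches itself. With a broken index
        # the search can return 0 matches, and 0 < 1, so the face is
        # left unassigned instead of getting a new person.
        return "unassigned"
    for match in face_matches:
        if match["personId"] is not None:
            return match["personId"]
    return "create new person"

# Healthy index, min = 1: the face at least matches itself -> new person
print(recognize([{"personId": None}], 1))  # create new person
# Broken index, min = 1: zero matches -> silently unassigned
print(recognize([], 1))  # unassigned
```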
Try running `REINDEX INDEX face_index;`, wait for it to finish (check with the `pg_vector_index_stat` query that `idx_indexing` is `f`, or just check CPU usage), and run a Missing job. If I'm right, this will probably change the result. If not, dropping the index would certainly confirm whether it's an index issue, but the facial recognition would take longer.
Ok, I will try in a moment. There are two points I'd like to raise first:
Do you think it could have affected this issue? All I know so far is that it messed up the metadata of some files: the bounding box of some faces is correct in the web UI, but the thumbnail is taken from the reversed aspect ratio, so you don't see the real face for unnamed people, and the small pic preview for that person also has a reversed aspect ratio. Yet the max-size pic preview has a correct aspect ratio. Anyway, that's for another GitHub issue.
> this will probably change the result

As we speak, the results are changing every time I run "Missing", with new people always getting discovered. Should we agree on some metrics before I run the test? So far, my favorite tracker is the following:
```
\out 240613_beforeScreening.txt
SELECT p."name", COUNT(1) as c
FROM person p
INNER JOIN asset_faces af
ON af."personId"=p."id"
WHERE p."name"<>''
GROUP BY p."name"
ORDER BY c DESC, p."name";
\out
```
and then
`colordiff -u 240613_beforeScreening.txt 240613_afterScreening.txt`
And if I want to get a number that closely matches the "Total number of people" of the webUI, I run:
```sql
SELECT COUNT(1) FROM
(
    SELECT p."id", COUNT(1) as c
    FROM person p
    INNER JOIN asset_faces af
    ON af."personId"=p."id"
    WHERE p."isHidden"='f'
    AND p."thumbnailPath"<>''
    GROUP BY p."id"
    ORDER BY c DESC
) as q
WHERE q."c" > 0;
```
But surprisingly, I was not able to match exactly the number that is displayed in the UI (and I have been too lazy to read the source code so far).
> Do you think it could have affected this issue?
It depends on the query, but it certainly could.
> Should we agree on some metrics before I run the test?
The ones you shared are fine. Another one that would be nice to look at is grouping by person and counting faces. Something like:
```sql
SELECT af."personId", COUNT(*) AS face_count
FROM asset_faces af
GROUP BY af."personId"
ORDER BY face_count DESC;
```
That should include unassigned faces too (in the form of null).
Ok, then I will measure this as well before and after.
So, just to be clear and maximise the value of the experiment, I will:
Correct?
Yup, but no need to run it for face detection.
Alright. Well, there have been a lot of changes. Basically, the range is way bigger than in the past.
Some more context: two weeks ago, I did a big tagging/hiding session and went from 10k people to fewer than 6k. After a new facial recognition run, it went back up to 20k. That was the biggest jump. But as I kept reducing the available pool of people, the jumps of course got smaller and smaller. This morning, I reached the 4000-people milestone, and it went up to 4295 before we had this conversation. So, small jumps, and very few edits to the number of pics per tagged person.
Reindexing Results
`cat 240613_beforeScreening_2218.txt | wc -l` → 1378

(yes, I know, that's a lot, but I like to tag famous people that are detected in the newspapers or on the TV screens 🤣🤣)

`diff 240613_beforeScreening_2218.txt 240613_afterScreening_2317.txt | grep "@@" | wc -l` → 44 (with pretty big change blocks at the top of the pyramid)

`diff 240613_beforeScreening_count_2219.txt 240613_afterScreening_count_2318.txt`

```
 count
-------
- 4295
+ 8673
(1 row)
```

And from the web UI: from 4265 to 8643 (I don't know why there is this 30-people diff).

`cat 240613_beforeScreening_withUnassigned_2220.txt | wc -l` → 22133

`cat 240613_afterScreening_withUnassigned_2318.txt | wc -l` → 26517

`diff 240613_beforeScreening_withUnassigned_2220.txt 240613_afterScreening_withUnassigned_2318.txt | grep "@@" | wc -l` → 216
My main concerns are:
EDIT: `diff` is an alias for `colordiff -u`.
Another query which I use to focus first on the pics that need the most hiding is the following:
```sql
SELECT a."originalPath", COUNT(1) AS c
FROM assets a
INNER JOIN asset_faces af ON af."assetId"=a."id"
INNER JOIN person p ON p."id"=af."personId"
WHERE p."isHidden"='f' AND p."name"=''
GROUP BY a."originalPath"
ORDER BY c DESC
LIMIT 50;
```
Before the reindexing, the pic with the biggest number of unnamed people had about 24, the counts fell to 14 right after, then quickly to 4. Now, the max number is 31 and I'm still at 9 when reaching the LIMIT 50.
So the number of recognized people doubled? Wow, that's actually pretty remarkable. Did that photo of your mom get recognized now? Also, can you re-run the `pg_vector_index_stat` query?
I had fixed that picture of my mom already this morning, after discovering that the unassigned icon was actually hiding pretty interesting faces.
After the reindexing ended (I ran it directly in a bash session, so it was a blocking action), it still took a little bit of time before the `f` value appeared for `face_index`.
Now, since my previous message, I already started hiding the pics with the most unnamed people again, and I'm happy because it really seems like everyone is recognized in the pictures. So maybe you solved it by making me REINDEX. Maybe my tagging sessions were forcing the reindexing of only a selected set of pics, which would explain why it would find more people afterwards on these pics, but not on all pics?
The result of the `pg_vector_index_stat` query is:
```
 tablerelid | indexrelid | tablename | indexname | idx_status | idx_indexing | idx_tuples | idx_sealed | idx_growing | idx_write | idx_size | idx_options
------------+------------+--------------+------------+------------+--------------+------------+------------+-------------+-----------+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 143885 | 1133821 | smart_search | clip_index | NORMAL | f | 200557 | {200557} | {} | 0 | 439029072 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
 16913 | 1420991 | asset_faces | face_index | NORMAL | f | 151468 | {151468} | {} | 0 | 339634152 | {"vector":{"dimensions":512,"distance":"Cos","kind":"F32"},"segment":{"max_growing_segment_size":20000,"max_sealed_segment_size":1000000},"optimizing":{"sealing_secs":60,"sealing_size":1,"optimizing_threads":2},"indexing":{"hnsw":{"m":16,"ef_construction":300,"quantization":{"trivial":{}}}}}
(2 rows)
```
Could you please explain what value you're looking at in this `pg_vector_index_stat` query? I'd like to understand more.
Good news:
Bad news:
> Could you please explain what value you're looking at in this `pg_vector_index_stat` query? I'd like to understand more.
The `idx_tuples` field is the most interesting. Because of how Postgres handles concurrency, changing a row actually creates another copy of that row and inserts it again into each index.
Before you reindexed, you had over twice as many rows in the vector index as you do now, likely because of assigning and reassigning people. Since more than half of the whole index was duplicates, this damaged the structure of the graph and contributed to lower recall.
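A toy illustration of one way duplicates hurt (plain Python with made-up 2-D vectors; the real recall loss in HNSW is structural damage to the graph, but duplicates also directly crowd distinct neighbors out of a fixed top-k result):

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return 1 - dot / (norm(a) * norm(b))

def top_k(query, rows, k):
    # Exhaustive nearest-neighbor search, smallest distance first
    return sorted(rows, key=lambda r: cosine_distance(query, r["vec"]))[:k]

query = [1.0, 0.0]
faces = [{"id": f"s{i}", "vec": [1.0, 0.05 * i]} for i in range(1, 6)]

# Clean index: the top-3 results are 3 distinct similar faces
clean = top_k(query, faces, k=3)
print(len({r["id"] for r in clean}))    # 3 distinct matches

# Bloated index: MVCC left 3 extra copies of the nearest face, so the
# top-3 is filled with duplicates of a single face
bloated = [dict(faces[0]) for _ in range(3)] + faces
crowded = top_k(query, bloated, k=3)
print(len({r["id"] for r in crowded}))  # only 1 distinct match
```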
> just found out that there are still unassigned faces on some pics, which I still don't understand knowing that min_cluster_size=1.
For more science, you can try dropping the index with `DROP INDEX face_index` and re-running. This will give you perfect, exact recall.
Thanks for everything. I don't know whether, at the end of the day, this qualifies as a bug or not, but in the meantime, I'll reindex now and then, and hopefully I won't go crazy after each big jump in the number of persons left to process.
No problem! If you do try to run it without an index, do let me know the results. The difference will show the quality of the index in general. If there’s a big difference, it might be worth tuning the index settings for higher quality.
I won't try it soon. This really was exhausting, I need a rest. But I'll let you know if I do.
Just so you know, the cron job started and, surprisingly (well, you might say it's the normal behavior), while the Library Tasks became green, counting down to 0, the "Facial Detection" and "Facial Recognition" jobs never started (or finished so quickly that I did not notice). In the past two weeks, I would systematically see them at work at cron time.
So, in addition to having my closest people back on track in the top list of people to tag (yep, another positive side effect of the reindexing), you might also have prevented my low-cost SSD from dying too soon 👍
@mertalev It's a bit out-of-topic but still related, and I'm not sure it deserves a new issue, so I'll post it here. Now that the number of faces does not increase all the time, I'm still surprised that no clustering is actually happening anymore. My goal was to display all the faces so I could balance the sizes of the clusters as much as possible, so it would weigh more during the clustering and finally attract some people that are less represented. (I'm not sure this is 'clear', so I'll rephrase: I wanted to manually pick pictures of less-represented people to grow clusters and give the ML algo more data for the smaller classes). But it feels like I'll have to cherry pick all these rare pics, as I don't see any recognition really happening.
I'm not a conspiracy theorist, but I easily make hypotheses 🤣. Please debunk the following (and if possible, point me to the piece of code that would help me understand):
These questions may sound weird, but I'm really confused here. Sorry.
I think a cool feature would be the "Are these people the same?".
Or a field `manualTag` in `asset_faces` that would allow for preservation of the cluster when performing a full reclustering.
Or an option to recluster fully, except for named persons.
> The Recognition job does not consider all possible "named" classes when going through all the unnamed faces. There is a threshold. True or False?
For each unassigned face, it searches for similar faces within the distance threshold. Among those matches, it will try to find a face with an associated person. The person of the most similar face with a person will be chosen when there are multiple matches.
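A sketch of that selection step (simplified pseudo-code, not the actual implementation; the threshold and sample values are made up):

```python
def assign_person(matches, threshold):
    """Pick a person for one unassigned face.

    matches: faces returned by the vector search, sorted most similar
    first, each like {"distance": float, "personId": str | None}.
    """
    within = [m for m in matches if m["distance"] <= threshold]
    for m in within:
        # The first match that carries a person wins, i.e. the person
        # of the most similar face that has one
        if m["personId"] is not None:
            return m["personId"]
    return None  # no matched face has a person -> no assignment

matches = [
    {"distance": 0.10, "personId": None},   # very similar, but unnamed
    {"distance": 0.25, "personId": "mom"},  # less similar, has a person
    {"distance": 0.60, "personId": "dad"},  # beyond the threshold
]
print(assign_person(matches, threshold=0.5))  # mom
```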
> or... the recognition job does not go through unnamed people, it goes through unclustered faces. Since my min=1, it means that if a face bounding box has not been attributed to an existing+named cluster in the first place, it will create a new+unnamed cluster (i.e. person), but will never try to re-cluster that face ever again, since it already sits in a cluster of size 1. True or False?
The "missing" job will go through faces without an assigned person. If a face has a person assigned, then it won't be queued anymore. There's no point in queueing it since it already has a match. Other faces that haven't been recognized are still queued and can match against the person of that face if they're similar enough.
Ok, I'll interpret this as "False" and "True", then. True, meaning that when min-cluster-size=1, there should be no unassigned face after the first round. So a face in a cluster of 1 has no chance of being reclustered anymore.
Then, I'll start a new "Feature Request" after making my own tests:
Would the following be enough to trigger a re-classification (after hitting "Recognition : Missing") of the currently unnamed people with cluster size=1 ?
Should I edit the `asset_job_status` table too?
```sql
-- Start a transaction
BEGIN;

-- Collect the ids of the unnamed 1-face people
-- (a temp table rather than a CTE: a CTE is scoped to a single
-- statement, so the DELETE below could not reference it)
CREATE TEMP TABLE persons_to_delete AS
SELECT p."id"
FROM person p
JOIN asset_faces af ON p."id" = af."personId"
WHERE p."name" = ''
GROUP BY p."id"
HAVING COUNT(af."id") = 1;

-- Clear the personId in asset_faces
-- (NULL rather than an empty string: the column is a uuid)
UPDATE asset_faces
SET "personId" = NULL
WHERE "personId" IN (SELECT "id" FROM persons_to_delete);

-- Delete the records in person
DELETE FROM person
WHERE "id" IN (SELECT "id" FROM persons_to_delete);

-- Commit the transaction
COMMIT;
```
Can I ask what you're trying to achieve with this first? Are you looking to re-run it with different settings or something?
Well, I used min-cluster-size=1 to be able to go pick rare faces of persons when they were younger, etc. Now that I've done this, I'm pretty sure I missed some pics of the same person at the same age, given the number of pics that I have. Our conversation highlighted the fact that even if I manually grow the cluster of one given person, no re-clustering of these 1-face clusters will happen, because the job is considered done by definition. So now that I've done a lot of haystack-needle picking already, I just want to remove all those 1-face clusters (more than 3500) that have no name, re-run the recognition, and hope that some of them will fall within the distance threshold of the labelled clusters.
To answer your question, I'm not changing the settings; I've already changed one parameter: the starting point, i.e. the content of the labelled clusters, which will affect the distance calculation. I just need artificial help to finish the job and find the leftovers.
The queueing happens for faces, not people. Any faces that haven't been recognized will continue to be queued and can be matched with those 1-face people. There's nothing to gain from removing those people and re-running with the same settings. They wouldn't be in their own cluster if they could match those other clusters to begin with.
The only gain I suppose would be that the index should be cleaner now than before, so perhaps some of those faces could match existing people.
> The queueing happens for faces, not people. Any faces that haven't been recognized will continue to be queued and can be matched with those 1-face people. There's nothing to gain from removing those people and re-running with the same settings.
I'm sorry, I'm very confused. I understand one thing and its opposite when reading your answer. Please forgive my English, I guess it adds some confusion too.
Let's try again: You're saying "Any faces that haven't been recognized" ... well, if a singleton cluster exists, it means that the face associated with it has been recognized, and hence, labelled or not, it won't be queued anymore. So keeping the association between the face and the cluster, to my understanding, is what qualifies the face as "in the queue" or "not in the queue", right? Therefore the above SQL query, where I delete the link between a face and a singleton cluster, and, just to clean things up, delete the cluster itself.
Where have I lost myself?
> They wouldn't be in their own cluster if they could match those other clusters to begin with.
Ok, I guess this is where my reasoning falls apart. I was assuming that the added diversity in the clusters I've manually grown (from picking up singletons and merging them) would change the distance between the leftover singletons and those manually-grown clusters.
But without knowing how the distance is calculated, and how the metric is derived from the components of the clusters, I guess I was just dreaming out loud :)
The distance stays the same, so if they didn't match something then it wouldn't "normally" change anything to re-run it. But the indexing aspect does make it likely that at least some of those single face clusters would end up matching something after all.
To answer your question, the queries look fine to me. The `UPDATE asset_faces` is probably unnecessary since deleting the person will cascade that change to the `asset_faces` table as well. I'd also make a backup first to be safe.
> The `UPDATE asset_faces` is probably unnecessary since deleting the person will cascade that change to the `asset_faces` table as well. I'd also make a backup first to be safe.
Yes, the `UPDATE` was there out of laziness (simpler than double-checking that all the cascading is properly implemented 🤣, sorry 🙈).
> The distance stays the same
This is where you need to educate me (maybe you did already and I missed the point): how come? Why does adding elements to a cluster not affect the distance between the cluster and the outside faces?
The distance is always between individual faces. Clustering can affect which person a face gets assigned, not whether it will be assigned a person.
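A toy example of the point (made-up 2-D vectors and threshold; real face embeddings are 512-dimensional):

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return 1 - dot / (norm(a) * norm(b))

def matches_person(face, person_faces, threshold):
    # Face-to-face comparison: the person matches only if SOME individual
    # face of theirs is within the threshold of this face. No centroid is
    # involved, so merging far-away faces into the cluster does not pull
    # the cluster "closer" to an outside face.
    return any(cosine_distance(face, f) <= threshold for f in person_faces)

leftover = [0.0, 1.0]               # an unmatched singleton face
cluster = [[1.0, 0.0], [0.9, 0.1]]  # faces already assigned to a person
print(matches_person(leftover, cluster, 0.5))  # False

# Growing the cluster with another dissimilar face changes nothing:
cluster.append([0.95, 0.05])
print(matches_person(leftover, cluster, 0.5))  # False

# Only adding a face that is itself close to the leftover helps:
cluster.append([0.1, 0.95])
print(matches_person(leftover, cluster, 0.5))  # True
```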
Ok, thanks for your patience. I understand now and I feel pretty stupid about it since I've done my own models in the past, just not about pics with a personal aspect. It's easier to abstract those concepts when you don't have the faces you recognize easily in front of you. I guess I proved "I'm not a bot".
I think it was mentioned on a different thread, that the dream would be to use the birth date and the pic timestamp to extrapolate the face (and generate extra embeddings?) at different ages. To add more complexity, one could guess the quality of the picture given the timestamp or some manual metadata (for scanned pics) and apply dynamic thresholds. Or a camera model classifier... Ok, too much already.
I think we would need a more robust testing environment for facial recognition before making those kinds of inferences. Making facial recognition better for one library is one thing, but it can backfire for other libraries if you aren't careful.
A small change that was mentioned that I agree with is to order images by date in descending order when queueing facial recognition. The idea is to guide it through the transition from childhood to adulthood. By ordering it, it can gradually expand to include faces of a different age instead of failing to recognize a face and creating a new person. Queueing in descending order (newest first) should be best since adult faces are more distinguishable.
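As a sketch, that change would amount to something like this (hypothetical field names, not the actual Immich queueing code):

```python
from datetime import date

# Hypothetical assets waiting for facial recognition
assets = [
    {"id": "childhood", "fileCreatedAt": date(1995, 6, 1)},
    {"id": "recent", "fileCreatedAt": date(2023, 1, 15)},
    {"id": "teenage", "fileCreatedAt": date(2010, 3, 9)},
]

# Queue newest first: adult faces are more distinguishable, and the
# clusters can then gradually expand backward toward childhood photos
queue = sorted(assets, key=lambda a: a["fileCreatedAt"], reverse=True)
print([a["id"] for a in queue])  # ['recent', 'teenage', 'childhood']
```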
So, at the end of the day, I ran the following query. The "isArchived" filter may have been the reason why I had this difference of 30 people compared to the number at the top of the People tab. (I still have an offset of 3, though.)
```sql
DELETE FROM person
WHERE "id" IN (
    SELECT p."id"
    FROM person p
    INNER JOIN asset_faces af ON af."personId"=p."id"
    INNER JOIN assets ass ON ass."id"=af."assetId"
    WHERE p."isHidden"='f'
      AND p."thumbnailPath"<>''
      AND ass."isArchived"='f'
      AND p."name"=''
    GROUP BY p."id"
    HAVING COUNT(af."id")=1
);
```
Basically, in the previous query, I had included the hidden people too, which would have been stupid, since I manually hid those people. This query keeps all my work and resets the rest (I guess the Count=1 is not really necessary, but I wanted to have a real look at count>1).
Anyways, this made me go from 4887 persons to 1599, with approximately 1350 named people.
I ran all the jobs in "Missing" mode (and btw, there may be the same indexing problem for video transcoding as the one you fixed for faces, since the queue is always about the same size as the number of videos).
And at the end of the jobs, I reached 4239 people. So this manoeuvre removed 600 faces from my workload, in theory, if the clustering was good. After all the discussion, I believe that the explanation is the one you provided already:
> The only gain I suppose would be that the index should be cleaner now than before, so perhaps some of those faces could match existing people.
Thanks for sharing your results! 600 of those people being recognized now is pretty interesting. I made a PR that fixes this indexing issue so it doesn't get any duplicate embeddings. I knew that had an effect on recall, but I never realized it was this dramatic.
Wait for it, I just realized that there was a hidden cluster that was getting all the "attention": it has about 500 faces and they clearly are from plenty of different persons. Since I excluded hidden clusters and clusters with more than 1 face, this was present already before the "little" experiment of tonight. I'm going to unhide the top 3 hidden clusters and remove "HAVING COUNT" from my query. Let's see...
So, when looking at the top hidden clusters: only one of them was really a mess, with plenty of different people. The other clusters were real clusters of people I just do not care about.
Before DELETING: people count = 4245
Result of the query: DELETE 2795
Post DELETING: people count = 1450
Post "Recognition : Missing": people count = 2278
Looking at the top unnamed clusters (hidden or not) : the previous cluster of 500 no longer exists, but a new one with 105 photos seems to be its successor, as it is made of plenty of different persons.
So this time, 2000 faces have been either placed in a cluster or left unassigned(?). Looking at the labelled clusters, there have been some changes, but not to that extent. I doubt that the explanation is as easy as: these faces have been scattered into very small clusters (2-3 people)...
to be continued...
I think every time I perform a query that writes to the DB, I should do a REINDEX afterward, because this thing with the drop from 4245 to 2278 was clearly an artifact: many people had been unassigned. After reindexing + Recognition: Missing, we're back to 4377.
Interesting! TBH, dropping the index would arguably make more sense in your case, at least compared to constant reindexing.
Good day to everyone. I came across your curious topic, and an interesting question arose: is it possible to get from the database a list of photos that have an unassigned face?
Yes, querying this will give you a list of asset ids and their paths (internal to the container):

```sql
SELECT DISTINCT ON (a.id) a.id, a."originalPath"
FROM assets a
INNER JOIN asset_faces af ON a.id = af."assetId"
WHERE af."personId" IS NULL;
```
Thanks! I forgot that I need to log in to the database first :) Maybe someone else will need it, so here is how to export the list to a file, which can then be processed manually, unless of course the number of faces is small:
```
\COPY (SELECT DISTINCT ON (a.id) a.id, a."originalPath" FROM assets a INNER JOIN asset_faces af ON a.id = af."assetId" WHERE af."personId" IS NULL) TO '/tmp/list.csv' WITH CSV HEADER;
```
The bug
First, I'm sorry because I don't really understand what "Unassigned faces" is all about. I couldn't find anything in the documentation, but I just realized it is related to the following problem:
I feel like this is a bug: I can never predict the amount of work it's going to take me to tag everyone, since it keeps adding up. Why is a face unassigned in the first place? Why are some faces detected afterwards that were not detected the first time? Are there some thresholds that I meet only after I reduced the number of unknown people?
I played a little bit with the settings to understand things better, but even using a min-cluster-size of 1 and asking Immich to display all available people in the people tab (by editing `repositories/person.repository.js`), I still have some new people showing up after I have spent time tagging people. I also tried with a smaller library (18GB instead of 500GB), and the behavior is the same, to a smaller extent, of course, but still. Thanks for the help, as I am very confused.
The OS that Immich Server is running on
Debian 12
Version of Immich Server
v1.106.3
Version of Immich Mobile App
v1.106.3
Platform with the issue
Your docker-compose.yml content
Your .env content
Reproduction steps
Relevant log output
No response
Additional information
No response