Closed: advance512 closed this issue 7 years ago.
For 4000 images, I would not use the database part of the package. Just run the `generate_signature` method from the `ImageSignature` class in `image_match/goldberg.py` on your images, and then compute `normalized_distance` over all pairs of signatures to build your distance matrix.
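A minimal sketch of building the full matrix, assuming the signatures come from `ImageSignature.generate_signature` (the hypothetical helper `signatures_from_paths` does that and needs image-match installed; it is imported lazily). `full_distance_matrix` is also a hypothetical helper: it vectorizes the formula `normalized_distance` computes, ||a - b|| / (||a|| + ||b||), so it avoids a Python loop over the roughly 8 million pairs.

```python
import numpy as np


def signatures_from_paths(paths, n_grid=9):
    """Compute one image-match signature per image path.

    Requires the image-match package (imported lazily so the rest of this
    sketch runs without it). n_grid is passed as ImageSignature's grid size n.
    """
    from image_match.goldberg import ImageSignature

    gis = ImageSignature(n=n_grid)
    return np.stack([gis.generate_signature(p) for p in paths])


def full_distance_matrix(signatures):
    """Dense pairwise matrix of ||a - b|| / (||a|| + ||b||).

    Hypothetical helper mirroring ImageSignature.normalized_distance, using
    ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b instead of looping over pairs.
    """
    X = np.asarray(signatures, dtype=np.float64)
    sq = np.einsum("ij,ij->i", X, X)            # squared norm of each signature
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)                 # clamp tiny negative round-off
    diff = np.sqrt(d2)
    norms = np.sqrt(sq)
    denom = norms[:, None] + norms[None, :]
    # Two all-zero signatures would give 0/0; treat them as identical.
    D = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    np.fill_diagonal(D, 0.0)                    # exact zeros on the diagonal
    return D
```

For 4000 images the result is a dense 4000x4000 float64 array, about 128 MB, with every entry filled in, unlike the database search, which only returns hits under `distance_cutoff`.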
Roughly speaking, decreasing `k` and increasing `N` should give you better results at the expense of lookup speed. Similarly, increasing `n_grid` should give you more discerning (i.e. longer) signatures. I haven't rigorously tested anything but the defaults, though.
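To make the `k`/`N` trade-off concrete, here is a rough sketch, assuming the database splits each signature into `N` evenly spaced words of `k` consecutive entries and treats two images as candidates when some of their words match exactly. `signature_length`, `split_into_words`, and `matching_words` are hypothetical helpers for illustration; image-match's own indexing code may differ in details (for example, in how it quantizes the words). `signature_length` assumes the default `diagonal_neighbors=True`, i.e. each of the `n_grid * n_grid` grid points is compared with 8 neighbours.

```python
import numpy as np


def signature_length(n_grid, neighbours=8):
    """Entries in a signature: one per (grid point, neighbour) pair (assumed layout)."""
    return n_grid * n_grid * neighbours


def split_into_words(signature, k, N):
    """Take N evenly spaced windows of length k, zero-padding past the end.

    Hypothetical illustration of word extraction, not image-match's own code.
    """
    sig = np.asarray(signature)
    starts = np.linspace(0, len(sig), N, endpoint=False).astype(int)
    padded = np.concatenate([sig, np.zeros(k, dtype=sig.dtype)])
    return np.stack([padded[s:s + k] for s in starts])


def matching_words(sig_a, sig_b, k, N):
    """Count word positions where the two signatures agree exactly."""
    wa = split_into_words(sig_a, k, N)
    wb = split_into_words(sig_b, k, N)
    return int(np.sum(np.all(wa == wb, axis=1)))
```

For example, if two otherwise identical 648-entry signatures differ at every 10th entry, no 16-entry word can match, while some 4-entry words still do. Shorter words (smaller `k`) and more words (larger `N`) surface more candidates for near-duplicates, at the cost of more lookups per query.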
You are correct that the labels shouldn't make much of a difference. If you have a couple of examples of images you expect to cluster together, could you post them here so I can advise further?
Closing the issue; feel free to reopen.
Hi there,
I have a set of 4000 images that I want to cluster. They were taken by various fixed cameras (which may move very slightly in the wind), some during the day and some at night, and they might contain people, dogs, cats, etc. I am trying to cluster them by camera (i.e. each cluster should contain the images taken by one camera).
I'm planning on using HDBSCAN for this: http://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html
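HDBSCAN accepts a precomputed distance matrix (`metric="precomputed"`), so a dense matrix of signature distances can be passed to it directly. The sketch below shows that call (it needs the `hdbscan` package, imported lazily) next to a much simpler, dependency-free baseline. `threshold_clusters` is a swapped-in technique, not HDBSCAN: it links any two images whose distance is below a cutoff and returns the connected components (single-linkage clustering at a fixed threshold). Both helper names and any cutoff value are assumptions for illustration.

```python
import numpy as np


def cluster_with_hdbscan(D, min_cluster_size=10):
    """HDBSCAN on a precomputed distance matrix; label -1 marks noise."""
    import hdbscan  # optional dependency, imported only when called

    clusterer = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=min_cluster_size)
    return clusterer.fit_predict(np.asarray(D, dtype=np.float64))


def threshold_clusters(D, cutoff):
    """Connected components of the graph with an edge wherever D[i, j] < cutoff."""
    D = np.asarray(D)
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Upper triangle only (k=1), so each pair is visited once and the diagonal is skipped.
    rows, cols = np.nonzero(np.triu(D < cutoff, k=1))
    for i, j in zip(rows.tolist(), cols.tolist()):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Relabel the component roots as consecutive integers 0, 1, 2, ...
    ids = {}
    return np.array([ids.setdefault(find(i), len(ids)) for i in range(n)])
```

A fixed cutoff has to be tuned by hand, for instance by looking at distances between images known to come from the same camera; HDBSCAN does not need a single global cutoff and labels outliers as noise (-1).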
I've got image-match running and have made the following modifications to the library in an attempt to get a complete distance matrix: I have tried setting `distance_cutoff` of `SignatureDatabaseBase()` to 1.0, and `size` of `SignatureES()` to 4000, but I still seem to be getting a sparse 4000x4000 matrix. Is there an easy way to get the full distance matrix?
Also, do you have any hints on when increasing `k`, `N` and `n_grid` is the right way to get more precise results?
I also noticed that some images contain textual labels embedded at the same positions (such as the date/time and the camera name). Since these labels are small, I assume they are mostly ignored by the signature. Am I right?