vade opened this issue 3 years ago
Looking at the model protobuf in Netron, it seems like there are a few steps.
I'm curious whether there was/is a dataset used to determine the weights for the perceptual space, and how it may have been put together (and if it is public, or could be made so? Sorry to be so needy, haha).
Or is the graph simply some procedural transforms, with TF used as a lightweight graph system? If so, any details on the logic therein would be helpful.
Thank you again!
Hi!
Indeed, concatenating the colors does not work, because we don't want the color order to impact the distance between palettes.
The goal of the model is to embed the palettes into a new space in which they can be compared using the L2 distance, so that we can find nearest neighbors efficiently (with a KD-tree for instance). Ideally, changing the color order of a palette would not change its embedding.
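For illustration, the lookup step could look something like this (a minimal sketch; `embed` is a hypothetical stand-in for the trained tower, not code from the repo):

```python
import numpy as np
from scipy.spatial import cKDTree

def embed(palette_lab):
    # Stand-in for the trained tower: maps a (5, 3) Lab palette to a 15-d vector.
    return palette_lab.reshape(-1)

library = np.random.rand(1000, 5, 3)                 # reference palettes
embeddings = np.stack([embed(p) for p in library])   # shape (1000, 15)
tree = cKDTree(embeddings)

query = np.random.rand(5, 3)
dists, idxs = tree.query(embed(query), k=5)          # 5 nearest palettes under L2
```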
We create the embedding by training a Siamese neural network over randomly generated palettes. We tuned the hyperparameters of the model (e.g. layer size, number of layers, etc.). The model in the repo is a "tower" of the Siamese model (from Lab encoding to FC) after training.
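Schematically, the training setup is along these lines (a rough sketch only; the layer sizes and counts below are placeholders rather than the tuned hyperparameters, and the inputs are flattened 5x3 Lab palettes):

```python
import tensorflow as tf

def make_tower(embedding_dim=15):
    # Shared "tower": flattened 5x3 Lab palette -> embedding vector.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(15,)),
        tf.keras.layers.Dense(64, activation="relu"),   # placeholder sizes
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(embedding_dim),
    ])

tower = make_tower()                      # this is the part exported after training
pal_a = tf.keras.Input(shape=(15,))
pal_b = tf.keras.Input(shape=(15,))
emb_a, emb_b = tower(pal_a), tower(pal_b)

# Predicted distance = L2 distance between the two embeddings.
pred_dist = tf.keras.layers.Lambda(
    lambda t: tf.norm(t[0] - t[1], axis=1, keepdims=True))([emb_a, emb_b])

siamese = tf.keras.Model([pal_a, pal_b], pred_dist)
siamese.compile(optimizer="adam", loss="mse")
# siamese.fit([A_flat, B_flat], ground_truth_dist, ...)  # targets come from the
# permutation-invariant distance described below
```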
To create the training set, we pick colors at random uniformly in RGB space and we compute the "ground truth" distance between palettes A and B as the minimum L2 distance between A and every color permutation of B (there are 120 for 5-color palettes). A few million examples should do.
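In code, that ground-truth distance is essentially the following (a minimal numpy sketch):

```python
import itertools
import numpy as np

def palette_distance(a, b):
    """Minimum L2 distance between palette a (5x3) and every color permutation of b."""
    best = np.inf
    for perm in itertools.permutations(range(len(b))):   # 5! = 120 orderings
        best = min(best, np.linalg.norm(a - b[list(perm)]))
    return best

a = np.random.rand(5, 3)
b = a[np.random.permutation(5)]      # same colors, shuffled order
print(palette_distance(a, b))        # ~0: color order does not matter
```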
I hope that it helps!
This is hugely helpful. Thank you @EtienneFerrier - sincerely appreciate the response!
Hi @EtienneFerrier, thanks for this awesome project. I have a question about the ground-truth distance calculation: you just calculate the L2 distance between palette A (shape [5, 3]) and every permutation of palette B, then take the minimum value? And should this be done in RGB space? Thanks,
Hi, thanks for the kind words. Correct, but this should be done in Lab space for the best visual results, not RGB space.
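For example (a sketch only, using skimage's `rgb2lab` for the conversion, which may differ from the original pipeline): convert the randomly drawn RGB colors to Lab first, then apply the permutation-minimum distance from the sketch above.

```python
import numpy as np
from skimage.color import rgb2lab

rgb_a = np.random.rand(5, 3)                   # 5 colors, RGB in [0, 1]
rgb_b = np.random.rand(5, 3)

# rgb2lab expects an image-like array, so add a dummy row axis.
lab_a = rgb2lab(rgb_a[np.newaxis, :, :])[0]
lab_b = rgb2lab(rgb_b[np.newaxis, :, :])[0]

print(palette_distance(lab_a, lab_b))          # palette_distance from the sketch above
```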
Hi!
Firstly, thanks for sharing this work! I'm working on a similar problem with a different palette size, and I'm curious how you created the embedding from n=5 Lab color values into a 15-dimensional space. I'm assuming you aren't just concatenating the Lab values into a single 15-dimensional vector? 3 x 5 = 15, after all.
Lab is perceptually uniform and in theory Euclidean distance should just work, but trying it myself I'm not getting the best results, and I've been trying other distance metrics such as the Hausdorff distance.
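For reference, a quick illustrative check of the order-sensitivity issue: the same palette with its colors shuffled ends up far away under plain L2 on the concatenated vector, while a set-style metric such as the symmetric Hausdorff distance ignores order.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

a = np.random.rand(5, 3)
b = a[np.random.permutation(5)]      # identical colors, different order

concat_l2 = np.linalg.norm(a.reshape(-1) - b.reshape(-1))
sym_hausdorff = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

print(concat_l2)        # usually > 0: order changes the concatenated vector
print(sym_hausdorff)    # 0.0: Hausdorff treats the palettes as point sets
```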
Looking at the embedding model in Netron, it appears there are some biases / learned weights transforming a palette vector into a more usable perceptual embedding?
Can anyone shed some light on how this was computed / trained?
Thank you!