googleartsculture / art-palette

This repo is related to the Art-Palette experiment from Google Arts & Culture.
Apache License 2.0

model / embedding information #2

Open vade opened 3 years ago

vade commented 3 years ago

Hi!

Firstly, thanks for sharing this work! I'm working on a similar problem with a different palette size, and I'm curious how you created the embedding from n=5 LAB color values into a 15-dimensional space. I'm assuming you aren't just concatenating the LAB values into a single 15-dimensional vector (3 × 5 = 15, after all)?

LAB is perceptually uniform, so in theory plain Euclidean distance should just work, but trying it myself I'm not getting the best results, and I've been trying other distance metrics like the Hausdorff distance.
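For reference, here's a minimal sketch of the two approaches I've been comparing, using scipy; the palettes here are random stand-ins, not real data:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

# Two hypothetical 5-color palettes in Lab space, shape (5, 3).
palette_a = np.random.uniform([0, -100, -100], [100, 100, 100], size=(5, 3))
palette_b = np.random.uniform([0, -100, -100], [100, 100, 100], size=(5, 3))

# Naive approach: flatten to 15-D and take the L2 distance.
# This is order-sensitive, which is the problem.
naive = np.linalg.norm(palette_a.ravel() - palette_b.ravel())

# Symmetric Hausdorff distance: order-invariant, treats each
# palette as a set of points in Lab space.
hausdorff = max(directed_hausdorff(palette_a, palette_b)[0],
                directed_hausdorff(palette_b, palette_a)[0])

print(naive, hausdorff)
```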

Looking at the embedding model in Netron, it appears there are some biases / learned weights changing the projection of a palette vector into a more usable perceptual embedding?

Can anyone shed any light as to how this was calculated / trained?

Thank you!

vade commented 3 years ago

Looking at the model protobuf in Netron, it seems like there are a few steps.

I'm curious whether there was / is a dataset used to determine the weights for the perceptual space, how it may have been put together, and whether it is public or could be made so (sorry to be so needy, haha).

Or is the graph simply a set of procedural transforms, with TF used as a lightweight graph system? If so, any details on the logic therein would be helpful.

Thank you again!

EtienneFerrier commented 3 years ago

Hi!

Indeed, concatenating the colors does not work because we don't want color order to impact the distance between palettes.

The goal of the model is to embed the palettes into a new space in which they can be compared using the L2 distance, so that we can find nearest neighbors efficiently (with a KD-tree for instance). Ideally, changing the color order of a palette would not change its embedding.
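For illustration, a minimal sketch of that lookup side; the embedding database and the use of scipy's cKDTree here are placeholders, not the production setup:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical precomputed embeddings: one 15-D vector per palette.
embeddings = np.random.rand(100_000, 15)

tree = cKDTree(embeddings)

# Embed a query palette (via the trained tower), then look up
# its k nearest neighbors under plain L2 distance.
query = np.random.rand(15)
distances, indices = tree.query(query, k=10)
```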

We create the embedding by training a Siamese neural network on randomly generated palettes. We tuned the hyperparameters of the model (e.g. layer size, number of layers). The model in the repo is one "tower" of the Siamese model (from the Lab encoding to the FC layers) after training.

[Diagram: architecture of the Siamese model]
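Roughly, in Keras-style code, the setup looks like the sketch below; the layer sizes are illustrative placeholders, not the tuned values:

```python
import tensorflow as tf

# Shared tower: 5 Lab colors flattened to 15 inputs -> 15-D embedding.
tower = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(15,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(15),
])

palette_a = tf.keras.Input(shape=(15,))
palette_b = tf.keras.Input(shape=(15,))

# The same tower (shared weights) embeds both palettes.
emb_a = tower(palette_a)
emb_b = tower(palette_b)

# Predicted distance = L2 distance between the two embeddings.
diff = tf.keras.layers.Subtract()([emb_a, emb_b])
pred_dist = tf.keras.layers.Lambda(
    lambda d: tf.norm(d, axis=-1, keepdims=True))(diff)

# Regress the predicted distance onto the permutation-invariant
# ground-truth distance described below.
siamese = tf.keras.Model([palette_a, palette_b], pred_dist)
siamese.compile(optimizer="adam", loss="mse")

# After training, `tower` alone is the embedding model shipped in the repo.
```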

To create the training set, we pick colors uniformly at random in RGB space and compute the "ground truth" distance between palettes A and B as the minimum L2 distance between A and every color permutation of B (there are 5! = 120 permutations for 5-color palettes). A few million examples should do.
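As a sketch, assuming palettes are (5, 3) Lab arrays:

```python
import itertools
import numpy as np

def ground_truth_distance(palette_a, palette_b):
    """Minimum L2 distance between palette_a and any color
    permutation of palette_b (5! = 120 orderings for 5 colors)."""
    return min(
        np.linalg.norm(palette_a - palette_b[list(perm)])
        for perm in itertools.permutations(range(len(palette_b)))
    )
```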

I hope that it helps!

vade commented 3 years ago

This is hugely helpful. Thank you @EtienneFerrier - sincerely appreciate the response!

tamnguyenvan commented 3 years ago

Hi @EtienneFerrier, thanks for this awesome project. I have a question about the ground-truth distance calculation: you just calculate the L2 distance between palette A (shape [5, 3]) and every permutation of palette B, then take the minimum value? And should this be done in RGB space? Thanks!

EtienneFerrier commented 2 years ago

Hi, thanks for the kind words. Correct, but for the best visual results this should be done in Lab space, not RGB space.
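For example, the conversion could look like this, using skimage's rgb2lab (one possible conversion library, not necessarily what we used):

```python
import numpy as np
from skimage.color import rgb2lab

# Sample a 5-color palette uniformly in RGB ([0, 1] floats), then
# convert to Lab before computing the ground-truth distance.
palette_rgb = np.random.rand(5, 3)
palette_lab = rgb2lab(palette_rgb[np.newaxis])[0]  # rgb2lab expects an image-shaped array
```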