Closed chinoll closed 2 years ago
I can't speak to any specific loss value that would "satisfy the requirement", but I can say that I've seen values around 0.15 after many hours of training with L2 loss
https://wandb.ai/laion/diffusion-prior/runs/1blxu24j?workspace=user-rom1504 this is a run with the latest version of the code, trained for 500M samples. Validation loss reaches about 0.3, train loss 0.17.
However, you should probably check the other metrics rather than the loss (like the cosine similarity between text and predicted image, reaching 0.26 here)
We don't know yet what the best metric to evaluate on is. The best ideas for now are retrieval metrics and CLIP-guided generation (check #29 to learn more)
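For reference, the text-to-predicted-image cosine similarity mentioned above can be computed with a few lines of numpy. This is a minimal sketch, not the project's actual eval code; `text_embeds` and `pred_image_embeds` are hypothetical names for batches of CLIP embeddings:

```python
import numpy as np

def mean_cosine_similarity(text_embeds, pred_image_embeds):
    # L2-normalize each row, then take the row-wise dot product and average
    text = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    image = pred_image_embeds / np.linalg.norm(pred_image_embeds, axis=-1, keepdims=True)
    return float((text * image).sum(axis=-1).mean())

# toy usage with random vectors (real CLIP embeddings gave ~0.26 in the run above)
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 512))
b = rng.normal(size=(4, 512))
print(mean_cosine_similarity(a, a))  # identical vectors -> 1.0
```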
@nousr the current diffusion prior training runs, are they with text encoding + mask or without?
@lucidrains I took the weekend off to hang out with family 😄 just getting back into the swing of things today
@rom1504 @krish240574 can you confirm: are we still doing embedding only? I see condition_on_text_encodings=false in the wandb linked above.
Yeah indeed still embedding only.
We could try text + embedding, but I need to do a bit of work in embedding-reader to support that well (the idea is to read the npy files where the embeddings are and the parquet files where the texts are at the same time; that's the npy_parquet format of embedding-reader, but it needs a bit of work to be performant)
https://github.com/rom1504/embedding-reader/pull/24/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R55 the interface looks like that, if you want to try it sooner rather than later
The performance improvement won't change the API
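To illustrate the idea (reading embeddings and their texts side by side in aligned batches), here is a self-contained toy sketch. It is not the embedding-reader API; in practice the texts would come from the parquet files and the reader would handle remote storage and parallelism:

```python
import numpy as np

def iter_embedding_text_batches(npy_path, texts, batch_size):
    # Memory-map the .npy so only the rows we read get paged in,
    # and slice the text list with the same offsets so rows stay aligned.
    embeddings = np.load(npy_path, mmap_mode="r")
    assert len(embeddings) == len(texts), "npy rows and text rows must align"
    for start in range(0, len(embeddings), batch_size):
        end = start + batch_size
        yield np.asarray(embeddings[start:end]), texts[start:end]

# toy usage: 10 fake 4-dim embeddings paired with captions
np.save("embs.npy", np.arange(40, dtype=np.float32).reshape(10, 4))
captions = [f"caption {i}" for i in range(10)]
for embs, caps in iter_embedding_text_batches("embs.npy", captions, batch_size=4):
    print(embs.shape, caps[0])
```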
yeah same! (plus sending out some resumes to companies around the area :laughing:)
ah ok, it looks like text encodings aren't present yet, but we could always just train it slowly with CLIP passed into the DiffusionPrior instance
from what i gathered in the paper, the text encodings helped for the diffusion prior, but not for the decoder (but it wouldn't hurt to have them present for both)
yea, it's tricky because the text encodings also need to have an associated boolean mask (variable-length encodings) :cry:
@rom1504 i can always start working on some memmapped solution on my end
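The variable-length issue above boils down to padding each batch to a common length and building a boolean mask alongside it. A minimal numpy sketch (names are illustrative, not the actual DALLE2-pytorch helper):

```python
import numpy as np

def pad_text_encodings(encodings, pad_value=0.0):
    # encodings: list of (seq_len_i, dim) arrays with varying seq_len.
    # Returns a (batch, max_len, dim) padded array and a (batch, max_len)
    # boolean mask that is True on real tokens and False on padding.
    max_len = max(e.shape[0] for e in encodings)
    dim = encodings[0].shape[1]
    batch = np.full((len(encodings), max_len, dim), pad_value, dtype=np.float32)
    mask = np.zeros((len(encodings), max_len), dtype=bool)
    for i, enc in enumerate(encodings):
        batch[i, : len(enc)] = enc
        mask[i, : len(enc)] = True
    return batch, mask

# toy usage: two encodings of lengths 3 and 5, dim 8
encs = [np.ones((3, 8), np.float32), np.ones((5, 8), np.float32)]
padded, mask = pad_text_encodings(encs)
print(padded.shape, mask.sum(axis=1))  # (2, 5, 8) [3 5]
```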
Well, my current implementation in the reader is slow, but I mean slow as in 100k samples/s whereas it should be 10M samples/s
You could adapt the prior training script to add the option now
@rom1504 ohh got it! yup, plan on adding to the current prior training script for sure
https://github.com/rom1504/embedding-reader/pull/24/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R55 yeah, so you just have to use that in the training script (that PR doesn't need to be merged to use the feature; it's just documentation)
However, I'd recommend keeping the option to not use the text, as I figure training with the text will be much slower.
@rom1504 yea, we should definitely keep the option, but we should probably strive for text encodings to be included in diffusion prior training. it seems necessary from the paper (and plus Katherine has it)
@chinoll anyways, to answer your question, join the Laion discord!
When I train DiffusionPrior, the loss decreases to 0.37 and then stops decreasing. What loss range would satisfy the requirement?