Closed (innat closed 1 year ago)
Is this issue solved, @innat?
If it is not solved, could you assign it to me to work on?
@NiharJani2002 Only the Keras team can assign issues. Please wait to hear back from the Keras team on whether it's OK to take it.
cc @jbischof @ianstenbit
Is the scope of this issue to add an image_to_text workflow to StableDiffusion?
If so, that sounds good to me. It's probably best to start with an example notebook, and we can evaluate either including it in the API or publishing the example on KerasIO from there.
@ianstenbit
Is the scope of this issue to add an image_to_text workflow to StableDiffusion?
As far as I know, not entirely; it's more likely image captioning. The first post mentions a model, BLIP; please take a look at that.
Image captioning isn't listed in the current roadmap, but this domain has become closely related to the current hot cake (stable diffusion). A use case, for example: I have an image dataset of around 10k images, and I'm using BLIP 2 to generate approximate prompts to make the training pairs.
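To make that use case concrete, here is a rough sketch of how (image, prompt) training pairs might be assembled. It is only an illustration: `build_training_pairs` and `placeholder_caption` are hypothetical names, and the placeholder stands in for a real captioning model such as BLIP 2, which is not called here.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Assumption: only these file types count as images in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_training_pairs(
    image_paths: Iterable[Path],
    caption_fn: Callable[[Path], str],
) -> List[Tuple[str, str]]:
    """Pair each image path with an (approximate) prompt from caption_fn."""
    pairs = []
    for path in image_paths:
        # Skip anything that does not look like an image file.
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        prompt = caption_fn(path).strip()
        # Drop images for which the captioner returned nothing usable.
        if prompt:
            pairs.append((path.as_posix(), prompt))
    return pairs


def placeholder_caption(path: Path) -> str:
    """Hypothetical stand-in for a captioning model; derives text from the file name."""
    return f"a photo of {path.stem.replace('_', ' ')}"


if __name__ == "__main__":
    paths = [Path("data/red_car.png"), Path("data/notes.txt")]
    for image, prompt in build_training_pairs(paths, placeholder_caption):
        print(image, "->", prompt)
```

In practice `caption_fn` would wrap whatever captioning model is available; keeping it as a plain callable means the pairing logic does not depend on any particular framework.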
Hey @innat -- sorry I totally misunderstood. Was looking too fast :-)
This looks super interesting, and if someone in the community is interested in porting it to Keras, we can certainly look into making it a KerasCV offering. At this time, though, we can't add this to our near-term roadmap for the KerasCV team, so it would have to be a community-driven effort.
Probably for something of this scale, the right approach would be to create a separate repo with BLIP components that depend on KerasCV where possible, and once it's up and running we can try to integrate it into our API.
@ianstenbit Thanks for the response. You made a valid suggestion. I'm working with image2prompt on Kaggle in my spare time, and I look forward to translating BLIP to Keras.
I have one query (probably an old one, so sorry if it's already been discussed and decided). If a model consists of roughly equal shares of CV components and NLP components, where should it live? For example, the BLIP architecture consists of both CV and NLP components; one of its LLMs is a variant of T5, and the vanilla version of T5 is available in keras-nlp.
Uh, it might fall in keras-nlp (as it did)! https://www.tensorflow.org/tutorials/text/image_captioning cc. @mattdangerw
Good question @innat -- our plan for now is to have KerasCV depend on KerasNLP for models which require both CV and NLP components, so if there are relevant NLP components that don't exist in KerasNLP yet, we should strive to include them there and we can depend on them as necessary.
Short Description
Similar to an image-captioning / retrieval model. Similar operations to text-to-image, image-to-image.
Papers
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Existing Implementations
Motivation
Other Information
(If this ticket doesn't fit in the issues section, please move it to discussions.)