afiaka87 closed this 3 years ago
I went back to my original parsing script and re-created the webdataset for these to include the URL metadata (using img2dataset, which also works great). I don't get free hosting for the indices, but OpenAI's CDN handles the hosting for the images.
Cool stuff! Adding a notebook with a bigger index to the repo could be an interesting example too; I'll think about it. I guess I can also link directly to your gist as an example. I'll add a use case section to the readme.
I think you might be interested in clip-front, which I just finished and put at https://rom1504.github.io/clip-retrieval/ (currently pointing to an instance of clip-back with an 8M index from cah).
Moving this to a discussion.
This project works very well! I realize this example is perhaps a bit contrived, but it's actually quite useful to be able to do a fuzzy search on these. You can even find generations that look very similar to the caption you enter, and sometimes get working samples. Cool stuff.
https://gist.github.com/afiaka87/f662486fc45199fa4394f3456c8246d7#file-dalle_blog_semantic_search-ipynb