Code used to create OBELICS, an open, massive, and curated collection of interleaved image-text web documents containing 141M documents, 115B text tokens, and 353M images.
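For a quick look at the documents, a minimal sketch of streaming the collection from the Hugging Face Hub is below. It assumes the dataset identifier is `HuggingFaceM4/OBELICS`; verify the exact repository name before relying on it.

```python
from datasets import load_dataset

# Stream rather than download, since the full collection holds 141M documents.
# The dataset identifier below is an assumption; confirm it on the Hub.
dataset = load_dataset("HuggingFaceM4/OBELICS", split="train", streaming=True)

for document in dataset:
    # Each document interleaves text passages with image references.
    print(document.keys())
    break
```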
Hey, thanks for the great work -- do you plan to release the trained LDA model used for the analysis in Sec. 4.2? Thanks!

Hi, thanks! I don't think I still have it, but it didn't take long to train -- I ran it on my personal computer in about a day -- so it should be straightforward to reproduce.
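Since the original model isn't available, re-training an LDA on a sample of the corpus is the practical route. The sketch below uses scikit-learn; the number of topics, vocabulary filtering thresholds, and the choice of library are placeholders, not the settings used for the paper's analysis.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# `docs` is a list of raw text strings sampled from the corpus (placeholder).
docs = ["..."]

# Bag-of-words counts; max_df/min_df prune overly common and rare terms (assumed values).
vectorizer = CountVectorizer(max_df=0.9, min_df=5, stop_words="english")
X = vectorizer.fit_transform(docs)

# n_components (number of topics) is an assumption, not the paper's setting.
lda = LatentDirichletAllocation(n_components=20, random_state=0, n_jobs=-1)
lda.fit(X)

# Inspect the top words of each learned topic.
feature_names = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_indices = topic.argsort()[-10:][::-1]
    print(topic_idx, [feature_names[i] for i in top_indices])
```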