lucidrains / x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers
MIT License
666 stars 47 forks

Add YttmTokenizer, ImageTextDataset from @rom1504, Single-GPU training script #3

Closed afiaka87 closed 2 years ago

afiaka87 commented 2 years ago

@lucidrains Let me know if there are any glaring mistakes, but this should provide similar functionality to what we had in dalle-pytorch. The main things missing are webdataset support and multi-GPU training, but I figured folks may want to start using this, and I don't know how long it will take me to implement those.

Romain made a decent point about how everyone seems to just rewrite or copy-paste the text-image dataloader, but unfortunately I can't commit to maintaining a pip package for that either.

afiaka87 commented 2 years ago

@MicPie Thanks for the DDP code. I've rebased your branch onto this one so we can hopefully get that upstream.

afiaka87 commented 2 years ago

Apologies, I have not had the time to get this branch working. Closing for now.