The possibility of same method on visual modality

mct10 / RepCodec

Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization

The possibility of same method on visual modality #8

Closed by youngsheen 3 weeks ago

youngsheen commented 1 month ago

Hi, excellent work!

Glad to see the VQ + SSL method surpass the K-Means + SSL method. Recently, we also found that discrete-unit methods can work surprisingly well on images (see DiGIT at https://github.com/DAMO-NLP-SG/DiGIT). Maybe you could apply RepCodec to images as well.

mct10 commented 1 month ago

Hi, thanks for sharing your thoughts! That sounds interesting; we may take a look when we have time.