jy0205 / LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Question about the high-resolution pixel decoder #6

Closed SihengLi99 closed 10 months ago

SihengLi99 commented 10 months ago

Hi,

Very insightful work! I have a question about the new high-resolution pixel decoder, which supports generating high-resolution images with multiple aspect ratios and high aesthetic quality. Could you please release some details of the training process? Thanks a lot!

Best regards

jy0205 commented 10 months ago

The high-resolution pixel decoder is trained with the same strategy as the original one. Given an input image, it takes the discrete visual token IDs produced by our visual tokenizer as the condition and aims to recover the original input.
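
For readers who want a concrete picture of that objective, here is a minimal toy sketch, not the LaVIT implementation. It assumes a frozen tokenizer that maps each patch to its nearest codebook entry, and a decoder that sees only the token IDs and is trained to reconstruct the input. The names `CODEBOOK`, `tokenize`, `decode`, `mse`, and `train_decoder` are all hypothetical. A plain lookup table trained with a mean-squared-error loss stands in for the real decoder, whose architecture and loss are not described in this thread.

```python
import random

# Hypothetical toy setup: an "image" is a list of scalar patch values.
CODEBOOK = [0.0, 0.5, 1.0]  # stand-in for a frozen visual tokenizer's codebook


def tokenize(image):
    """Frozen tokenizer: map each patch to the ID of its nearest codebook entry."""
    return [min(range(len(CODEBOOK)), key=lambda k: abs(CODEBOOK[k] - p))
            for p in image]


def decode(token_ids, table):
    """Toy decoder: predict each patch from its token ID via a lookup table."""
    return [table[t] for t in token_ids]


def mse(recon, image):
    """Mean squared reconstruction error between two equal-length patch lists."""
    return sum((r - p) ** 2 for r, p in zip(recon, image)) / len(image)


def train_decoder(images, steps=500, lr=0.5, seed=0):
    """Fit the decoder so that decode(tokenize(x)) approximately recovers x."""
    rng = random.Random(seed)
    table = [rng.random() for _ in CODEBOOK]  # trainable decoder parameters
    for _ in range(steps):
        image = rng.choice(images)
        token_ids = tokenize(image)        # the condition: discrete token IDs only
        recon = decode(token_ids, table)   # decoder never sees the pixels directly
        for t, r, p in zip(token_ids, recon, image):
            # Gradient of the MSE with respect to table[t] for this patch.
            table[t] -= lr * 2 * (r - p) / len(image)
    return table


if __name__ == "__main__":
    images = [[0.1, 0.45, 0.9, 0.0], [0.05, 0.55, 1.0, 0.6]]
    table = train_decoder(images)
    for img in images:
        print(tokenize(img), mse(decode(tokenize(img), table), img))
```

The point of the sketch is only the data flow: the tokenizer stays fixed, the decoder is conditioned on token IDs alone, and the training target is the original input.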

SihengLi99 commented 10 months ago

OK, thanks for your reply!
