Closed SihengLi99 closed 10 months ago
The high-resolution pixel decoder is trained with the same strategy as the original one. Given an input image, it takes the discrete the visual token ID tokenized by our visual tokenizer as condition, and aims to recover the original input.
OK, thanks for your reply!
jy0205 @.***> 于2023年11月19日周日 20:30写道:
The high-resolution pixel decoder is trained with the same strategy as the original one. Given an input image, it takes the discrete the visual token ID tokenized by our visual tokenizer as condition, and aims to recover the original input.
— Reply to this email directly, view it on GitHub https://github.com/jy0205/LaVIT/issues/6#issuecomment-1817840627, or unsubscribe https://github.com/notifications/unsubscribe-auth/AR4KH4MRGFNNHKEIEWP4U7DYFH3U3AVCNFSM6AAAAAA7Q75XDSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJXHA2DANRSG4 . You are receiving this because you authored the thread.Message ID: @.***>
Hi,
Very insightful work! A question is about the details of the new high-resolution pixel decoder, which supports to generate high resolution, muliple aspect ratios, and high aesthetics images. Could you please release some details of the training process? Thanks a lot!
Best regards