krennic999 / STAR

STAR: Scale-wise Text-to-image generation via Auto-Regressive representations
https://krennic999.github.io/STAR/

release? #1

Open Njasa2k opened 3 months ago

krennic999 commented 3 months ago

We will, in around a month; we need to follow the company's open-source process.

ucasyjz commented 3 months ago

Great, can't wait to try it out!

daiyixiang666 commented 3 months ago

Will you guys also release the weights? Thanks!

krennic999 commented 3 months ago

Yes, please wait for a while, thanks.

ucasyjz commented 3 months ago

Will the training code be released?

daiyixiang666 commented 3 months ago

Do you have any evaluation results for the 256x256 VAR, such as the FID on animals or other categories in MJHQ-30K?

daiyixiang666 commented 3 months ago

Besides, do you think the results you generate are darker than usual images?

krennic999 commented 3 months ago

We evaluated on ImageNet-val at 256 resolution, and the reconstruction PSNR is around 22, which seems to be better than VQGAN. The darker brightness of the generated images is caused by the dataset used for training. Fine-tuning on a small subset will alleviate this issue.
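For readers unfamiliar with the reconstruction metric mentioned above, here is a minimal sketch of how PSNR is usually computed between an image and its reconstruction. It is not the authors' evaluation code: the function name `psnr`, the flat-list pixel representation, and the 0-255 value range are assumptions made for illustration.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equal-length pixel lists.

    Hypothetical helper for illustration; assumes pixel values in [0, max_value].
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([100, 100, 100, 100], [110, 110, 110, 110]))
```

Higher values mean the reconstruction is closer to the input; a dataset-level score is typically the average of this value over all evaluated images.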

daiyixiang666 commented 3 months ago

Yes, thanks, but I mean the VAR, not the VAE result. Do you have the per-category FID scores of the VAR on MJHQ-30K? And thanks again for your reply!

krennic999 commented 3 months ago

Well, the per-category FID on MJHQ can be found in Fig. 2 of our main paper; for specific values, refer to the table below:

[Image: table of per-category FID values]

daiyixiang666 commented 3 months ago

Thanks a lot. It is really impressive to see such a low FID score!

daiyixiang666 commented 3 months ago

Do you guys use QK normalization?

krennic999 commented 3 months ago

We do not use QK normalization, but probably will in the next version. The visual auto-regression paradigm proposed by VAR is full of potential, and we are currently exploring it to get more stable and better results.
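For context, a minimal sketch of one common form of QK normalization: queries and keys are L2-normalized before the dot product, so attention logits stay bounded regardless of the raw vector magnitudes. This is not taken from the STAR code, which does not use it; the function names and the fixed `scale` value are assumptions (in practice the scale is usually learned, and some models apply LayerNorm or RMSNorm to queries and keys instead).

```python
import math


def l2_normalize(vec, eps=1e-6):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention_weights(query, keys, scale=10.0):
    """Attention weights for one query using L2-normalized queries and keys.

    Each logit is scale * cosine_similarity(query, key), so it lies in
    [-scale, scale]. `scale` is a hypothetical fixed constant here.
    """
    q = l2_normalize(query)
    logits = [scale * sum(a * b for a, b in zip(q, l2_normalize(k))) for k in keys]
    return softmax(logits)


if __name__ == "__main__":
    print(qk_norm_attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Because only the directions of the query and keys matter, rescaling either one leaves the attention weights unchanged, which is why this technique is often used to keep training stable.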

daiyixiang666 commented 3 months ago

Really looking forward to the code and checkpoint, haha.

nunbuzor commented 3 months ago

Hi @krennic999, I was wondering if there were any updates on the release timeline? Was it still scheduled for next week, or have there been any changes or delays? Looking forward to hearing back from you!

krennic999 commented 2 months ago

Hi, we apologize for the inconvenience. After our discussions, we have determined that the current version, due to issues with the VQVAE and other factors, is not stable enough for practical applications. We may release a revised version, including a modified VQVAE and 1024 generation, later this year.

krennic999 commented 2 months ago

However, we will do our best to answer any questions about this project. Thank you for your interest.

daiyixiang666 commented 2 months ago

By unstable, do you mean that although the model can achieve a lower FID score, the generated images are not as stable as those from a diffusion model?

daiyixiang666 commented 2 months ago

I am also doing my own VAR t2i training. I find that it is somewhat like a lottery.

Ccioud commented 2 months ago

Hi @krennic999, can I ask what the approximate release date is?

krennic999 commented 2 months ago

@daiyixiang666 Yes, the results are very unstable, and there are some issues with generating certain details.

krennic999 commented 2 months ago

@Ccioud Err... we are currently addressing the VAE issue. I think we can provide a usable solution by the CVPR submission deadline.

daiyixiang666 commented 2 months ago

What is your opinion about the VAE? I think we could have an in-depth discussion. The VAE reconstruction at the lower scales is really bad.

daiyixiang666 commented 2 months ago

Do you think the shared codebook in the VAE is important?

krennic999 commented 2 months ago

@daiyixiang666 Hi, you can send an email to xiao_xiao@mail.ustc.edu.cn and we can discuss further.

HalvesChen commented 2 months ago

@krennic999 @Ccioud Could you describe the instability of VAR in detail? I've been running experiments lately and I'm interested in it.