prajwalsingh / EEGStyleGAN-ADA

PyTorch code for the paper "Learning Robust Deep Visual Representations from EEG Brain Recordings". [WACV 2024]
MIT License

Why not consider using generative models like SD or SDXL? #18

Closed weipipione closed 2 months ago

prajwalsingh commented 2 months ago

Hi @weipipione , compared to SD or SDXL, we are training the generative network from scratch. We could fine-tune a pre-trained model, but that was not the goal of this work. In the EEG-Image case, the dataset is very small (N-way K-shot), and in this project we aimed to train the network from scratch with stability and high fidelity.

weipipione commented 2 months ago

@prajwalsingh Thanks for the quick reply! But I still have some concerns. The CVPR40 dataset contains only 40 categories of images, with 50 images per category. Training a new StyleGAN on such a dataset may lead to overfitting. If the generative model can only produce these 40 categories of images, the diversity of the images is fixed. Therefore, using metrics that focus only on image quality and diversity, such as IS, to evaluate the results might not be appropriate. Moreover, the training images include many with plain backgrounds, which could cause overfitting and make the generated images very similar to the ground-truth (GT) images, thereby narrowing the gap between the generative model's distribution and the real data distribution. This might result in better FID and KID scores. Are there better evaluation metrics that could be used in this context?

prajwalsingh commented 2 months ago

@weipipione, this is a really great question regarding evaluation metrics. Beyond FID/KID, people nowadays also use CLIP-based scores or LPIPS distances. Recently, I came across the work "Few-shot Image Generation via Cross-domain Correspondence." The authors propose Intra-cluster LPIPS, which measures the diversity and fidelity of generated images in a few-shot setting, which is also the setting in EEG2Image. You can check Section 4 of that work for more information.
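For anyone curious, the intra-cluster idea from that paper works roughly like this: assign each generated image to its nearest training image (forming one cluster per training sample), then average the pairwise perceptual distances within each cluster. A minimal sketch of that procedure is below; note that `pairwise_dist` here is a plain MSE placeholder for illustration, whereas the paper uses the learned LPIPS network as the distance, and `intra_cluster_diversity` is a hypothetical helper name, not code from this repo.

```python
import numpy as np

def pairwise_dist(a, b):
    # Placeholder perceptual distance (MSE). In practice, replace this
    # with an LPIPS network forward pass over the two images.
    return float(np.mean((a - b) ** 2))

def intra_cluster_diversity(generated, training, dist=pairwise_dist):
    """Assign each generated image to its nearest training image,
    then average pairwise distances within each resulting cluster.
    Higher values suggest more diversity around each training sample."""
    clusters = {i: [] for i in range(len(training))}
    for g in generated:
        idx = min(range(len(training)), key=lambda i: dist(g, training[i]))
        clusters[idx].append(g)

    scores = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # a singleton cluster has no pairwise distances
        d = [dist(members[i], members[j])
             for i in range(len(members))
             for j in range(i + 1, len(members))]
        scores.append(float(np.mean(d)))
    return float(np.mean(scores)) if scores else 0.0

# Toy usage: two "training" points and four "generated" points near them.
gen = [np.array([0.0]), np.array([0.1]), np.array([1.0]), np.array([1.1])]
train = [np.array([0.0]), np.array([1.0])]
score = intra_cluster_diversity(gen, train)
```

With the LPIPS network in place of MSE, this matches the spirit of the paper's metric: low scores indicate the generator is memorizing training samples, while higher scores indicate genuine variation within each cluster.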

weipipione commented 2 months ago

@prajwalsingh Thanks!