The research papers mentioned above^1 use the following two-stage generative scheme to reconstruct images:
- **Low-level (perceptual):** maps brain signals to the embedding space of Stable Diffusion's VAE. The predicted embedding is then passed through the VAE decoder to produce a "low-level" reconstruction of the image (see the sketch after this list).
- **High-level (semantic):** maps brain signals to the CLIP image space. The predicted embedding, together with the "low-level" reconstruction, is then fed through a pretrained img2img model to produce the final image.
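To make the low-level stage concrete, here is a minimal sketch assuming Stable Diffusion v1.5 loaded through Hugging Face `diffusers`; `pred_latents` is a hypothetical stand-in for the output of your brain-to-latent regressor, and the checkpoint name is an assumption, not prescribed by the task:

```python
import torch
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint; use whichever SD version the task specifies
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)

# Hypothetical brain-decoded latents; replace with your model's predictions
pred_latents = torch.randn(1, 4, 64, 64, device=device)

with torch.no_grad():
    # If your latents live in the scaled diffusion space, undo the scaling first
    low_level = vae.decode(pred_latents / vae.config.scaling_factor).sample
# low_level: a (1, 3, 512, 512) tensor in [-1, 1], the "low-level" reconstruction
```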
In this task, you will need to load the SD VAE and the img2img model and run inference with them in a Jupyter notebook.
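For the high-level stage, one reasonable option (though not necessarily the papers' exact setup) is `diffusers`' `StableDiffusionImg2ImgPipeline`: pass the low-level reconstruction as the init image and the brain-predicted embeddings via `prompt_embeds`. Note that the SD v1.x UNet expects conditioning of shape `(batch, 77, 768)`, so your CLIP-space predictions must be mapped to that layout. Continuing from the snippet above:

```python
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

# Hypothetical brain-decoded CLIP embeddings in the layout the UNet expects
pred_embeds = torch.randn(1, 77, 768, device=device)

result = pipe(
    prompt_embeds=pred_embeds,  # replaces the usual text conditioning
    image=low_level,            # "low-level" reconstruction as the init image
    strength=0.8,               # how far img2img may deviate from the init image
    guidance_scale=7.5,
).images[0]
result.save("reconstruction.png")
```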
> [!IMPORTANT]
> Pay attention to the dimensions of the VAE latents and the CLIP embeddings: knowing them is essential for implementing the method correctly.
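For orientation, here is one way to check those dimensions directly; the shapes shown assume SD v1.x at 512×512 resolution and CLIP ViT-L/14, so verify them against the checkpoints you actually use:

```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPVisionModel

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)
with torch.no_grad():
    latents = vae.encode(torch.randn(1, 3, 512, 512)).latent_dist.mean
print(latents.shape)  # torch.Size([1, 4, 64, 64]); the VAE downsamples by a factor of 8

clip = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
with torch.no_grad():
    out = clip(pixel_values=torch.randn(1, 3, 224, 224))
print(out.last_hidden_state.shape)  # torch.Size([1, 257, 1024]): 256 patch tokens + CLS
print(out.pooler_output.shape)      # torch.Size([1, 1024]), before CLIP's 768-d projection
```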