ydchen0806 / TokenUnify

This repository contains the official implementation of the paper TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction.

Request for Pre-training Code #1

Closed ZiyaoMeng closed 3 months ago

ZiyaoMeng commented 3 months ago

Hi Yinda,

I would like to express my sincere gratitude for your exceptional work on TokenUnify. However, I have encountered an issue regarding the pre-training section of the code. It appears that this part of the code is currently unavailable. Given that pre-training is a core component of TokenUnify, I believe it is essential for a comprehensive understanding and utilization of your work.

Could you kindly provide the missing pre-training code or offer guidance on how to implement it? Thank you once again for your outstanding contributions and support.

ydchen0806 commented 3 months ago

Hi Ziyao,

I apologize for the oversight in not including the pre-training code initially. Due to the extensive content created for this project, some parts were inadvertently missed. The pre-training code has now been updated and can be directly accessed through main_pretrain_autoregress.py, following the methodologies of MAE and BERT.

We aim to upload the pre-trained weights and the fine-tuned segmentation weights by August, enabling everyone to replicate or further explore the application of autoregressive methods in vision tasks. To better accommodate various architectures, including Transformer, Mamba, and CNN structures, we implemented the autoregressive pre-training scheme by modifying the input tensors via the dataloader to predict the next token. This approach is more flexible than using causal attention within the model, avoiding the need for model modifications and additional parameter adjustments. For instance, a 256x256 image is divided into 16x16 patches, and the first k patches are used to predict the (k+1)th patch.
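To make the dataloader-based scheme concrete, here is a minimal sketch of how such next-patch prediction pairs could be built in the data pipeline rather than through causal attention in the model. This is an illustration only, not the code in main_pretrain_autoregress.py; names such as `NextPatchDataset` and `patchify` are hypothetical.

```python
# Minimal sketch (assumption: not the repository's actual dataloader) of building
# next-patch prediction pairs in the data pipeline instead of using causal attention
# inside the model.
import torch
from torch.utils.data import Dataset


def patchify(img, patch_size=16):
    """Split a (C, H, W) image into a sequence of flattened patches, shape (N, C*p*p)."""
    c, h, w = img.shape
    p = patch_size
    patches = img.unfold(1, p, p).unfold(2, p, p)          # (C, H/p, W/p, p, p)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * p * p)
    return patches                                          # 256 patches for a 256x256 image


class NextPatchDataset(Dataset):
    """Yields (first k patches, (k+1)-th patch) pairs for autoregressive pre-training."""

    def __init__(self, images, patch_size=16):
        self.images = images          # iterable of (C, 256, 256) tensors (assumption)
        self.patch_size = patch_size

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        patches = patchify(self.images[idx], self.patch_size)
        n = patches.shape[0]
        k = torch.randint(1, n, (1,)).item()   # random prefix length in [1, n-1]
        context = patches[:k]                  # first k patches fed to the model
        target = patches[k]                    # (k+1)-th patch to be predicted
        return context, target


if __name__ == "__main__":
    dummy = [torch.randn(3, 256, 256) for _ in range(4)]
    ds = NextPatchDataset(dummy)
    ctx, tgt = ds[0]
    print(ctx.shape, tgt.shape)   # e.g. torch.Size([k, 768]) torch.Size([768])
```

Since the context length varies with k, batching would need either a custom collate function (padding shorter prefixes) or a fixed prefix length per batch; the key point is that the model itself needs no causal masking or architectural change.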

Thank you for your understanding and continued support.

ZiyaoMeng commented 3 months ago

thanks a lot :)