This repository is built upon BEiT and MAE; many thanks to the authors of those projects!
We will gradually upload the full version of the implementation.
@ARTICLE{10431795,
  author={Zhang, Guangyi and Hu, Qiyu and Qin, Zhijin and Cai, Yunlong and Yu, Guanding and Tao, Xiaoming},
  journal={IEEE Transactions on Communications},
  title={A Unified Multi-Task Semantic Communication System for Multimodal Data},
  year={2024},
  volume={},
  number={},
  pages={1-1},
  keywords={Task analysis;Semantics;Transmitters;Multitasking;Communication systems;Feature extraction;Decoding;Deep learning;dynamic overhead;multimodal data;multi-task semantic communication},
  doi={10.1109/TCOMM.2024.3364990}}
Clone this repository and enter the directory using the commands below:
git clone https://github.com/zhang-guangyi/t-udeepsc.git
cd t-udeepsc/
Python 3.8.5 is recommended.
Install the required packages with:
pip install -r requirements.txt
(requirements.txt is not provided yet.)
If you're having issues installing a PyTorch build compatible with your CUDA version, we strongly recommend consulting the related documentation page: https://pytorch.org/get-started/previous-versions/.
In our work, we use a BERT model to initialize the text encoder; the pretrained weights should be placed in ./pretrain_models. The weights can be downloaded from the Hugging Face website.
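As a small sketch (the weight-directory layout and the "bert-base-uncased" fallback id are assumptions, not prescribed by this repository), the following helper resolves a local copy of the BERT weights under ./pretrain_models and falls back to a Hugging Face model id when no local copy is found:

```python
import os

# Hypothetical helper: the directory layout and the fallback model id
# are assumptions, not taken from this repository's code.
def resolve_bert_weights(base_dir="./pretrain_models",
                         model_name="bert-base-uncased"):
    """Return a local weights path if one exists, otherwise the
    Hugging Face model id for an on-the-fly download."""
    local_dir = os.path.join(base_dir, model_name)
    # transformers saves a config.json next to the weight files, so its
    # presence is a reasonable marker of a complete local download.
    if os.path.isfile(os.path.join(local_dir, "config.json")):
        return local_dir
    return model_name
```

The returned value can be passed directly to `BertModel.from_pretrained(...)`, which accepts either a local directory or a model id.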
We use torchvision, so the datasets will be downloaded automatically. Place the dataset under ./data/cifar.
Download the CMU-MOSI and CMU-MOSEI datasets from Google Drive and place the contents inside the ./data/msadata folder. Note that these are pre-computed splits.
This dataset is used for text sentiment analysis and text reconstruction. As we use pytreebank in our implementation, the SST2 dataset will also be downloaded automatically. The dataset will be placed in the .cache folder; you can move it to any location you prefer.
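If you do want to move the auto-downloaded SST data out of the cache folder, a minimal sketch is below; both paths are placeholders (where pytreebank caches the data depends on your system), not values taken from this repository:

```python
import os
import shutil

def relocate_sst_cache(src, dst):
    """Move the auto-downloaded SST data from the cache folder to dst.

    Hypothetical paths: point src at the actual download location
    on your machine.
    """
    # Make sure the parent of the destination exists before moving.
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    shutil.move(src, dst)
    return dst
```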
The image features are extracted using the bottom-up-attention strategy, with each image represented by 2048-D region features. The features for each image are stored in a .npz
file. You can prepare the visual features yourself or download the extracted features from OneDrive or BaiduYun. The download contains three files: train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz, corresponding to the features of the train/val/test images of VQA-v2, respectively. Place them as follows:
|-- ./data/vqa_datasets
|-- coco_extract
| |-- train2014.tar.gz
| |-- val2014.tar.gz
| |-- test2015.tar.gz
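The extracted .npz files follow the COCO image-naming scheme. As an illustrative sketch, the expected path for a given image id can be built as below; the 12-digit zero-padded id is the standard COCO file-naming convention, assumed here rather than taken from this repository:

```python
import os

def coco_feature_path(root, split, image_id):
    """Build the expected .npz feature path for a COCO image.

    Assumes standard COCO naming, e.g. image id 9 in train2014 maps to
    COCO_train2014_000000000009.jpg.npz (12-digit zero-padded id).
    """
    fname = "COCO_{}_{:012d}.jpg.npz".format(split, image_id)
    return os.path.join(root, split, fname)
```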
Besides, we use the VQA samples from the Visual Genome dataset to expand the training set. The processed VG question and annotation files can be found on OneDrive or BaiduYun; place them as follows:
|-- ./data/vqa_datasets
|-- vqa
| |-- VG_questions.json
| |-- VG_annotations.json
Then, you can run the following script to set up all the configurations needed for the experiments.
$ bash vqa_setup.sh
Running the script will download the processed QA files and unzip the feature archives. Finally, the ./data/vqa_datasets
folder will have the following structure:
|-- ./data/vqa_datasets
|-- coco_extract
| |-- train2014
| | |-- COCO_train2014_...jpg.npz
| | |-- ...
| |-- val2014
| | |-- COCO_val2014_...jpg.npz
| | |-- ...
| |-- test2015
| | |-- COCO_test2015_...jpg.npz
| | |-- ...
|-- vqa
| |-- v2_OpenEnded_mscoco_train2014_questions.json
| |-- v2_OpenEnded_mscoco_val2014_questions.json
| |-- v2_OpenEnded_mscoco_test2015_questions.json
| |-- v2_OpenEnded_mscoco_test-dev2015_questions.json
| |-- v2_mscoco_train2014_annotations.json
| |-- v2_mscoco_val2014_annotations.json
| |-- VG_questions.json
| |-- VG_annotations.json
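Before launching the VQA experiments, it may help to check that the layout above is complete. A minimal sketch (the file list is copied from the tree above; the helper itself is not part of the repository):

```python
import os

# File names copied from the expected ./data/vqa_datasets/vqa layout.
REQUIRED_VQA_FILES = [
    "v2_OpenEnded_mscoco_train2014_questions.json",
    "v2_OpenEnded_mscoco_val2014_questions.json",
    "v2_OpenEnded_mscoco_test2015_questions.json",
    "v2_OpenEnded_mscoco_test-dev2015_questions.json",
    "v2_mscoco_train2014_annotations.json",
    "v2_mscoco_val2014_annotations.json",
    "VG_questions.json",
    "VG_annotations.json",
]

def missing_vqa_files(root="./data/vqa_datasets"):
    """Return the expected files that are absent under root/vqa."""
    vqa_dir = os.path.join(root, "vqa")
    return [f for f in REQUIRED_VQA_FILES
            if not os.path.isfile(os.path.join(vqa_dir, f))]
```

An empty return value means the question/annotation files are all in place.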
The instructions are given in execute.sh; the following command starts training with the default hyperparameters. You can find the detailed hyperparameters in "base_args.py".
$ bash execute.sh
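The default hyperparameters live in base_args.py. As a hypothetical sketch of how a few such flags are typically defined (the flag names and default values here are assumptions, not copied from the repository):

```python
import argparse

# Hypothetical mirror of a few flags from base_args.py; the real file
# defines the authoritative names and defaults.
def get_args(argv=None):
    parser = argparse.ArgumentParser("T-UDeepSC training (sketch)")
    parser.add_argument("--output_dir", default="./output",
                        help="where checkpoints are saved")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=3e-5)
    return parser.parse_args(argv)
```

Any of these defaults can then be overridden on the command line, e.g. `--lr 1e-5 --batch_size 64`.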
All instructions are given in running_command.sh.
All checkpoint files will be saved under the path set by "--output_dir"; the script creates a sub-path for each selected eval task.
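A sketch of that per-task checkpoint layout (the actual sub-path naming in the code may differ; this helper is an assumption for illustration):

```python
import os

def checkpoint_dir(output_dir, eval_task):
    """Build a per-task checkpoint sub-path under --output_dir.

    Hypothetical naming scheme: one sub-folder per selected eval task.
    """
    return os.path.join(output_dir, eval_task)
```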
We recommend using a smaller learning rate and a larger batch size, since BERT-initialized models are prone to overfitting.
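When enlarging the batch size, the generic linear-scaling rule gives the learning rate that would keep the per-epoch update magnitude roughly unchanged; staying below that value realizes the "smaller effective learning rate" suggested above. A sketch of this general heuristic (not specific to this repository; the numbers in the test are placeholders):

```python
def linear_scaled_lr(base_lr, base_batch, new_batch):
    """Linear-scaling reference point: the learning rate that keeps
    per-epoch progress roughly unchanged when the batch size changes.
    Choosing a value below this realizes a smaller effective step,
    as recommended for fine-tuning BERT-initialized models.
    Placeholder heuristic; tune for your own setup."""
    return base_lr * new_batch / base_batch
```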