Code for the ACL 2023 paper "Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination"
conda env create -f environments/full.yml
conda activate UMMT-VSH
pip install -e fairseq/
pip install -e taming-transformers/
MMT data
NMT data with image source
Binarize translation data for fairseq
bash scripts/multi30k/preproc.sh
Download the Flickr30K and MS-COCO images, then create symbolic links
ln -s /xxx/flickr30k
ln -s /xxx/mscoco
Download the WIT translation data, which provides parallel corpora organized for machine translation. The archive also includes tokenized and BPE-encoded sentences.
For each translation task, download the images listed in [train|valid|test]_url.txt to the corresponding paths provided in [train|valid|test]_img.txt. Image filenames are the MD5 hashes of their URLs.
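The download step above can be sketched roughly as follows. This is a minimal illustration, not part of the repository: the helper names (md5_name, pair_urls_with_paths, download_split) are hypothetical, and it assumes that line i of the URL file corresponds to line i of the image-path file, that the hash is taken over the UTF-8 bytes of the URL, and that the filename stem (without extension) is the hash.

```python
import hashlib
import urllib.request
from pathlib import Path


def md5_name(url: str) -> str:
    """Hex MD5 digest of the URL string (UTF-8 encoding is an assumption)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def pair_urls_with_paths(url_lines, img_lines):
    """Pair line i of the URL list with line i of the image-path list."""
    urls = [u.strip() for u in url_lines if u.strip()]
    paths = [p.strip() for p in img_lines if p.strip()]
    if len(urls) != len(paths):
        raise ValueError("URL list and image-path list differ in length")
    return list(zip(urls, paths))


def download_split(url_file: str, img_file: str, root: str = ".") -> None:
    """Download every image of one split, skipping files that already exist."""
    with open(url_file, encoding="utf-8") as f_url, \
            open(img_file, encoding="utf-8") as f_img:
        pairs = pair_urls_with_paths(f_url, f_img)
    for url, rel_path in pairs:
        target = Path(root) / rel_path
        # Sanity check: the file stem is assumed to be the MD5 hash of the URL.
        if target.stem != md5_name(url):
            print(f"warning: {rel_path} does not match the MD5 of {url}")
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
```

Under these assumptions, one split would be fetched with a call such as download_split("train_url.txt", "train_img.txt"), run from the directory of the translation task. Failed or dead URLs are not handled here; a real run would likely want retries and a log of missing images.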
Binarize translation data for fairseq
bash scripts/wit/preproc.sh
Parse the scene graph (SG) structures for all images and texts using the tools in SG-parsing/VSG and SG-parsing/LSG.
scripts/multi30k-train.sh: training script for multi30k
scripts/wmt-train.sh: training script for wmt
scripts/test.sh: test script