About the ve_train/dev/test.json data #4

shizhediao / DaVinci

Source code for the paper "Prefix Language Models are Unified Modal Learners"

BSD 3-Clause "New" or "Revised" License

Hi, you need to convert the downloaded dataset into our training format. Put the images for each dataset into its own folder under images/, and put the metadata as JSON files in data/:

DaVinci/
    data/
        coco_test.json
        coco_train.json
        coco_val.json
        *.json

    images/
        coco/
            train2014/*.jpg
            val2014/*.jpg
            test2015/*.jpg

        visualgenome/
            image/*.jpg

        nlvr2/
            images/
                train/0-99/*.png
            dev/*.png
            test1/*.png
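
As a rough sketch, the folder skeleton above could be created with a short script like the one below. The helper name `make_layout` is hypothetical (it is not part of the repository), and the numbered subfolders under nlvr2/images/train/ are left for the dataset download to supply.

```python
from pathlib import Path

# Directories taken from the tree above, relative to the DaVinci/ root.
LAYOUT = [
    "data",
    "images/coco/train2014",
    "images/coco/val2014",
    "images/coco/test2015",
    "images/visualgenome/image",
    "images/nlvr2/images/train",
    "images/nlvr2/dev",
    "images/nlvr2/test1",
]


def make_layout(root):
    """Create the expected folders under `root`; existing ones are left alone."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `make_layout("DaVinci")` from the parent directory is safe to repeat, since folders that already exist are not modified.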

Each JSON file contains a list. Each item in the list is a dictionary with two key-value pairs: {'binary': base64_encoding_of_the_image, 'caption': text_of_image}.
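
A minimal sketch of how such a file might be produced is shown below. The function names `encode_image` and `build_metadata` are hypothetical, and the code assumes that 'binary' holds the base64 encoding of the raw image file bytes stored as a string; please check this against the repository's dataset loader before relying on it.

```python
import base64
import json
from pathlib import Path


def encode_image(image_path):
    """Return the base64 encoding of the image file's raw bytes as a str.

    Assumption: the loader expects the encoded file bytes, not decoded pixels.
    """
    return base64.b64encode(Path(image_path).read_bytes()).decode("ascii")


def build_metadata(pairs, out_path):
    """Write a list of {'binary': ..., 'caption': ...} items to `out_path`.

    `pairs` is an iterable of (image_path, caption) tuples.
    Returns the number of items written.
    """
    items = [{"binary": encode_image(path), "caption": caption}
             for path, caption in pairs]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(items, f)
    return len(items)
```

For example, `build_metadata([("images/coco/train2014/example.jpg", "a dog on a beach")], "data/coco_train.json")` would write a one-item list; the image file name and caption here are placeholders only.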
