kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
http://kdexd.xyz/virtex
MIT License

Pre-training on another dataset #30

Closed. GeorgeBatch closed this issue 2 years ago

GeorgeBatch commented 2 years ago

Hi,

Thank you for making this code public!

I want to pre-train a captioning model on another dataset (ARCH dataset). I went through your codebase and realized that first I need to create a Dataset class for my dataset similar to your Dataset class in virtex/data/datasets/coco_captions.py. Next, I will need to make a modified version of virtex/data/datasets/captioning.py.
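As a rough illustration of the first step, a new dataset class could look like the sketch below. It is not taken from the VirTex codebase. The class name `ArchCaptionsDataset`, the `captions.json` file, the `images/` folder, and the record keys are all hypothetical, and the returned dictionary keys would need to match whatever virtex/data/datasets/captioning.py actually expects. The sketch only implements `__len__` and `__getitem__`, which is the interface a map-style PyTorch dataset needs.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


class ArchCaptionsDataset:
    """Hypothetical map-style dataset of (image path, caption) pairs.

    Assumption (not confirmed by the ARCH release): `data_root` contains an
    `images/` folder and a JSON file holding a list of records such as
    {"image": "fig_001.png", "caption": "some text"}.
    """

    def __init__(self, data_root: str, annotation_file: str = "captions.json"):
        root = Path(data_root)
        self.image_dir = root / "images"
        with open(root / annotation_file, encoding="utf-8") as f:
            records = json.load(f)
        # Keep one (image path, caption) tuple per annotation record.
        self.instances: List[Tuple[Path, str]] = [
            (self.image_dir / record["image"], record["caption"])
            for record in records
        ]

    def __len__(self) -> int:
        return len(self.instances)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        image_path, caption = self.instances[idx]
        # Image decoding is omitted to keep the sketch dependency-free;
        # a real dataset would load and return the image array here.
        return {"image_id": idx, "image_path": str(image_path), "caption": caption}
```

To use it with a PyTorch `DataLoader`, the class would additionally subclass `torch.utils.data.Dataset` and return decoded images instead of paths.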

Somehow the files in virtex/data/datasets/ are all ignored by git, and I cannot get any of them tracked. Could you please help me with this? I would also appreciate any suggestions on how to modify the code at this stage so that it causes the least disruption to the functions and classes that rely on the Dataset classes.

Many thanks, George Batchkala

GeorgeBatch commented 2 years ago

Hello again,

I thought the problem was connected to how you packaged the library. It turned out that I had been using the .gitignore file incorrectly. Here is how I should have excluded the newly created files from the ignore rules: https://stackoverflow.com/questions/5533050/gitignore-exclude-folder-but-include-specific-subfolder
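For anyone hitting the same symptom, one way to find out which rule is responsible is `git check-ignore --verbose`, which prints the ignore file, line number, and pattern that match a path. The helper below is a minimal sketch that wraps that command. The function name `find_ignore_rule` is made up for illustration, and the sketch assumes `git` is available on the PATH.

```python
import subprocess
from typing import Optional


def find_ignore_rule(path: str, repo_dir: str = ".") -> Optional[str]:
    """Return the ignore rule that matches `path`, or None if no rule matches.

    The returned text has the form "<source>:<line>:<pattern>\t<path>".
    Note that the matching pattern may itself be a negation (starting with
    "!"), in which case the path is re-included rather than ignored.
    """
    result = subprocess.run(
        ["git", "check-ignore", "--verbose", path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        # A pattern matched; report where it comes from.
        return result.stdout.strip()
    if result.returncode == 1:
        # No pattern matched this path.
        return None
    # Any other exit code signals a git error (for example, not a repository).
    raise RuntimeError(result.stderr.strip())
```

Calling `find_ignore_rule` with the path of a newly added dataset file, from the repository root, shows the offending pattern. A negation pattern (a line starting with `!`) can then re-include the file, with the caveat that git cannot re-include a file whose parent directory is itself excluded.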

If you have any suggestions on how to extend the codebase in the best way, please let me know. If you do not have time for this, feel free to close this issue.

Best wishes, George Batchkala

kdexd commented 2 years ago

Hi @GeorgeBatch, thank you for trying the codebase! This was an artifact of an old path; I have fixed it in the master branch.