How can I store the dataset while I'm using train.py

Caixy1113 commented 1 year ago

Dear timojl,

I have a question about the training.py script: how should I store my training datasets? I am getting an error that says there is no "dataset_repository" folder. Could you please explain how to store the training datasets correctly, including PhraseCut, COCO, and so on? Thank you for your time and assistance.

Best regards, Cai

erkoiv commented 1 year ago

Disclaimer - Not a CLIPSeg author, just a user.

The unzipped and structured datasets should be located in ~/datasets/dataset_name/... Placing them there bypasses the step that un-tars a complete, already structured dataset from ~/dataset_repository/, although you can use that route if you have such an archive.
I reverse-engineered the dataset setup for COCO. Setting up this dataset for training required following these instructions from the hsnet repository:

COCO-20i

Download COCO2014 train/val images and annotations:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

Download COCO2014 train/val annotations from our Google Drive: [train2014.zip], [val2014.zip] (and place both train2014/ and val2014/ under the annotations/ directory).

Resulting in folder structure:

~/datasets/
├── COCO-20i/
│   ├── annotations/
│   │   ├── train2014/    # (dir.) training masks (from Google Drive)
│   │   ├── val2014/      # (dir.) validation masks (from Google Drive)
│   │   └── ..some json files..
│   ├── train2014/
│   └── val2014/
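
Before starting training, it can help to confirm that the layout above is actually in place. The following is a minimal sketch, not code from CLIPSeg or hsnet: the function name `missing_coco20i_dirs` is hypothetical, and the list of required subfolders is an assumption read off the tree above.

```python
from pathlib import Path

# Subfolders copied from the tree above (an assumption about what training needs).
REQUIRED_SUBDIRS = [
    "annotations/train2014",
    "annotations/val2014",
    "train2014",
    "val2014",
]


def missing_coco20i_dirs(datasets_root="~/datasets"):
    """Return the expected COCO-20i directories that do not exist yet."""
    base = Path(datasets_root).expanduser() / "COCO-20i"
    return [str(base / sub) for sub in REQUIRED_SUBDIRS if not (base / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_coco20i_dirs()
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  " + path)
    else:
        print("COCO-20i layout looks complete.")
```

An empty result means all four folders were found. The check does not inspect the json annotation files or the contents of the folders.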

If you have a dataset laid out like the COCO one above, you can change the dataset folder name in 'coco_wrapper.py' (in the 'wrappers' folder) so that the code uses your custom dataset instead. This will also require some changes to the way CLIPSeg uses hsnet to index the dataset.

timojl commented 1 year ago

I think @erkoiv already provided a great answer. The get_from_repository function is primarily used as an internal tool. In this repository it is sufficient to put the data into ~/datasets/<dataset>/.
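
As a rough illustration of that convention, the sketch below resolves ~/datasets/<dataset>/ and fails with a descriptive message when the folder is absent. The helper `resolve_dataset_dir` is hypothetical and is not the repository's `get_from_repository`; it only mirrors the path layout described in this thread.

```python
from pathlib import Path


def resolve_dataset_dir(dataset_name, datasets_root="~/datasets"):
    """Return the dataset folder, or raise if it has not been set up."""
    path = Path(datasets_root).expanduser() / dataset_name
    if not path.is_dir():
        raise FileNotFoundError(
            f"Expected an unpacked dataset at {path}; "
            "create the folder and place the structured data inside it."
        )
    return path
```

For example, `resolve_dataset_dir("COCO-20i")` would return the path to ~/datasets/COCO-20i once that folder exists.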

timojl/clipseg, issue #28