Closed: HYOJINPARK closed this issue 4 months ago.
Hello, thank you for your attention.
For the first question, finetune.sh applies to both of the last two stages; you just need to modify the paths for the input/output models and the dataset you want to use.
For the second question, your format is correct. You just need to replace "image_folder" in the shell script with the path to your image folder. You can refer to the code here: https://github.com/lzw-lzw/GroundingGPT/blob/1f3f53e5e899b7ae24fe3c8b8bdc803ba66a3f0a/lego/train/train.py#L649-L658. I will update the dataset folder details as soon as possible.
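For illustration, the path resolution described above could be sketched as follows. The "image" key and the example relative path are assumptions for this sketch, not necessarily the exact fields used in the linked train.py code:

```python
import os

def resolve_image_path(image_folder, entry):
    """Join the dataset root with one record's relative image path.

    `entry` is assumed to be a single record from the training JSON,
    with its image stored under a hypothetical "image" key holding a
    relative path such as "COCO/train2017/000000000009.jpg".
    """
    return os.path.join(image_folder, entry["image"])
```

With this scheme, the JSON only stores paths relative to the dataset root, so moving the image folder just means changing the "image_folder" argument in the shell script.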
Yes, you can choose any modality of data you want for training.
Hi @lzw-lzw Thanks for your prompt reply
One point of confusion: when I open https://huggingface.co/datasets/zwli/GroundingGPT/tree/main/Stage1, there is only Wavecaps.json there. Likewise, Stage2 has only didemo, refcoco, vggs, and visual_genome, and Stage3 has clotho, activity caption, and flickr30k.
Did you use only Wavecaps for Stage1?
What data should I use at each stage?
Hi, the data used for the last two stages is just the files within the corresponding directories of the HuggingFace dataset. For the first stage, in addition to Wavecaps as the audio-modality data, it also includes the pretraining data from LLaVA and VALLEY.
Hi @lzw-lzw, thanks again for your kind reply. Did you download the Valley dataset using this link? https://huggingface.co/datasets/luoruipu1/Valley-Instruct-65k/blob/main/get_jukinmedia_videourl.py With code like the following?
response = requests.post('https://www.jukinmedia.com/api/public/video/downloadVideo/' + jmId, headers=headers)
Actually... it looks like the link is not working...
Actually, I used Valley data that someone else had already downloaded and stored on an internal NAS, so I am not familiar with the details of the download process. The Valley repository may provide a solution or guidance for this.
Got it. Thanks for all your help!!!
Hi, thanks for your great work.
I am trying to reproduce your code, so I would like to ask for more details. The released dataset has three stages, matching the paper's description in Section 3.2, but the scripts include only pretrain.sh and finetune.sh.
Also, could you describe the dataset folder layout in more detail? Is a layout like the following okay?

image_folder/
├── COCO/
├── OCR-VQA/
└── ... etc.
Also, could I train the model without the sound dataset?