open-mmlab / Multimodal-GPT


Fix Bug in Dataset Building Process #18

Closed yhna940 closed 1 year ago

yhna940 commented 1 year ago

This commit addresses a bug in the dataset building process in our multimodal GPT (mmGPT) framework. Specifically, the bug was related to the way we were handling dataset configuration in both the builder.py and instruction_finetune.py scripts.

The changes include:

1. In builder.py, we changed how the dataset type is identified in the build_dataset function. Previously, the function checked the type using dataset_config.type. However, the type key had already been popped from the dictionary, so dataset_config.type no longer existed. The fix changes the check to use dataset_type, which holds the popped value.

2. In instruction_finetune.py, the build_dataset function was being called with the incorrect keyword argument config. This was updated to the correct keyword argument dataset_config, so the function receives the dataset configuration as intended.

These changes are expected to fix the dataset building bug and allow dataset building and training to succeed in the mmGPT framework.
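For reviewers, both failure modes can be reproduced with a minimal sketch. It is not the actual mmGPT code: the Config class, the "toy" dataset type, the path field, and the two build_dataset_* functions are hypothetical stand-ins that only assume the config behaves like a dict with attribute access.

```python
class Config(dict):
    """Hypothetical dict with attribute access, standing in for the real config object."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


def build_dataset_buggy(dataset_config):
    # "type" is removed from the config here...
    dataset_type = dataset_config.pop("type")
    # ...so this attribute lookup fails with AttributeError.
    if dataset_config.type == "toy":
        return ("toy", dict(dataset_config))
    raise NotImplementedError(dataset_type)


def build_dataset_fixed(dataset_config):
    dataset_type = dataset_config.pop("type")
    # Compare against the popped value instead of the removed key.
    if dataset_type == "toy":
        return ("toy", dict(dataset_config))
    raise NotImplementedError(dataset_type)


def try_build(builder, **kwargs):
    """Call a builder and return either its result or the name of the raised error."""
    try:
        return builder(**kwargs)
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
```

Under these assumptions, try_build(build_dataset_buggy, dataset_config=Config(type="toy", path="data.json")) reports an AttributeError, the same call with build_dataset_fixed returns the built result, and passing config= instead of dataset_config= to the fixed function reports a TypeError because no parameter has that name.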