ml-explore / mlx-examples

Examples in the MLX framework
MIT License

FLUX: Optimize dataset loading logic #1038

Closed madroidmaq closed 1 month ago

madroidmaq commented 1 month ago
  1. Use the train.jsonl file as the dataset, consistent with the conventions in the mlx_lm library;
  2. Move the dataset loading logic to a separate file for easier maintenance;
  3. Add support for Hugging Face datasets (mlx-community/dreambooth-dog6) with preprocessing.
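
As a rough illustration of the train.jsonl convention, here is a minimal sketch of a loader. It assumes each line is a JSON object with `image` and `prompt` keys; those field names and the `load_jsonl_dataset` helper are hypothetical and not taken from this PR.

```python
import json
from pathlib import Path


def load_jsonl_dataset(dataset_dir):
    """Return a list of (image_path, prompt) pairs read from train.jsonl.

    Assumption: each non-empty line is a JSON object with "image" and
    "prompt" keys. The field names are illustrative only.
    """
    dataset_dir = Path(dataset_dir)
    items = []
    with open(dataset_dir / "train.jsonl", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Image paths are resolved relative to the dataset folder.
            items.append((dataset_dir / record["image"], record["prompt"]))
    return items
```

Keeping one JSON object per line means entries can be appended or filtered with ordinary line-based tools, which is one reason the format is convenient for datasets.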

Fine-tuning with mlx-community/dreambooth-dog6:

python dreambooth.py \
    --progress-prompt 'A photo of an sks dog lying on the sand at a beach in Greece' \
    --progress-every 600 --iterations 1200 --learning-rate 0.0001 \
    --lora-rank 4 --grad-accumulate 8 \
    mlx-community/dreambooth-dog6

madroidmaq commented 1 month ago

@awni @angeloskath

Based on my experience with the mlx-lm library and its implementation, I've made some adjustments to the parts of the code that handle the flux dataset. I'd like to hear your thoughts.

PS: Many of the changes only move code to a new location (such as a new file) without altering the implementation details.

angeloskath commented 1 month ago

Thanks a lot for the improvements! I like most of them from a quick look on my phone.

We may need to think about how to add prior preservation afterwards, because I was thinking of adding it to the dataset, but that was possibly a bad idea.

I will look more closely when I am back on a computer. 🙏

angeloskath commented 1 month ago

@madroidmaq I made a few changes and added back support for the index.json approach with a warning to let people know that they should switch to jsonl. Let me know what you think.
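
For illustration, a fallback of this kind could be sketched as follows. This is not the code from the PR: the `find_dataset_index` helper, the warning text, and the choice of `FutureWarning` are assumptions; only the file names train.jsonl and index.json come from the discussion.

```python
import warnings
from pathlib import Path


def find_dataset_index(dataset_dir):
    """Pick the index file for a local dataset folder (hypothetical helper).

    Prefers train.jsonl; falls back to the older index.json and warns.
    """
    dataset_dir = Path(dataset_dir)
    jsonl_path = dataset_dir / "train.jsonl"
    legacy_path = dataset_dir / "index.json"
    if jsonl_path.exists():
        return jsonl_path
    if legacy_path.exists():
        # FutureWarning is shown to end users by default.
        warnings.warn(
            "index.json is still supported; please switch to train.jsonl",
            FutureWarning,
            stacklevel=2,
        )
        return legacy_path
    raise FileNotFoundError(f"No train.jsonl or index.json in {dataset_dir}")
```

Checking for train.jsonl first means a folder that contains both files follows the new convention without emitting a warning.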

I am still torn about jsonl, but I will probably just merge this as is and add another dataset for prior preservation. Something like:

# Also a dataset but now we are looking at prior.jsonl instead of train.jsonl
python dreambooth.py .... \
    --prior-preservation mlx-community/dog6 --prior-weight 0.5
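
The thread does not say how a prior weight would enter training. As a sketch under the usual DreamBooth formulation, the prior-preservation loss is added to the instance loss after scaling by the weight. The `combined_loss` helper and its argument names are hypothetical.

```python
def combined_loss(instance_loss, prior_loss, prior_weight=0.5):
    """Weighted sum of the two losses (DreamBooth-style prior preservation).

    instance_loss: loss on the subject images (e.g. from train.jsonl)
    prior_loss:    loss on the prior images (e.g. from prior.jsonl)
    prior_weight:  scaling applied to the prior term; 0.5 mirrors the
                   example value in the proposed command, not a known default
    """
    return instance_loss + prior_weight * prior_loss
```

With a weight of 0 the prior term vanishes and training reduces to plain fine-tuning on the subject images.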

madroidmaq commented 1 month ago

@angeloskath Adding a reminder to migrate the dataset is reasonable. It's a good change, thank you.