Describe the bug

I'm training the Flux ControlNet. `train_dataset.map()` takes a long time to finish. However, when I kill a training process and want to restart training with the same dataset, I can't reuse the mapped result, even though I defined the cache dir for the dataset.
```python
with accelerator.main_process_first():
    from datasets.fingerprint import Hasher

    # fingerprint used by the cache for the other processes to load the result
    # details: https://github.com/huggingface/diffusers/pull/4038#discussion_r1266078401
    new_fingerprint = Hasher.hash(args)
    train_dataset = train_dataset.map(
        compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint, batch_size=10,
    )
```
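One workaround I'm considering (a minimal, untested sketch; the file name and `args.cache_dir` are my own placeholders, not part of the training script) is to pin the map result to an explicit Arrow file via `cache_file_name`, so a restarted run loads that file even if the automatically computed fingerprint differs between runs:

```python
import os

from datasets.fingerprint import Hasher

with accelerator.main_process_first():
    new_fingerprint = Hasher.hash(args)
    # Write the mapped result to a fixed, predictable file so a restarted
    # run can load it instead of recomputing the embeddings. The file name
    # is a placeholder; args.cache_dir is assumed to be the cache dir
    # passed on the command line.
    cache_file = os.path.join(args.cache_dir, "train_dataset_with_embeddings.arrow")
    train_dataset = train_dataset.map(
        compute_embeddings_fn,
        batched=True,
        batch_size=10,
        new_fingerprint=new_fingerprint,
        cache_file_name=cache_file,
        load_from_cache_file=True,  # the default, shown explicitly
    )
```

With an explicit `cache_file_name`, `map()` loads the existing file when `load_from_cache_file=True` instead of deriving a cache path from the dataset fingerprint.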
Steps to reproduce the bug
Train the Flux ControlNet, kill the training process after `map()` has finished, then start training again with the same dataset and cache dir.
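To check whether the fingerprint itself is the problem, one can print it in both runs (a minimal diagnostic sketch; if the two values differ, the cache lookup can never hit):

```python
from datasets.fingerprint import Hasher

# If this value differs between the killed run and the restarted run
# (e.g. because `args` contains a run-specific field such as an output
# dir or a timestamp), the mapped result can never be reused.
print("new_fingerprint:", Hasher.hash(args))
```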
Expected behavior
The restarted run should not run `map()` again; it should load the mapped result from the dataset cache.
Environment info
latest diffusers