yuelei0428 opened 10 months ago
Download the laion-art parquet file and use img2dataset to convert it into a webdataset, like this. You will get a set of .tar shards that can be used for training:
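from img2dataset import download
import shutil
import os

# Write the shards into ./bench, starting from an empty folder
output_dir = os.path.abspath("bench")
if os.path.exists(output_dir):
    shutil.rmtree(output_dir)

download(
    processes_count=16,
    thread_count=32,
    url_list="../tests/test_files/test_10000.parquet",  # replace with your laion-art parquet file
    image_size=256,
    output_folder=output_dir,
    output_format="webdataset",  # write .tar shards instead of loose files
    input_format="parquet",
    url_col="URL",  # column holding the image URL
    caption_col="TEXT",  # column holding the caption
    enable_wandb=True,  # logs to Weights & Biases; set to False if you don't use wandb
    number_sample_per_shard=1000,
    distributor="multiprocessing",
)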
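To sanity-check the output before wiring it into a training loop, the shards can be read back with Python's standard library alone. This is a minimal sketch, assuming the usual webdataset layout that img2dataset writes: each sample is stored as several files sharing one key (for example `000000000.jpg` and `000000000.txt`) placed next to each other inside tars such as `bench/00000.tar`. The function name `iter_webdataset_samples` is hypothetical and is not part of img2dataset or webdataset.

```python
import glob
import tarfile
from typing import Dict, Iterator, Optional, Tuple


def iter_webdataset_samples(tar_path: str) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Hypothetical helper: yield (key, {extension: bytes}) per sample.

    Assumes the webdataset convention that all files of one sample share a
    basename (the key) and are stored contiguously in the archive.
    """
    current_key: Optional[str] = None
    current: Dict[str, bytes] = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            basename = member.name.rsplit("/", 1)[-1]
            if "." not in basename:
                continue  # skip files without an extension
            # Key is everything before the first dot, e.g. "000000000.jpg" -> "000000000"
            key, ext = basename.split(".", 1)
            if key != current_key and current:
                # A new key starts, so the previous sample is complete
                yield current_key, current
                current = {}
            current_key = key
            current[ext] = tar.extractfile(member).read()
    if current:
        yield current_key, current


if __name__ == "__main__":
    # Assumes the shards were written to ./bench by the download() call above
    for shard in sorted(glob.glob("bench/*.tar")):
        for key, sample in iter_webdataset_samples(shard):
            caption = sample.get("txt", b"").decode("utf-8")
            print(key, sorted(sample), caption[:60])
```

For actual training, the `webdataset` library's `WebDataset` class does this grouping (plus shuffling and decoding) for you; the sketch only shows what is inside the tars.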
Unfortunately, all LAION datasets have recently been taken down. Could anyone provide a small subset as an example?
Could anyone provide a toy dataset I can use for a small test training run?
The dataset part is driving me crazy...
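While LAION is unavailable, one way to get a toy dataset for a smoke-test training run is to synthesize one. The sketch below is a stand-in, not a LAION subset: it writes a single webdataset-style tar of solid-color squares with captions such as "a solid red square", using only the standard library. The helpers `solid_png` and `write_toy_shard` are hypothetical names, and images are stored as `.png` rather than the `.jpg` files img2dataset produces, so a loader that looks for the `jpg` extension would need adjusting.

```python
import io
import struct
import tarfile
import zlib

# Hypothetical toy palette: caption word -> RGB color
COLORS = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}


def solid_png(width: int, height: int, rgb: tuple) -> bytes:
    """Hypothetical helper: encode a solid-color 8-bit RGB PNG without third-party libraries."""
    # Each scanline starts with filter byte 0 (no filtering), then RGB triples
    raw = b"".join(b"\x00" + bytes(rgb) * width for _ in range(height))

    def chunk(tag: bytes, data: bytes) -> bytes:
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 2 (RGB), default compression/filter/interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )


def write_toy_shard(path: str, num_samples: int = 30, image_size: int = 64) -> None:
    """Hypothetical helper: write num_samples (image, caption) pairs as key.png / key.txt."""
    names = list(COLORS)
    with tarfile.open(path, "w") as tar:
        for i in range(num_samples):
            color = names[i % len(names)]
            key = f"{i:09d}"  # zero-padded keys, similar in style to img2dataset's
            files = {
                "png": solid_png(image_size, image_size, COLORS[color]),
                "txt": f"a solid {color} square".encode("utf-8"),
            }
            # Files of one sample are written next to each other, as webdataset expects
            for ext, data in files.items():
                info = tarfile.TarInfo(f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
```

Calling `write_toy_shard("toy-00000.tar")` would produce 30 samples of 64x64 images. That is enough to check that a data loader and training loop run end to end, though obviously not enough to learn anything meaningful.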