iejMac / video2dataset

Easily create large video dataset from video urls
MIT License

How can I re-download the "failed_to_download" videos? #336

Open linhaojia13 opened 5 months ago

linhaojia13 commented 5 months ago

I ran this command to download WebVid-2M:

video2dataset --url_list="results_2M_train.csv" \
    --input_format="csv" \
    --output_format="webdataset" \
    --output_folder="results_2M_train" \
    --url_col="contentUrl" \
    --caption_col="name" \
    --save_additional_columns='[videoid,page_idx,page_dir,duration]' \
    --enable_wandb=True \
    --config=default

However, some of the videos failed to download, as shown in one of the xxxxx_stats.json files:

{
    "count": 1000,
    "successes": 984,
    "failed_to_download": 16,
    "failed_to_subsample": 0,
    "duration": 402.9706723690033,
    "bytes_downloaded": 2114718582,
    "start_time": 1713343040.4027474,
    "end_time": 1713343443.3734198,
    "status_dict": {
        "success": 984,
        "HTTPSConnectionPool(host='ak.picdn.net', port=443): Read timed out.": 16
    }
}
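To see which shards are affected, the per-shard stats files can be scanned. This is a minimal Python sketch, not a video2dataset feature. It assumes every shard writes a "<shard_id>_stats.json" file with the "failed_to_download" field shown above into the output folder. The helper name find_failed_shards is hypothetical.

```python
import json
import sys
from pathlib import Path


def find_failed_shards(output_folder):
    """Return {shard_id: failed_to_download count} for shards with failures.

    Assumption: video2dataset wrote one "<shard_id>_stats.json" per shard
    into output_folder, each containing an integer "failed_to_download".
    """
    failed = {}
    for stats_path in sorted(Path(output_folder).glob("*_stats.json")):
        with open(stats_path, encoding="utf-8") as f:
            stats = json.load(f)
        n_failed = stats.get("failed_to_download", 0)
        if n_failed > 0:
            # "00003_stats.json" -> "00003"
            shard_id = stats_path.name[: -len("_stats.json")]
            failed[shard_id] = n_failed
    return failed


if __name__ == "__main__":
    # Defaults to the output folder used in the command above.
    folder = sys.argv[1] if len(sys.argv) > 1 else "results_2M_train"
    failures = find_failed_shards(folder)
    for shard_id, n in failures.items():
        print(f"{shard_id}: {n} failed")
    print(f"total failed_to_download: {sum(failures.values())}")
```

Running it with the output folder as the argument prints one line per shard that has at least one failure, followed by the total.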

How can I use video2dataset to re-download only the part files (shards) that contain failed_to_download entries?
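One possible workaround, sketched here under assumptions rather than as a built-in video2dataset option: build a new URL list that contains only the rows of results_2M_train.csv that did not download, then run the same command on it with a different --output_folder so the existing shards are left untouched. The sketch assumes you already have the set of contentUrl values that succeeded. For example, these might come from the per-shard metadata written next to each shard, but the exact file and column names depend on your video2dataset version and are not shown here. The helper name write_retry_csv and the file name retry.csv are hypothetical.

```python
import csv


def write_retry_csv(url_list_path, retry_path, succeeded_urls, url_col="contentUrl"):
    """Copy every row of url_list_path whose URL is not in succeeded_urls.

    Keeps all original columns (caption, videoid, page_dir, ...) so the same
    video2dataset command can be reused. Returns the number of rows written,
    excluding the header.
    """
    written = 0
    with open(url_list_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        with open(retry_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            for row in reader:
                if row[url_col] not in succeeded_urls:
                    writer.writerow(row)
                    written += 1
    return written
```

After writing retry.csv, rerun the command with --url_list="retry.csv" and a new folder such as --output_folder="results_2M_train_retry". Because these failures were read timeouts, a second attempt may succeed, but some URLs may simply no longer be reachable.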