m-bain / webvid

Large-scale text-video dataset. 10 million captioned short videos.

Is there a download limit imposed by the video source website? #24

Open linhaojia13 opened 5 months ago

linhaojia13 commented 5 months ago

I ran this command:

video2dataset --url_list="results_2M_train.csv" \
        --input_format="csv" \
        --output-format="webdataset" \
        --output_folder="test" \
        --url_col="contentUrl" \
        --caption_col="name" \
        --save_additional_columns='[videoid,page_idx,page_dir,duration]' \
        --enable_wandb=False \
        --config=default

At first, the download process went smoothly, and I successfully downloaded 96 .tar files, totaling about 200GB. Then, error messages started appearing.

HTTPSConnectionPool(host='ak.picdn.net', port=443): Read timed out.

I switched to a different computer and attempted to download again, but encountered the same errors after downloading around 200GB. Could this be due to a download limit imposed by the video source website? How should I resolve this issue? @m-bain
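
To help tell a server-side limit apart from a transient network problem, a single failing URL could be retried outside of video2dataset with increasing delays. The following is only a minimal sketch using the Python standard library. `backoff_delays` and `fetch_with_retry` are hypothetical helper names, not part of video2dataset, and the retry count, timeout, and delay values are arbitrary assumptions.

```python
import socket
import time
import urllib.error
import urllib.request


def backoff_delays(retries, base=1.0, cap=60.0):
    """Return exponentially growing wait times in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def fetch_with_retry(url, retries=5, timeout=30,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, sleeping after each timeout or connection error.

    `opener` and `sleep` are injectable so the retry logic can be
    exercised without touching the network.
    """
    last_error = None
    for delay in backoff_delays(retries):
        try:
            with opener(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, socket.timeout) as exc:
            # Remember the failure and wait before the next attempt.
            last_error = exc
            sleep(delay)
    if last_error is None:
        raise ValueError("retries must be >= 1")
    raise last_error
```

If a URL that timed out during the bulk download succeeds here after a pause, that would be consistent with throttling, though it would not prove it.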