Open novruzgurbanov opened 1 year ago
There is another issue about this that has been open for a year, which I can't reproduce.
If you can figure out in which environment it happens, that would help.
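One way to pin down "which environment" is to print a small snapshot of it and attach the output to the issue. The sketch below uses only the Python standard library. The package list is a guess at what might be relevant and is not taken from this thread, so adjust it as needed.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=("img2dataset", "pyarrow", "webdataset", "fsspec")):
    """Collect basic facts about the current environment as a plain dict.

    The default package names are an assumption about what may matter here.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Print as JSON so the output can be pasted into the issue or saved to a file.
    print(json.dumps(environment_snapshot(), indent=2))
```

Running this both on the host and inside the container gives two outputs that can be compared side by side.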
On Fri, Sep 1, 2023, 09:22 Gurbanov Novruz @.***> wrote:
Hi! After downloading the files from laion2b-en with these parameters:
download(
    processes_count=32,
    url_list=parquet_file,
    resize_mode='no',
    output_folder=output_dir,
    output_format='webdataset',  # Download files as a files
    input_format='parquet',
    url_col="URL",
    caption_col="TEXT",
    number_sample_per_shard=50000,
    distributor='multiprocessing',
)
all files appear to be downloaded (I think), but then the last iteration goes on forever and I have to stop it manually. Could you look at this, please?
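When a run "goes on forever", a periodic stack dump can show where the main process is waiting. Below is a minimal sketch using the standard-library `faulthandler` module. `run_with_hang_dumps`, `task`, and `log_path` are hypothetical names introduced for illustration. Note that this only covers the process that calls it: worker processes started by a multiprocessing distributor would not appear in the dump.

```python
import faulthandler


def run_with_hang_dumps(task, log_path, interval_s=600):
    """Run task() while dumping all thread stacks to log_path every interval_s seconds.

    If task() hangs, the log shows where the calling process is stuck.
    """
    with open(log_path, "w") as log_file:
        # repeat=True keeps dumping at every interval until cancelled.
        faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)
        try:
            return task()
        finally:
            # Always cancel, so no dump fires after the log file is closed.
            faulthandler.cancel_dump_traceback_later()
```

Usage would be to wrap the download call in a zero-argument function (for example a `lambda`) and pass it as `task`. If the run hangs, the tracebacks in the log file can be attached to the issue.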
@rom1504 I am running the download inside a Docker container. A month ago, in the same Docker container, it worked seamlessly. But now, and I don't know why, it never stops. I am not a pro with Docker images, but if possible, maybe I can send you the image so that you can run a container and try to download some files? (img2dataset is already installed.)
I think it would be useful if you could try to figure out which specific Docker configs work and which ones don't.
@rom1504 Sorry, I didn't quite get what you mean. If the container is the same and the image is the same, what other configs should I check? If you have suggestions for what to check, I would appreciate it!
You can check any other environment that works and then compare the two.
Maybe you changed the host, if not the container?
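Comparing a working environment with a broken one is easier if each is summarized as a nested dict (for example Python version, OS, package versions) and only the differences are listed. The sketch below assumes such dicts already exist. Their layout and the name `diff_snapshots` are hypothetical.

```python
def diff_snapshots(working, broken, prefix=""):
    """Return (key_path, working_value, broken_value) for every entry that differs.

    Nested dicts are compared recursively; key paths are joined with dots.
    """
    differences = []
    for key in sorted(set(working) | set(broken)):
        left = working.get(key)
        right = broken.get(key)
        path = f"{prefix}{key}"
        if isinstance(left, dict) and isinstance(right, dict):
            differences.extend(diff_snapshots(left, right, prefix=f"{path}."))
        elif left != right:
            differences.append((path, left, right))
    return differences


if __name__ == "__main__":
    # Made-up values, for illustration only.
    working = {"python": "3.10.12", "packages": {"pyarrow": "12.0.0"}}
    broken = {"python": "3.10.12", "packages": {"pyarrow": "13.0.0"}}
    for path, old, new in diff_snapshots(working, broken):
        print(f"{path}: {old} -> {new}")
```

Anything this reports (a changed host kernel, a changed package version) is only a candidate to investigate, not a confirmed cause.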
@rom1504 Interesting... I downloaded the files with number_sample_per_shard set to 10K, and both the download and the process finished on time. I guess the function, or something else, cannot handle more samples per shard.
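For reference, changing number_sample_per_shard mainly changes how many shards the input is split into. The arithmetic is sketched below with a hypothetical input of 1,000,000 rows (the real row count is not given in this thread). Whether the shard count has anything to do with the hang is not established here.

```python
import math


def shard_count(total_samples, samples_per_shard):
    """Number of shards needed to hold total_samples, the last one possibly partial."""
    return math.ceil(total_samples / samples_per_shard)


if __name__ == "__main__":
    total = 1_000_000  # hypothetical row count, for illustration only
    for per_shard in (10_000, 50_000):
        print(f"{per_shard} samples per shard: {shard_count(total, per_shard)} shards")
```

With this made-up input, 50,000 samples per shard gives 20 shards, fewer than the 32 processes requested, while 10,000 per shard gives 100 shards.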
P.S. I tried this function a month ago, and it worked seamlessly. But now, no matter what I do and no matter how simple the parameters I define, it gets stuck.