Open ldfandian opened 1 year ago
Also, the performance looks extremely slow (considerably slower than local disk)... Is this expected?
This is not expected. What are the characteristics of your storage in terms of latency and bandwidth? Can you test whether fsspec works well with it?
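One way to run that check is to time a plain write and read through fsspec, outside of img2dataset. The following is only a minimal sketch: `time_roundtrip`, `check_storage`, and the test file name `fsspec_speed_test.bin` are hypothetical names made up for illustration, and it assumes the output location is reachable through an fsspec URL (and, for a local path, that the directory already exists).

```python
import os
import time


def time_roundtrip(fs, path, payload):
    """Write `payload` to `path` through `fs`, read it back, return (write_s, read_s)."""
    start = time.perf_counter()
    with fs.open(path, "wb") as f:
        f.write(payload)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with fs.open(path, "rb") as f:
        data = f.read()
    read_s = time.perf_counter() - start

    if data != payload:
        raise RuntimeError("read-back data does not match what was written")
    return write_s, read_s


def check_storage(url, size_mb=16):
    """Time one write and one read of `size_mb` MiB under the fsspec URL `url`."""
    # fsspec is a third-party package; it is imported here so that
    # time_roundtrip itself works with any object exposing open(path, mode).
    from fsspec.core import url_to_fs

    fs, root = url_to_fs(url)
    path = root.rstrip("/") + "/fsspec_speed_test.bin"  # hypothetical file name
    payload = os.urandom(size_mb * 1024 * 1024)

    write_s, read_s = time_roundtrip(fs, path, payload)
    fs.rm(path)  # remove the temporary test file

    print(f"write: {size_mb / write_s:.1f} MiB/s, read: {size_mb / read_s:.1f} MiB/s")
    return write_s, read_s
```

Calling `check_storage(...)` with the same output URL passed to img2dataset, and once more with a local directory, gives a rough comparison. A single large file mostly measures bandwidth; repeating the test with many small payloads would say more about per-request latency.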
Thanks for the quick response. I guess it is my bad... I used a small EC2 box (4 vCPUs, 8 GB memory), and I guess the network bandwidth or CPU was exhausted by the requests.
Never mind that. What is the recommended machine configuration for downloading large datasets like laion2b-int or wukong? What is the best practice for the CPU/memory/network-bandwidth configuration in relation to the value of process_count*thread_count?
What could be the cause here, and how should I deal with it?
(BTW, it's downloading 15M+ images.)