AlaaKhaddaj closed this issue 1 year ago
Hey, did you set up Knot Resolver for DNS resolution? This is really important: without it you can overload your DNS resolver, which leads to a low success rate.
On Wed, Dec 14, 2022, 15:34 AlaaKhaddaj @.***> wrote:
I have been trying to download CC12M using the same instructions you provided; however, the download is not complete.
I am getting the following error for a significant number of iterations:
total - success: 0.792 - failed to download: 0.195 - failed to resize: 0.013 - images per sec: 485 - count: 12423374
After the code is done, I end up with 1243 tar files. How can I solve this to get the full CC12M dataset?
@rom1504 Hi, is there any way to re-download only the failed data?
+1, is there a method?
You can read the output parquet files, select only the samples with a failed status, and write those out as a new parquet file (you can do that with pandas or Spark). Then rerun img2dataset on that file.
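A minimal pandas sketch of that filtering step, not a tested recipe. It assumes the per-shard parquet files written by img2dataset sit directly in the output folder and contain `url`, `caption`, and `status` columns, with `status` equal to `"success"` for good samples; check the column names in your own files first. The function names and paths are made up for illustration, and writing parquet requires `pyarrow` or `fastparquet` to be installed.

```python
from pathlib import Path

import pandas as pd


def select_failed(df, status_col="status", ok_value="success"):
    """Keep only the rows whose status is not the success value (assumed column name)."""
    return df[df[status_col] != ok_value]


def collect_failed(output_dir, keep_cols=("url", "caption")):
    """Gather failed samples from every per-shard parquet file in output_dir."""
    frames = [
        select_failed(pd.read_parquet(path))
        for path in sorted(Path(output_dir).glob("*.parquet"))
    ]
    if not frames:
        # No parquet files found: return an empty frame with the expected columns.
        return pd.DataFrame(columns=list(keep_cols))
    failed = pd.concat(frames, ignore_index=True)
    # Keep only the input columns and drop repeated URLs.
    return failed[list(keep_cols)].drop_duplicates(subset=keep_cols[0])


if __name__ == "__main__":
    # Hypothetical paths; replace with your own output folder and target file.
    failed = collect_failed("cc12m")
    failed.to_parquet("cc12m_failed.parquet", index=False)
    print(f"{len(failed)} failed samples written")
```

The resulting file can then be passed back to img2dataset as the URL list with the parquet input format and the matching URL and caption column options. Samples that failed because the link is dead will fail again, so the retry mainly helps with transient errors such as DNS overload.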