ku21fan / STR-Fewer-Labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
MIT License

failed to download data_CVPR2021.zip #4

Closed: Hlic818 closed this issue 2 years ago

Hlic818 commented 2 years ago

I tried to download the data from https://www.dropbox.com/sh/1s6r4slurc5ei2n/AACg6TqoDfGdKe8t40Em1fgxa?dl=0&preview=data_CVPR2021.zip on different computers and different networks, but every attempt failed. Is there a problem with the data? If convenient, could you please send me the data (excluding the synthetic data) by email? My email: 1050217987@qq.com

ku21fan commented 2 years ago

Hello,

Sorry for the inconvenience. The file is huge (8.42 GB), so it is hard to send via email.

Can you try downloading it with the following command (using the original download URL)?

wget -O data_CVPR2021.zip "https://www.dropbox.com/sh/1s6r4slurc5ei2n/AABJZzmWTCNt6EWVXbQ-QdDUa/data_CVPR2021.zip?dl=0"

Or try this command (we just reset the download URL of the file):

wget -O data_CVPR2021.zip "https://www.dropbox.com/s/o27gunx16usjhgu/data_CVPR2021.zip?dl=0"

In our environment, both commands still download data_CVPR2021.zip successfully, so we don't know why the problem happens :(
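If a command-line download produces a small HTML page instead of the archive, one thing worth checking is the `dl` query parameter: Dropbox shows a preview page for shared links ending in `?dl=0` and sends the file itself for `?dl=1`. Whether that is the cause here is an assumption, since the same links worked in the authors' environment. The helper below (`force_dropbox_download` is a hypothetical name) is a minimal sketch that rewrites a shared link to `dl=1` while keeping any other query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url: str) -> str:
    """Return `url` with its `dl` query parameter set to 1.

    Hypothetical helper: Dropbox shows a preview page for shared links with
    dl=0 and sends the file itself for dl=1. Other query parameters are kept.
    """
    parts = urlsplit(url)
    # Keep every query pair except an existing dl=..., then append dl=1.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    link = "https://www.dropbox.com/s/o27gunx16usjhgu/data_CVPR2021.zip?dl=0"
    print(force_dropbox_download(link))
```

The rewritten link can then be passed to wget inside double quotes (`wget -O data_CVPR2021.zip "<rewritten URL>"`); quoting matters because `?` and `&` (as in the preview link from the first post) are special characters in most shells.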

If you still cannot download the file, I plan to upload it to Baidu.

Hope it helps.

Hlic818 commented 2 years ago

Dear Dr. Baek, I'm sorry to bother you again. In our setting, both of the following commands failed to download the data:

wget -O data_CVPR2021.zip https://www.dropbox.com/sh/1s6r4slurc5ei2n/AABJZzmWTCNt6EWVXbQ-QdDUa/data_CVPR2021.zip?dl=0
wget -O data_CVPR2021.zip https://www.dropbox.com/s/o27gunx16usjhgu/data_CVPR2021.zip?dl=0

Would you please upload it to Baidu as soon as possible? We urgently need the data. Thank you for your help.

Sincerely,
Xiao Li

(Sent by email in reply to Baek JeongHun's comment of 2021-10-21 18:35:57.)

ku21fan commented 2 years ago

OK. I am going to upload it to Baidu right now.

ku21fan commented 2 years ago

I uploaded the data to Baidu (password: datm).

The file data_CVPR2021.zip is split into 3 files: data_CVPR2021_split.z01, data_CVPR2021_split.z02, and data_CVPR2021_split.zip.

Download all of them, then run the following commands:

cat data_CVPR2021_split.z* > tmp.zip
unzip tmp.zip

Then you will get data_CVPR2021.zip (about 8.5 GB).
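On machines without `cat` (for example, Windows without a Unix shell), the concatenation step can be reproduced in Python. This is a minimal sketch; the function name `concat_parts` is hypothetical, and like `cat` it simply joins the parts byte by byte in sorted name order (`.z01`, `.z02`, then `.zip`).

```python
import shutil
from pathlib import Path


def concat_parts(directory: str, pattern: str, out_name: str) -> Path:
    """Byte-concatenate every file matching `pattern` into `out_name`.

    Hypothetical helper mirroring `cat data_CVPR2021_split.z* > tmp.zip`:
    parts are joined in sorted name order, so .z01 and .z02 precede .zip.
    """
    folder = Path(directory)
    out_path = folder / out_name
    parts = sorted(p for p in folder.glob(pattern) if p != out_path)
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r} in {folder}")
    with out_path.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                # Copy in 1 MiB chunks so multi-GB parts never sit in memory.
                shutil.copyfileobj(src, dst, 1024 * 1024)
    return out_path
```

Calling `concat_parts(".", "data_CVPR2021_split.z*", "tmp.zip")` in the download folder yields the same tmp.zip as the `cat` command, which is then extracted with `unzip tmp.zip`. One caveat, assuming the parts were created with Info-ZIP's `zip -s` (the .z01/.z02/.zip naming suggests this, but it is not confirmed here): a byte-joined split archive can make `unzip` warn about prepended bytes, and Info-ZIP documents `zip -s 0 data_CVPR2021_split.zip --out tmp.zip` as the way to merge a split archive into a single file.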

Hope it helps :)

yusirhhh commented 2 years ago

When I run the command above, unzip tmp.zip, it returns an error: End-of-centdir-64 signature not where expected (prepended bytes?)

Does this happen when you unzip the file?

ku21fan commented 2 years ago

@yusirhhh It did not happen to me. Umm... can you try re-downloading the files, or check the md5sum of each downloaded file?

The md5sums of the files are as follows:

77c78ac256ffbf3cc5e36c8bd5e00b4d  data_CVPR2021_split.z01
ac4424d99c8c8ccdcb17dd7e2b8b9ae6  data_CVPR2021_split.z02
9c5c71ad13f72f700434bcd438b49d1c  data_CVPR2021_split.zip
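
Where `md5sum` is unavailable, the same check can be scripted with Python's `hashlib`. This is a minimal sketch with hypothetical helper names (`md5_of`, `check_parts`); the expected digests are the ones listed above, and each file is read in 1 MiB chunks so the multi-GB parts are never loaded into memory at once.

```python
import hashlib
from pathlib import Path

# Digests posted in this thread (file name -> md5 hex digest).
EXPECTED_MD5 = {
    "data_CVPR2021_split.z01": "77c78ac256ffbf3cc5e36c8bd5e00b4d",
    "data_CVPR2021_split.z02": "ac4424d99c8c8ccdcb17dd7e2b8b9ae6",
    "data_CVPR2021_split.zip": "9c5c71ad13f72f700434bcd438b49d1c",
}


def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hex MD5 of a file, read in chunks (same digest as `md5sum`)."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_parts(directory: str, expected: dict = EXPECTED_MD5) -> dict:
    """Map each expected name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == want) if path.exists() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH (re-download)", None: "missing"}
    for name, ok in check_parts(".").items():
        print(name, labels[ok])
```

Run it in the download folder before concatenating; any part reported as a mismatch should be downloaded again.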

yusirhhh commented 2 years ago

Hello. For scene text recognition, the input images are converted to lmdb format. I want to use this program for research on handwritten text. Does converting the data to lmdb format have a big impact on training speed? I look forward to your reply.

ku21fan commented 2 years ago

@yusirhhh Hello, I have not carefully compared training speed, but I believe using the lmdb format is faster than not using it.

Following the convention of the CRNN implementation, I usually use the lmdb format. The lmdb format also makes it easy to handle many image files as a single DB file. So I use lmdb for both convention and convenience.
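To illustrate how many image files end up in one DB file, the sketch below builds the key/value pairs such a dataset holds. The layout (1-based keys `image-%09d` for encoded image bytes, `label-%09d` for the UTF-8 label, and `num-samples` for the count) is an assumption based on CRNN-derived dataset-creation scripts, and `crnn_style_records` is a hypothetical name; check the repository's own dataset code before relying on it.

```python
from typing import Dict, Iterable, Tuple


def crnn_style_records(samples: Iterable[Tuple[bytes, str]]) -> Dict[bytes, bytes]:
    """Build key/value pairs for a CRNN-style text-recognition LMDB.

    Assumed layout: b"image-%09d" -> encoded image bytes,
    b"label-%09d" -> UTF-8 label (numbered from 1),
    b"num-samples" -> total sample count as ASCII digits.
    """
    records: Dict[bytes, bytes] = {}
    count = 0
    for count, (image_bytes, label) in enumerate(samples, start=1):
        records[b"image-%09d" % count] = image_bytes
        records[b"label-%09d" % count] = label.encode("utf-8")
    # Stays 0 when `samples` is empty because the loop never assigns count.
    records[b"num-samples"] = str(count).encode("ascii")
    return records
```

With the `lmdb` Python package (not needed by the sketch itself), these pairs would be written in one pass: open an environment with `lmdb.open(path, map_size=...)` and call `txn.put(key, value)` for each pair inside `with env.begin(write=True) as txn:`. A reader then needs only the index to fetch `image-…` and `label-…` from one file instead of opening thousands of small image files.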

So, in my opinion, if you don't need to follow that convention and lmdb does not speed up your training, you may not need the lmdb format.