poeli / EpiCoV_downloader

Download all EpiCoV sequences from GISAID
GNU General Public License v3.0

CAPTCHA Issue #16

Open cammm988 opened 3 years ago

cammm988 commented 3 years ago

GISAID has added a CAPTCHA to prevent users from crawling information.

FYI

Thanks

poeli commented 3 years ago

Didn't realize it. Thanks for letting me know!

cammm988 commented 3 years ago

For more info:

I built my own GISAID crawler as well, but yours was well built, so I borrowed some ideas from it. As you know, there are more than 2,000,000 records on the site, so I made my script run in parallel, visiting different samples at the same time and crawling their contents. I think that is why I got the CAPTCHA: too many simultaneous requests from the same ID or IP.
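To illustrate the kind of parallel setup described above, here is a minimal sketch. It is not the actual script: `fetch_sample`, `crawl_parallel`, and the sample IDs are hypothetical names, and the fetch step is a placeholder that makes no network request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_sample(sample_id):
    # Hypothetical placeholder: a real crawler would request and parse
    # the record page for this sample here.
    return {"id": sample_id, "status": "fetched"}


def crawl_parallel(sample_ids, fetch=fetch_sample, max_workers=8):
    # Up to max_workers fetches are in flight at once, all coming from
    # the same session, which is the pattern suspected of triggering
    # the CAPTCHA.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as sample_ids.
        return list(pool.map(fetch, sample_ids))


if __name__ == "__main__":
    records = crawl_parallel(["sample-1", "sample-2", "sample-3"], max_workers=2)
    print(len(records))
```

Lowering `max_workers` reduces how many requests are made at the same time from one account.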

The thing is, I have been given permission to download all FASTA, lineage, metadata, clade, etc. files with a single click (see the attached files). I don't know whether I was given that permission because I crawled so much that they gave up trying to stop me, or because of this: about a month ago I sent them a polite email asking whether I could get all the metadata (the AA substitution part was what I actually wanted), which was essential for my project, and I started crawling because I didn't get any response. I only noticed the new buttons a day ago, because I had no reason to visit the "download" section of EpiCoV (as you know, we crawl data through EpiCoV's "Search"). My colleagues, by contrast, have only a few buttons there.

I don't know whether it was due to my persistent crawling or my polite email, but the point is that they have the power to open up a section where you can download the whole dataset with a single click.

[Screenshot 2021-07-30 165141] [Screenshot 2021-07-30 165134]

Thanks