tysontxli / QIANGGUO

QG-Database

About video spider #24

Closed mokizzz closed 5 years ago

mokizzz commented 5 years ago

Hey elder brother,

I have read your video spider code. It crawls the videos on the main page (the "video database") of the central xuexi.cn website.

However, the videos on the central website are not updated frequently, so they cannot supply enough videos for the daily quota (8 points for video watching every day). There should be more video sources.

I have written a Python spider using Selenium (with headless Chrome) to crawl local videos, such as those from "shanxi" or "zhejiang". It runs on my server every day. Would you like me to contribute my spider code, or open my API (which returns JSON) to you so that you can get the video data (title, url, insertTime) every day? I'd be glad to help.
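As a rough sketch of how such a JSON feed might be consumed: only the field names `title`, `url` and `insertTime` come from the description above. The list-of-objects payload shape, the timestamp format, the sample values and the helper name `parse_videos` are all assumptions made for illustration.

```python
import json
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical payload: the real API's shape and timestamp format are not
# documented in this thread, so both are assumed here.
SAMPLE_RESPONSE = """
[
  {"title": "Example video A", "url": "https://example.com/a", "insertTime": "2019-04-01 08:00:00"},
  {"title": "Example video B", "url": "https://example.com/b", "insertTime": "2019-04-02 09:30:00"}
]
"""


def parse_videos(payload: str) -> List[Dict[str, Any]]:
    """Parse the JSON text into video records, newest first."""
    videos = []
    for rec in json.loads(payload):
        videos.append({
            "title": rec["title"],
            "url": rec["url"],
            # Assumed format "YYYY-MM-DD HH:MM:SS"
            "insert_time": datetime.strptime(rec["insertTime"], "%Y-%m-%d %H:%M:%S"),
        })
    # Sort so the most recently inserted video comes first
    videos.sort(key=lambda v: v["insert_time"], reverse=True)
    return videos


videos = parse_videos(SAMPLE_RESPONSE)
for v in videos:
    print(v["insert_time"].isoformat(), v["title"], v["url"])
```

Parsing `insertTime` into a `datetime` makes it straightforward to keep only the entries added since the previous run.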

mokizzz commented 5 years ago

It seems I was overthinking it. I saw the local videos in your log.

tysontxli commented 5 years ago

Thanks. 18,000 articles and 14,000 videos had been crawled as of yesterday. It is said that the Propaganda Department of the Central Committee of the CPC is cooperating with the Ministry of Public Security to crack down on plug-ins. I have suspended the Scrapy crawler deployed on alicloud, which doesn't forge the User-Agent. I think we should suspend all services for a period of time.

tysontxli commented 5 years ago

I think using Selenium to imitate the action of reading an article or video page is discreet enough. No services should be deployed on alicloud.

mokizzz commented 5 years ago

I won't give up resisting. You're right, alicloud is not safe enough.

elleys commented 5 years ago

Thanks for your hard work.