Closed mokizzz closed 5 years ago
It seems I was overthinking it. I saw the local videos in your log.
Thanks. 18,000 articles and 14,000 videos had been crawled as of yesterday. It is said that the Propaganda Department of the Central Committee of the CPC is cooperating with the Ministry of Public Security to crack down on plug-ins. I have suspended the Scrapy crawler deployed on alicloud, which doesn't forge its User-Agent. I think we should suspend all services for a period of time.
I think using selenium to imitate the action of reading an article or video page is discreet enough. No services should be deployed on alicloud.
I won't give up resistance. You're right, alicloud is not safe enough.
Thanks for your hard work.
Hey elder brother,
I have looked at your video spider code. You chose to crawl the videos on the main page (called the video database) of the xuexi.cn central website.
However, the videos on the central website are not updated frequently, so they can't cover the daily video quota (8 points for watching videos every day). There should be more video sources.
I have written a Python spider using "selenium" (with headless Chrome) to crawl local videos such as "shanxi" or "zhejiang". It runs on my server every day. Would you like me to contribute my spider code, or open up my API (which returns JSON) so you can fetch the video data (title, url, insertTime) every day? I'd be glad to help.
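To make the offer concrete, here is a rough sketch of what the JSON from such an API might look like and how it could be read back on the other side. Only the three field names (title, url, insertTime) come from the post above. The `VideoRecord` class, the `to_payload` / `from_payload` helpers, the sample values, and the choice to carry insertTime as a plain string are all assumptions made for illustration, not the actual spider or API.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class VideoRecord:
    """One crawled video. Field names follow the post; types are assumed."""
    title: str
    url: str
    insertTime: str  # assumed to be a string timestamp; the real format is unknown


def to_payload(records: List[VideoRecord]) -> str:
    """Serialize records to a JSON array, keeping non-ASCII titles readable."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False)


def from_payload(payload: str) -> List[VideoRecord]:
    """Parse a JSON array back into VideoRecord objects."""
    return [VideoRecord(**item) for item in json.loads(payload)]


if __name__ == "__main__":
    # Hypothetical sample data, not real crawl output.
    sample = [
        VideoRecord(
            title="sample local video",
            url="https://example.com/video/1",
            insertTime="2019-01-01 08:00:00",
        )
    ]
    payload = to_payload(sample)
    print(payload)
    print(from_payload(payload))
```

A flat array of small objects like this would be easy to merge with videos from the central website, since each item already carries everything needed to queue it.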