Crawls the list of company announcements and their full text from the Eastmoney (东方财富) website.
conda install scrapy pandas requests beautifulsoup4 pymongo
pip install tushare
scrapy crawl notices
from spiderNotices.text_mongo import TextMongo
# Fetch notices for a single stock (the date range is optional)
result = TextMongo().get_notices_single('000001.SZ', '2010-01-01', '2012-12-31')
result = TextMongo().get_notices_single('000001.SZ')
# Fetch notices for multiple stocks
result = TextMongo().get_notices(['000001.SZ', '000002.SZ'])
# Iterate over every stock already stored in the database
result = TextMongo().get_notices_stk()
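The optional start and end dates of `get_notices_single`, and the list form of `get_notices`, suggest that lookups become a MongoDB query over a stock code and a date range. The sketch below shows one way such a filter could be built. It is not the project's actual implementation: the helper `build_notice_filter` and the document fields `code` and `date` (stored as `'YYYY-MM-DD'` strings) are hypothetical.

```python
from typing import Iterable, Optional, Union


def build_notice_filter(
    codes: Union[str, Iterable[str]],
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> dict:
    """Build a MongoDB filter for notices of one or more stocks.

    Hypothetical schema: each document stores the stock code in `code`
    and the announcement date as a 'YYYY-MM-DD' string in `date`, so
    string comparison matches chronological order.
    """
    # A single code matches exactly; a collection of codes uses $in.
    if isinstance(codes, str):
        query: dict = {"code": codes}
    else:
        query = {"code": {"$in": list(codes)}}

    # Only add bounds that were actually given, so omitting both dates
    # returns every notice for the selected stocks.
    date_range = {}
    if start is not None:
        date_range["$gte"] = start
    if end is not None:
        date_range["$lte"] = end
    if date_range:
        query["date"] = date_range
    return query
```

With pymongo, the result would be passed straight to `collection.find(...)`, e.g. `collection.find(build_notice_filter('000001.SZ', '2010-01-01', '2012-12-31'))`, where `collection` is whatever collection the notices are stored in.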
SeleniumMiddleware
RandomUserAgent
ProxyIpMiddleware
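The middleware names above indicate Scrapy downloader middlewares that rotate the User-Agent header and the outgoing proxy. The following is a minimal sketch of the latter two, assuming the standard Scrapy `process_request(request, spider)` hook. The class names, the user-agent strings, and the proxy address are placeholders, not the project's real configuration.

```python
import random

# Placeholder pools; the project's actual user agents and proxy source are not shown.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
PROXIES = ["http://127.0.0.1:8888"]  # hypothetical proxy address


class RandomUserAgentSketch:
    """Downloader middleware that sets a random User-Agent on every request."""

    def __init__(self, user_agents=None):
        self.user_agents = user_agents or USER_AGENTS

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request normally


class ProxyIpSketch:
    """Downloader middleware that routes each request through a random proxy."""

    def __init__(self, proxies=None):
        self.proxies = proxies or PROXIES

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware reads the proxy URL from meta["proxy"].
        request.meta["proxy"] = random.choice(self.proxies)
        return None
```

Such classes would be enabled through the `DOWNLOADER_MIDDLEWARES` setting; giving the proxy middleware a priority number below that of the built-in `HttpProxyMiddleware` (750) ensures `meta["proxy"]` is set before that middleware runs.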
[ ] Extract text from PDFs and recognize text in images (OCR)
[ ] Crawl market news