This is a Scrapy project that crawls the TAIFEX website to scrape information about Taiwan futures and options.
Using the Windows Command Prompt (CMD) is preferred.
# e.g. use anaconda to create a Python 3.6 env named python36, and activate it
CMD> conda activate python36
(python36) CMD> pipenv shell
(taifex-scrapy) (python36) CMD> pipenv install
CMD> C:\Users\{YOUR_NAME}\.virtualenvs\taifex-scrapy-lMUq6lYm\Scripts\activate
(taifex-scrapy) CMD> scrapy crawl taifex
Data will be stored in a JSON file located at data/taifex.json.
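To check the output after a crawl, the file can be read back with a few lines of Python. This is a minimal sketch that assumes the feed is written as a single JSON array of items (Scrapy's "json" feed format); `load_items` is a hypothetical helper name, and no item field names are assumed.

```python
import json
from pathlib import Path


def load_items(path="data/taifex.json"):
    """Return the scraped items as a list of dicts."""
    # Assumption: the file holds one JSON array, as Scrapy's "json" feed format writes it.
    with Path(path).open(encoding="utf-8") as fh:
        items = json.load(fh)
    if not isinstance(items, list):
        raise ValueError(f"expected a JSON array in {path}")
    return items


if __name__ == "__main__":
    feed = Path("data/taifex.json")
    if feed.exists():
        print(f"{len(load_items(feed))} items loaded from {feed}")
    else:
        print(f"{feed} not found; run the crawl first")
```

If the project is configured to write JSON Lines instead, each line would need to be parsed separately.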
docker run --rm --name aaaa -p 6800:6800 my-scrapyd-a
scrapyd-deploy -l
scrapyd-deploy
scrapyd-client projects
scrapyd-client spiders -p taifex_scraper
scrapyd-client schedule -p taifex_scraper taifex
#scrapyd-client schedule -p taifex_scraper \*
#scrapyd-client schedule -h
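The schedule step can also be done from Python through Scrapyd's `schedule.json` HTTP endpoint. This is a sketch only: it assumes Scrapyd is reachable at `http://localhost:6800` (the port published by the `docker run` command above), and `build_schedule_request` and `schedule` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, base_url="http://localhost:6800"):
    """Build the POST request for Scrapyd's schedule.json endpoint."""
    body = urlencode({"project": project, "spider": spider}).encode("ascii")
    return Request(f"{base_url}/schedule.json", data=body, method="POST")


def schedule(project, spider, base_url="http://localhost:6800"):
    """Send the request and return Scrapyd's JSON reply (needs a running Scrapyd)."""
    with urlopen(build_schedule_request(project, spider, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the request; call schedule(...) to actually submit the job.
    req = build_schedule_request("taifex_scraper", "taifex")
    print(req.get_method(), req.full_url, req.data.decode("ascii"))
```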
# https://scrapyrt.readthedocs.io/en/stable/api.html#scrapyrt-http-api
curl -v "http://localhost:9080/crawl.json?spider_name=taifex&start_requests=true"
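The same ScrapyRT request URL can be assembled in Python, which avoids quoting mistakes when parameters change. A small sketch, assuming ScrapyRT listens on `localhost:9080` as in the curl command above; `crawl_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def crawl_url(spider_name, base_url="http://localhost:9080", start_requests=True):
    """Return the crawl.json URL for the given spider."""
    params = {"spider_name": spider_name}
    if start_requests:
        # Ask ScrapyRT to run the spider's own start requests.
        params["start_requests"] = "true"
    return f"{base_url}/crawl.json?{urlencode(params)}"


if __name__ == "__main__":
    print(crawl_url("taifex"))
```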
pipenv lock --requirements > requirements.txt
# delete the line about zope.interface==5.1.0; ....
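The manual deletion can be scripted. The sketch below drops every requirement line for a given package name from `requirements.txt`; `drop_requirement` is a hypothetical helper, and it matches on the package name only, so it does not depend on the exact version or environment marker on that line.

```python
import re
from pathlib import Path


def drop_requirement(lines, package):
    """Return the lines that are not a requirement entry for `package`."""
    # The lookahead keeps longer names (e.g. "zope.interface-extra") from matching.
    pattern = re.compile(rf"{re.escape(package)}(?![A-Za-z0-9._-])", re.IGNORECASE)
    return [line for line in lines if not pattern.match(line.strip())]


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        original = req_file.read_text(encoding="utf-8").splitlines(keepends=True)
        req_file.write_text("".join(drop_requirement(original, "zope.interface")), encoding="utf-8")
    else:
        print("requirements.txt not found; run pipenv lock first")
```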
The crawl is scheduled to run as a cron job. The crawled data is persisted in InfluxDB and can be viewed through the Chronograf web UI of the TICK stack.
cd tick-sandbox
sandbox up
cd ..
docker-compose up -d
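For orientation, data reaches InfluxDB as points in its line protocol (`measurement,tags fields timestamp`). The sketch below formats one point by hand; it is not this project's actual pipeline, and the measurement, tag, and field names in the example are hypothetical. It handles float fields only.

```python
def _escape(value, chars):
    """Backslash-escape the characters that line protocol treats as separators."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    # All field values are written as floats in this sketch.
    body = ",".join(
        _escape(key, ",= ") + "=" + repr(float(value)) for key, value in fields.items()
    )
    return f"{head} {body} {int(timestamp_ns)}"


if __name__ == "__main__":
    # Hypothetical point; the real item fields depend on the spider.
    print(to_line_protocol("taifex", {"contract": "TX"}, {"close": 17000.0}, 1600000000000000000))
```

In practice an InfluxDB client library would usually build and send these points; the manual formatting here only shows what ends up stored.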