Closed treby closed 6 years ago
I'll try to fetch the list of GitHub Issues. First, I'll use virtualenv to set up the environment: http://o-tomox.hatenablog.com/entry/2013/07/18/231204
It looks like virtualenv is already installed in my local environment. I'd like to run:
mkvirtualenv --python=/usr/local/bin/python3 workon-scrapy
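As a side note, a similar isolated environment can be sketched with Python 3's standard-library `venv` module instead of `mkvirtualenv` (which comes from virtualenvwrapper). This is a swapped-in technique, not what this log actually ran. The helper name `create_env` and the temporary parent directory are assumptions for illustration; only the environment name `workon-scrapy` is taken from the command above.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir, name="workon-scrapy"):
    """Create a virtual environment called `name` under `parent_dir` (hypothetical helper)."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips bootstrapping pip, which keeps this sketch fast.
    venv.create(env_dir, with_pip=False)
    return env_dir


# A temporary directory is used here so the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    # Every environment created by venv contains a pyvenv.cfg file.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(env_dir.name, "created:", created)
```

For real use the environment would live in a permanent directory, and `with_pip=True` would be needed to install Scrapy into it.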
❯ which python3
python3 not found
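The same check that `which python3` performs can be sketched in Python with `shutil.which`, which is handy for confirming an interpreter exists before pointing a virtualenv at it. The function name `find_interpreter` and the candidate list are assumptions for illustration.

```python
import shutil


def find_interpreter(candidates=("python3", "python")):
    """Return (name, path) for the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


result = find_interpreter()
if result is None:
    print("no Python interpreter found on PATH")
else:
    print("{} -> {}".format(*result))
```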
~/myprj/scrapy
❯ brew install python3
Updating Homebrew...
brew failed...
==> Auto-updated Homebrew!
Updated 2 taps (codekitchen/dinghy, homebrew/core).
==> New Formulae
cointop mysql-client
==> Updated Formulae
harfbuzz ✔ conan glibmm libspectrum percona-xtrabackup snakemake
imagemagick ✔ conjure-up gnupg libvirt pgroonga sqldiff
node ✔ dbhash gnuplot libxmlsec1 php-code-sniffer sqlite-analyzer
numpy ✔ django-completion go libzdb php-cs-fixer ssh-vault
sqlite ✔ dmd go@1.9 llnode phpunit storm
agda dnscrypt-proxy godep macvim picard-tools streamlink
amazon-ecs-cli docker-swarm goenv maxwell pkcs11-helper syncthing
angular-cli draco gopass mercurial pngquant sysbench
annie druid goreleaser midnight-commander pony-stable tarsnap-gui
ansible dscanner gradle mill ponyc telegraf
arangodb dynare hadolint minetest ppsspp teleport
arpack emscripten homeshick mint proj tkdiff
artifactory erlang hub monero prometheus tokei
aws-sdk-cpp etcd hydra mongo-c-driver pstoedit traefik
azure-cli exiftool imagemagick@6 mongo-cxx-driver puzzles typescript
basex faas-cli immortal mydumper qcachegrind uhd
bash fio innotop mysql++ qrencode vault
bazel firebase-cli inspectrum mysql-connector-c++ quicktype vert.x
bento4 flow iozone mytop rocksdb webpack
bibutils fn jenkins nano rubberband winetricks
bro folly jenkins-job-builder nanomsg rust wolfssl
bzt fribidi jenkins-lts nginx sbt@0.13 wslay
cayley frugal jfrog-cli-go node-build securefs xonsh
certbot fuse-emulator kitchen-sync ntl shellharden xtensor
ceylon futhark kompose ntopng singular youtube-dl
chronograf gist kubernetes-cli odpi siril zabbix
clblast git-town libplctag ohcount sjk
codekitchen/dinghy/dinghy gitlab-gem librealsense percona-toolkit skaffold
==> Deleted Formulae
luciddb
Error: python 2.7.13 is already installed
To upgrade to 3.6.5, run `brew upgrade python`
~/myprj/scrapy
❯ brew upgrade python
==> Upgrading 1 outdated package, with result:
python 2.7.13 -> 3.6.5
==> Upgrading python
==> Installing dependencies for python: sqlite, xz
==> Installing python dependency: sqlite
==> Downloading https://homebrew.bintray.com/bottles/sqlite-3.24.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring sqlite-3.24.0.high_sierra.bottle.tar.gz
==> Caveats
Homebrew has detected an existing SQLite history file that was created
with the editline library. The current version of this formula is
built with Readline. To back up and convert your history file so that
it can be used with Readline, run:
sed -i~ 's/\\040/ /g' ~/.sqlite_history
before using the `sqlite` command-line tool again. Otherwise, your
history will be lost.
This formula is keg-only, which means it was not symlinked into /usr/local,
because macOS provides an older sqlite3.
If you need to have this software first in your PATH run:
echo 'export PATH="/usr/local/opt/sqlite/bin:$PATH"' >> ~/.zshrc
For compilers to find this software you may need to set:
LDFLAGS: -L/usr/local/opt/sqlite/lib
CPPFLAGS: -I/usr/local/opt/sqlite/include
For pkg-config to find this software you may need to set:
PKG_CONFIG_PATH: /usr/local/opt/sqlite/lib/pkgconfig
==> Summary
🍺 /usr/local/Cellar/sqlite/3.24.0: 11 files, 3.5MB
==> Installing python dependency: xz
==> Downloading https://homebrew.bintray.com/bottles/xz-5.2.4.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring xz-5.2.4.high_sierra.bottle.tar.gz
🍺 /usr/local/Cellar/xz/5.2.4: 92 files, 1MB
I wonder if it got installed.
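One way to answer that question from inside Python is to compare the running interpreter's version against a minimum. This is only a sketch: the helper name `meets_minimum` and the `(3, 0)` threshold are assumptions, and the two sample versions in the comments are the ones shown in the brew output above.

```python
import sys


def meets_minimum(version, minimum=(3, 0)):
    """Return True if `version` (a tuple such as sys.version_info) is at least `minimum`."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)


# Versions from the brew output: 2.7.13 before the upgrade, 3.6.5 after.
print("2.7.13 ok?", meets_minimum((2, 7, 13)))
print("3.6.5 ok?", meets_minimum((3, 6, 5)))
print("running interpreter ok?", meets_minimum(sys.version_info))
```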
❯ cat helloworld.py
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('h2.entry-title'):
            yield {'title': title.css('a ::text').extract_first()}

        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
Looks good. So Scrapy is a framework.
Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
For more information including a list of features check the Scrapy homepage at: https://scrapy.org
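To get a feel for what the selectors in helloworld.py are picking out (`h2.entry-title`, then the text inside its `a`), here is a rough standard-library approximation using `html.parser`. It is not how Scrapy implements selectors. The class name `EntryTitleParser` and the sample HTML are made up for illustration, and unlike `extract_first()` it joins all text inside the link rather than taking only the first text node.

```python
from html.parser import HTMLParser


class EntryTitleParser(HTMLParser):
    """Collect the link text found inside <h2 class="entry-title"> elements."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "entry-title" in classes:
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.in_link = True
            self.titles.append("")  # start a new title

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


# Made-up sample page; the real blog markup may differ.
SAMPLE_HTML = """
<h2 class="entry-title"><a href="/a">First post</a></h2>
<h2 class="other"><a href="/b">Ignored</a></h2>
<h2 class="entry-title"><a href="/c">Second post</a></h2>
"""

parser = EntryTitleParser()
parser.feed(SAMPLE_HTML)
print(parser.titles)
```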
❯ python helloworld.py
So just running the file directly doesn't work. Well, it only defines a class, so that makes sense.
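The reason nothing happens is that a spider class has to be driven by something: Scrapy normally does this through its own command line (for example `scrapy runspider helloworld.py`). The sketch below imitates that relationship with plain Python. `FakeResponse`, `TitleSpider`, and `drive` are hypothetical stand-ins, not Scrapy classes or APIs.

```python
class FakeResponse:
    """Hypothetical stand-in that holds pre-extracted titles instead of a real HTTP response."""

    def __init__(self, titles):
        self.titles = titles


class TitleSpider:
    name = "titlespider"

    def parse(self, response):
        # parse() is a generator: nothing runs until something iterates over it.
        for title in response.titles:
            yield {"title": title}


def drive(spider_cls, response):
    """Play the role of the framework: instantiate the spider and consume parse()."""
    spider = spider_cls()
    return list(spider.parse(response))


# Defining the classes above produces no output by itself; this call is the entry point.
items = drive(TitleSpider, FakeResponse(["first post", "second post"]))
print(items)
```

Without the last two lines the script would exit silently, which matches what happened with `python helloworld.py`.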
This is a work log of an "OSS Gate workshop". The "OSS Gate workshop" is an activity to increase the number of OSS developers. The discussion here is in Japanese. Thanks.
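Notes for creating a work log
Fill in the template below and set it as the title. Scroll down to see an example of how to fill it in.
Title example ↓:
OSS Gate workshop related information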