jackieli123723 / jackieli123723.github.io

✅ lilidong's personal blog

Common Python pitfalls #54

Open jackieli123723 opened 6 years ago

jackieli123723 commented 6 years ago

Common pitfalls

[root@lilidong /home/worker/python/pdf_crawler]# pip install Pillow
bash: pip: command not found

// Installing pip for Python
Download and install from source:

    wget "https://pypi.python.org/packages/source/p/pip/pip-1.5.4.tar.gz#md5=834b2904f92d46aaa333267fb1c922bb" --no-check-certificate
    tar -axf pip-1.5.4.tar.gz
    cd pip-1.5.4/
    python setup.py install

After the install finishes, running pip -V still errors out, as follows:

bash: pip: command not found...
What do you do then?

Just create a symlink.

First, find where pip was actually installed:

find / -name pip

Then create the symlink:

ln -sv /usr/local/python/bin/pip /usr/bin/pip

After that pip works.

Adjust the paths to match your own installation.
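
One thing worth double-checking after this kind of manual symlinking is that the pip on PATH really belongs to the interpreter you intend to use (running python -m pip sidesteps the symlink problem entirely). A minimal check from inside Python, assuming pip is importable:

    # Verify which interpreter and which pip installation are actually in use.
    import sys
    import pip  # the same pip that "python -m pip" would run

    print(sys.executable)    # path of the current python
    print(pip.__version__)   # should match what "pip -V" prints on the command line
    print(pip.__file__)      # where this pip lives, e.g. .../site-packages/pip/...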

[root@lilidong /usr/local/bin]# ll
total 35340
lrwxrwxrwx 1 root root       39 Jan 31 07:19 forever -> ../lib/node_modules/forever/bin/forever
lrwxrwxrwx 1 root root       40 Jan 31 02:21 n -> /home/worker/node-v8.0.0-linux-x64/bin/n
-rwxr-xr-x 1 root root 36186806 Jan 31 03:00 node
lrwxrwxrwx 1 root root       38 Jan 31 03:00 npm -> ../lib/node_modules/npm/bin/npm-cli.js
lrwxrwxrwx 1 root root       38 Jan 31 02:35 npx -> ../lib/node_modules/npm/bin/npx-cli.js
lrwxrwxrwx 1 root root       42 Jan 31 02:16 pm2 -> /home/worker/node-v8.0.0-linux-x64/bin/pm2
lrwxrwxrwx 1 root root       45 Jan 31 02:20 rimraf -> /home/worker/node-v8.0.0-linux-x64/bin/rimraf
lrwxrwxrwx 1 root root       42 Jan 31 02:20 ssr -> /home/worker/node-v8.0.0-linux-x64/bin/ssr
lrwxrwxrwx 1 root root       42 Jan 31 02:27 webpack -> ../lib/node_modules/webpack/bin/webpack.js

Processing dependencies for pip==1.5.4
Finished processing dependencies for pip==1.5.4
[root@lilidong /home/worker/python/pip-1.5.4]# find / -name pip
/home/worker/python/pip-1.5.4/pip
/home/worker/python/pip-1.5.4/build/lib/pip
/usr/bin/pip
/usr/lib/python2.7/site-packages/pip-1.5.4-py2.7.egg/pip
/.cache/pip
[root@lilidong /home/worker/python/pip-1.5.4]# ln -sv /usr/local/python/bin/pip /usr/bin/pip
ln: failed to create symbolic link '/usr/bin/pip': File exists
[root@lilidong /home/worker/python/pip-1.5.4]# ln -s /home/worker/python/pip-1.5.4/pip /usr/local/bin/pip
[root@lilidong /home/worker/python/pip-

Symlink (so pip is on the PATH):
 ln -s /home/worker/python/pip-1.5.4/pip /usr/local/bin/pip

[root@lilidong /home/worker/python/pip-1.5.4]# ln -s /home/worker/python/pip-1.5.4/pip /usr/local/bin/pip
[root@lilidong /home/worker/python/pip-1.5.4]# cd /usr/local/bin/
[root@lilidong /usr/local/bin]# ll
total 35340
lrwxrwxrwx 1 root root       39 Jan 31 07:19 forever -> ../lib/node_modules/forever/bin/forever
lrwxrwxrwx 1 root root       40 Jan 31 02:21 n -> /home/worker/node-v8.0.0-linux-x64/bin/n
-rwxr-xr-x 1 root root 36186806 Jan 31 03:00 node
lrwxrwxrwx 1 root root       38 Jan 31 03:00 npm -> ../lib/node_modules/npm/bin/npm-cli.js
lrwxrwxrwx 1 root root       38 Jan 31 02:35 npx -> ../lib/node_modules/npm/bin/npx-cli.js
lrwxrwxrwx 1 root root       33 Feb 10 10:25 pip -> /home/worker/python/pip-1.5.4/pip
lrwxrwxrwx 1 root root       42 Jan 31 02:16 pm2 -> /home/worker/node-v8.0.0-linux-x64/bin/pm2
lrwxrwxrwx 1 root root       45 Jan 31 02:20 rimraf -> /home/worker/node-v8.0.0-linux-x64/bin/rimraf
lrwxrwxrwx 1 root root       42 Jan 31 02:20 ssr -> /home/worker/node-v8.0.0-linux-x64/bin/ssr
lrwxrwxrwx 1 root root       42 Jan 31 02:27 webpack -> ../lib/node_modules/webpack/bin/webpack.js
[root@lilidong /usr/local/bin]#

 [root@lilidong /usr/local/bin]# pip -v

Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  zip                         DEPRECATED. Zip individual packages.
  unzip                       DEPRECATED. Unzip individual packages.
  bundle                      DEPRECATED. Create pybundles.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output.
  --log-file <path>           Path to a verbose non-appending log, that only logs failures. This log is active by default at /.pip/pip.log.
  --log <path>                Path to a verbose appending log. This log is inactive by default.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup.
  --cert <path>               Path to alternate CA bundle.

[root@lilidong /home/worker/python]# cd pdf_crawler/
[root@lilidong /home/worker/python/pdf_crawler]# ll
total 1564
-rw-r--r-- 1 root root    1369 Feb 10 09:27 crawler.py
-rw-r--r-- 1 root root 1595408 Nov  6  2016 get-pip.py
[root@lilidong /home/worker/python/pdf_crawler]# ll
total 1564
-rw-r--r-- 1 root root    1369 Feb 10 09:27 crawler.py
-rw-r--r-- 1 root root 1595408 Nov  6  2016 get-pip.py
[root@lilidong /home/worker/python/pdf_crawler]# python crawler.py
Traceback (most recent call last):
  File "crawler.py", line 8, in <module>
    import requests
ImportError: No module named requests
[root@lilidong /home/worker/python/pdf_crawler]# pip install requests
Downloading/unpacking requests
  Downloading requests-2.18.4-py2.py3-none-any.whl (88kB): 88kB downloaded
Downloading/unpacking certifi>=2017.4.17 (from requests)
  Downloading certifi-2018.1.18-py2.py3-none-any.whl (151kB): 151kB downloaded
Downloading/unpacking idna>=2.5,<2.7 (from requests)
  Downloading idna-2.6-py2.py3-none-any.whl (56kB): 56kB downloaded
Downloading/unpacking chardet>=3.0.2,<3.1.0 (from requests)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB): 133kB downloaded
Downloading/unpacking urllib3>=1.21.1,<1.23 (from requests)
  Downloading urllib3-1.22-py2.py3-none-any.whl (132kB): 132kB downloaded
Installing collected packages: requests, certifi, idna, chardet, urllib3
  Found existing installation: chardet 2.2.1
    Uninstalling chardet:
      Successfully uninstalled chardet
Successfully installed requests certifi idna chardet urllib3
Cleaning up...
[root@lilidong /home/worker/python/pdf_crawler]# ll
total 1564
-rw-r--r-- 1 root root    1369 Feb 10 09:27 crawler.py
-rw-r--r-- 1 root root 1595408 Nov  6  2016 get-pip.py
[root@lilidong /home/worker/python/pdf_crawler]# python crawler.py
Traceback (most recent call last):
  File "crawler.py", line 9, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named bs4
[root@lilidong /home/worker/python/pdf_crawler]# pip install bs4
Downloading/unpacking bs4
  Downloading bs4-0.0.1.tar.gz
  Running setup.py (path:/tmp/pip_build_root/bs4/setup.py) egg_info for package bs4

Downloading/unpacking beautifulsoup4 (from bs4)
  Downloading beautifulsoup4-4.6.0-py2-none-any.whl (86kB): 86kB downloaded
Installing collected packages: bs4, beautifulsoup4
  Running setup.py install for bs4

Successfully installed bs4 beautifulsoup4
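
As the log shows, the bs4 distribution on PyPI is only a thin shim that pulls in beautifulsoup4; either way the import is the same. A quick smoke test:

    # Confirm that BeautifulSoup is importable after "pip install bs4".
    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>hello</p>", "html.parser")
    print(soup.p.text)  # hello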

-- Result
[root@lilidong /home/worker/python/pdf_crawler]# ll
total 24016
-rw-r--r-- 1 root root 2878464 Feb 10 10:33 01StableMatching.pdf
-rw-r--r-- 1 root root 2657280 Feb 10 10:33 02AlgorithmAnalysis.pdf
-rw-r--r-- 1 root root  293888 Feb 10 10:33 03Graphs.pdf
-rw-r--r-- 1 root root 3614720 Feb 10 10:33 04GreedyAlgorithmsI.pdf
-rw-r--r-- 1 root root 2896896 Feb 10 10:33 05DivideAndConquerI.pdf
-rw-r--r-- 1 root root 4186112 Feb 10 10:33 05DivideAndConquerII.pdf
-rw-r--r-- 1 root root  508928 Feb 10 10:33 06DynamicProgrammingI.pdf
-rw-r--r-- 1 root root 1150976 Feb 10 10:33 06DynamicProgrammingII.pdf
-rw-r--r-- 1 root root  296960 Feb 10 10:33 07NetworkFlowI.pdf
-rw-r--r-- 1 root root  348160 Feb 10 10:33 07NetworkFlowII.pdf
-rw-r--r-- 1 root root  362496 Feb 10 10:33 08IntractabilityI.pdf
-rw-r--r-- 1 root root  366592 Feb 10 10:33 08IntractabilityII.pdf
-rw-r--r-- 1 root root  266240 Feb 10 10:33 10ExtendingTractability.pdf
-rw-r--r-- 1 root root  277504 Feb 10 10:33 11ApproximationAlgorithms.pdf
-rw-r--r-- 1 root root  321536 Feb 10 10:33 12LocalSearch.pdf
-rw-r--r-- 1 root root    1369 Feb 10 09:27 crawler.py
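
The crawler itself is not included in the issue, so here is only a rough sketch of what a crawler.py with these imports might look like: fetch an index page, collect links ending in .pdf, and download each file. The URL and all names below are placeholders, not the site actually used.

    # -*- coding: utf-8 -*-
    # Hypothetical sketch of a pdf crawler like the crawler.py above.
    import os
    import requests
    from bs4 import BeautifulSoup

    try:
        from urllib.parse import urljoin   # Python 3
    except ImportError:
        from urlparse import urljoin       # Python 2

    INDEX_URL = "https://example.com/lectures/"   # placeholder URL

    def crawl(index_url=INDEX_URL, out_dir="."):
        html = requests.get(index_url, timeout=15).text
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):
            href = a["href"]
            if not href.lower().endswith(".pdf"):
                continue
            pdf_url = urljoin(index_url, href)
            name = os.path.basename(href)
            print("downloading %s" % pdf_url)
            with open(os.path.join(out_dir, name), "wb") as f:
                f.write(requests.get(pdf_url, timeout=60).content)

    if __name__ == "__main__":
        crawl()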

---
There are more and more web crawlers these days, and many of them are written by beginners. Unlike search-engine crawlers, they don't know how to pace themselves, so they often burn a lot of server resources and waste bandwidth for nothing.

Nginx can filter these requests by User-Agent very easily: a simple regular expression at the URL entry point is enough to reject crawler requests we don't want:

    ...
    location / {
        if ($http_user_agent ~* "python|curl|java|wget|httpclient|okhttp") {
            return 503;
        }
        # 正常处理
        ...
    }
    ...
The variable $http_user_agent is an Nginx variable that can be referenced directly inside a location block. ~* performs a case-insensitive regex match, and matching on python alone already filters out roughly 80% of Python crawlers.
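
A quick way to see the filter from the client side, as a hedged sketch (example.com stands in for the real host): the default requests User-Agent is something like python-requests/2.18.4, which matches the python pattern and gets the 503, while a browser-style User-Agent passes through.

    # Hypothetical client-side check of the User-Agent filter above.
    import requests

    url = "https://example.com/"  # placeholder for the protected site

    # Default UA ("python-requests/x.y.z") matches the "python" pattern -> 503
    print(requests.get(url).status_code)

    # A browser-like UA does not match and is handled normally
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    print(requests.get(url, headers=headers).status_code)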
// Compatibility polyfills for older browsers
require('core-js/fn/array/from')
require('core-js/fn/array/find-index')
require('core-js/fn/array/find')
require('core-js/fn/array/keys')
require('core-js/fn/array/fill')
require('core-js/fn/array/some')
require('core-js/fn/object/assign')
require('core-js/fn/object/values')

https://m.dianping.com/auth/app?ft=5&ssp=true&redir=
https://catdot.dianping.com/broker-service/api/js

var _err = window.onerror;
var url = location.protocol + '//catdot.dianping.com/broker-service/api/js';
window.onerror = function(err, file, line, col, error){
  var e = encodeURIComponent;
  var time = Date.now();
  (new window.Image).src = url
    + '?error=' + e(err)
    + '&v=1'
    + '&data=' + e(error && error.stack ? error.stack : '')
    + '&url=' + e(location.href)
    + '&file=' + e(file)
    + '&line=' + e(line)
    + '&col=' + e(col)
    + '&timestamp=' + time;
  _err && _err(err, file, line, col, error);
};
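
On the receiving side, the beacon above is just a GET request with the error details in the query string. Below is a minimal sketch of a collector endpoint using only the Python standard library; it is illustrative only and does not describe the real catdot.dianping.com service.

    # Minimal sketch of an endpoint that could receive the image beacon above.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    class JsErrorHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            qs = parse_qs(urlparse(self.path).query)
            # These are the fields encoded by the window.onerror handler above.
            fields = ("error", "data", "url", "file", "line", "col", "timestamp")
            print({k: qs.get(k, [""])[0] for k in fields})
            # Reply with an empty 204 so the Image() request completes quietly.
            self.send_response(204)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), JsErrorHandler).serve_forever()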

way1:

    entrypoint:
      - celery
      - -A
      - cmdb_api
      - beat
      - -S
      - django_celery_beat.schedulers:DatabaseScheduler
      - -l
      - info
    links:
      - redis:redis

way2:

    command:
      - python
      - manage.py
      - runserver
      - 0.0.0.0:8000
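
For way1 to start, celery -A cmdb_api has to be able to find a Celery app, and the app needs a broker reachable through the redis link. A minimal sketch under those assumptions (everything beyond the names in the snippet above is hypothetical):

    # cmdb_api/celery.py -- minimal app for "celery -A cmdb_api beat -S django_celery_beat..."
    import os
    from celery import Celery

    # Assumed Django settings module; adjust to the real project layout.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "cmdb_api.settings")

    app = Celery("cmdb_api")
    app.conf.broker_url = "redis://redis:6379/0"  # "redis" is the linked service name
    app.autodiscover_tasks()                      # pick up tasks.py from installed Django apps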

      C:\Users\Administrator\AppData\Roaming\npm;C:\Users\Administrator\AppData\Roaming\nvm;d:\Program Files\nodejs;C:\Python27;C:\ProgramData\Administrator\atom\bin
For HTTPS, a domain name is required; an IP address plus port will not work.
What was actually running was Python 3 out of the C:\Python27 directory: the Python 3 install had been pointed at, and overwrote, the Python 2 install directory.
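
An easy way to catch this kind of mix-up is to ask the interpreter itself which installation it is running from:

    # Which "python" is actually first on PATH? Print its version and install location.
    import sys

    print(sys.version)      # can report 3.x even though the folder is named Python27
    print(sys.executable)   # e.g. C:\Python27\python.exe
    print(sys.prefix)       # the install directory actually in use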

C:\Users\Administrator>pip install requests
Collecting requests
  Downloading requests-2.18.4-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 90kB 130kB/s
Collecting urllib3<1.23,>=1.21.1 (from requests)
  Downloading urllib3-1.22-py2.py3-none-any.whl (132kB)
    100% |████████████████████████████████| 135kB 255kB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 135kB 77kB/s
Collecting certifi>=2017.4.17 (from requests)
  Downloading certifi-2018.1.18-py2.py3-none-any.whl (151kB)
    100% |████████████████████████████████| 155kB 99kB/s
Collecting idna<2.7,>=2.5 (from requests)
  Downloading idna-2.6-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 57kB 25kB/s
Installing collected packages: urllib3, chardet, certifi, idna, requests
Successfully installed certifi-2018.1.18 chardet-3.0.4 idna-2.6 requests-2.18.4 urllib3-1.22
You are using pip version 7.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Users\Administrator>pip -h

Usage:
  pip <command> [options]

Commands:
  install                     Install packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring
                              environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be
                              used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output.
  --log <path>                Path to a verbose appending log.
  --proxy <proxy>             Specify a proxy in the form
                              [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should
                              attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists:
                              (s)witch, (i)gnore, (w)ipe, (b)ackup.
  --trusted-host <hostname>   Mark this host as trusted, even though it does
                              not have valid or any HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file
                              containing the private key and the certificate
                              in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine
                              whether a new version of pip is available for
                              download. Implied with --no-index.

C:\Users\Administrator>

E:\jackieli\python\python爬虫\python3-crawler\baike_spider>python spider_main.py

Traceback (most recent call last):
  File "spider_main.py", line 2, in <module>
    from baike_spider import url_manager, html_downloader, html_parser, html_outputer
ImportError: No module named 'baike_spider'

E:\jackieli\python\python爬虫\python3-crawler\baike_spider>python spider_main.py

Traceback (most recent call last):
  File "spider_main.py", line 2, in <module>
    import url_manager, html_downloader, html_parser, html_outputer
  File "E:\jackieli\python\python爬虫\python3-crawler\baike_spider\html_parser.p
y", line 1, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
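
The "No module named 'baike_spider'" failure is a path issue rather than a missing package: when spider_main.py is run from inside baike_spider/, only that directory ends up on sys.path, so the package name baike_spider cannot be resolved, while its sibling modules can, which is why switching to the plain import gets past it. A small script saved next to spider_main.py makes this visible:

    # show_path.py (hypothetical helper, run from inside baike_spider/):
    # sys.path[0] is the script's own directory, not its parent, so
    # "from baike_spider import ..." fails while "import url_manager" works.
    import os
    import sys

    print(sys.path[0])                    # ...\python3-crawler\baike_spider
    print(os.path.basename(sys.path[0]))  # baike_spider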

E:\jackieli\python\python爬虫\python3-crawler\baike_spider>pip install bs4
Collecting bs4
  Downloading bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86kB)
    100% |████████████████████████████████| 90kB 193kB/s
Installing collected packages: beautifulsoup4, bs4
  Running setup.py install for bs4
Successfully installed beautifulsoup4-4.6.0 bs4-0.0.1
You are using pip version 7.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

E:\jackieli\python\python爬虫\python3-crawler\baike_spider>python spider_main.py

craw 1 : http://baike.baidu.com/item/Python
craw 2 : http://baike.baidu.com/view/10812319.htm

E:\jackieli\python\python爬虫\python3-crawler\baike_spider>