xiyoulaoyuanjia / blog

Notes and summaries, and more?

Set up a discussion area #1

Open xiyoulaoyuanjia opened 11 years ago

xiyoulaoyuanjia commented 11 years ago

Why does printing newlines cost so much time?

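We know that output in some languages is line-buffered: whenever the output contains "\n", the buffer is flushed and the program interacts with the I/O device, which naturally costs more time.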

xiyoulaoyuanjia commented 11 years ago

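By default, stdout is line-buffered when it is connected to a terminal. Without a newline, the program keeps buffering output until some threshold (e.g. 1024 bytes) is reached. Buffering reduces the number of interactions with the I/O device, so it is clearly faster. If you redirect the output to a file when running the program, you probably won't see a difference, because a redirected stdout is normally fully buffered and newlines no longer trigger a flush.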
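A small Python sketch of the effect. It stacks Python's own `io` layers on top of a hypothetical `CountingRaw` sink that stands in for the device and counts how many times data actually reaches it. This models the idea with Python's `io` module rather than C stdio; the 1024-byte buffer size is only chosen to mirror the threshold above.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for the output device: counts every write that reaches it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data.extend(b)
        return len(b)


def count_device_writes(line_buffering, lines=100, buffer_size=1024):
    """Write `lines` short lines and return how many times the device was hit."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    text = io.TextIOWrapper(buffered, encoding="utf-8", line_buffering=line_buffering)
    for i in range(lines):
        text.write("line %d\n" % i)
    text.flush()  # push out whatever is still buffered
    return raw.calls


if __name__ == "__main__":
    print("line-buffered :", count_device_writes(True))
    print("block-buffered:", count_device_writes(False))
```

With line buffering every "\n" forces a flush, so the sink is hit once per line; without it the short lines pile up in the buffers and reach the sink only a handful of times.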

xiyoulaoyuanjia commented 11 years ago

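Sticky bit? The /tmp directory on Linux is an example of a directory that already has the sticky bit set. Because it is shared by many users and every user has write permission on it, the sticky bit is set so that a file in it can only be deleted or renamed by the file's owner (or the directory's owner, or root). Whether other users can modify a file's contents is still governed by that file's own permissions.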
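A sketch using the standard `os` and `stat` modules to check and set the bit, assuming a Unix-like system; `has_sticky_bit` and `make_shared_dir` are illustrative names. On Linux, `ls -ld /tmp` typically shows `drwxrwxrwt`, where the trailing `t` is the sticky bit.

```python
import os
import stat
import tempfile


def has_sticky_bit(path):
    """Return True if the sticky bit (S_ISVTX) is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)


def make_shared_dir(path):
    """Give path /tmp-style permissions: rwx for everyone plus the sticky bit."""
    os.chmod(path, 0o1777)  # same as `chmod 1777 path`


if __name__ == "__main__":
    d = tempfile.mkdtemp()  # created with mode 0700, no sticky bit
    make_shared_dir(d)
    print(stat.filemode(os.stat(d).st_mode), has_sticky_bit(d))
    os.rmdir(d)
```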

xiyoulaoyuanjia commented 9 years ago
#!/bin/bash
""""/bin/true
export T=111

##do_shell_thing

exec python "$0" "$@"   # quoted so paths and arguments containing spaces survive

exit "1"""

import os
print(os.environ['T'])
## do_python_thing

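A mixed Python/shell script, very handy when some environment variables have to be set up before Python starts. The trick is neat and efficient: bash reads '""""/bin/true' as two empty strings followed by /bin/true (a no-op), runs the shell part, and then replaces itself with Python running the same file via exec. Python reads '"""' as the start of a triple-quoted string that swallows the whole shell section up to 'exit "1"""', and then continues with the Python code. I didn't even know which language to pick for syntax highlighting here... (embarrassed)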
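One way to see the Python side of the trick: parse a cut-down copy of the script with the standard `ast` module. The first statement comes back as a plain string expression holding the shell section, which Python evaluates and discards. `POLYGLOT`, `hidden_shell_part` and `parses` are illustrative names.

```python
import ast

# A cut-down copy of the bash/Python polyglot script, held as text.
POLYGLOT = '''#!/bin/bash
""""/bin/true
export T=111
exec python "$0" "$@"
exit "1"""

import os
print(os.environ['T'])
'''


def hidden_shell_part(source):
    """Return the string literal Python sees in place of the shell section, or None."""
    tree = ast.parse(source)
    first = tree.body[0]
    if (isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)):
        return first.value.value
    return None


def parses(source):
    """True if the whole file is valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(repr(hidden_shell_part(POLYGLOT)))
```

Everything after the closing `"""` must be valid Python: `parses(POLYGLOT + "}\n")` is False, because a lone `}` is a SyntaxError.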

xiyoulaoyuanjia commented 9 years ago

About time zones


   date 

The output usually looks like this:

Wed Oct 28 15:14:44 CST 2015

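On a machine whose time zone is set to China (e.g. Asia/Shanghai), CST means China Standard Time (UTC+8). The abbreviation is ambiguous, though: on a machine set to US Central time, CST means Central Standard Time (UTC-6).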
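A sketch that parses the default `date` output shown above with the UTC offset supplied explicitly, since the abbreviation alone does not determine it. It uses only `datetime`'s fixed-offset `timezone`; the two `CST_*` constants and `parse_date_output` are illustrative names, and it assumes the English/C-locale output format, since `date`'s default output depends on the locale.

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed offsets: the same "CST" label means different things.
CST_CHINA = timezone(timedelta(hours=8), "CST")        # China Standard Time, UTC+8
CST_US_CENTRAL = timezone(timedelta(hours=-6), "CST")  # US Central Standard Time, UTC-6


def parse_date_output(line, tz):
    """Parse default `date` output, e.g. 'Wed Oct 28 15:14:44 CST 2015', in zone tz."""
    fields = line.split()
    abbrev = fields.pop(4)  # the zone abbreviation sits between the time and the year
    if abbrev != tz.tzname(None):
        raise ValueError("zone %r does not match %r" % (abbrev, tz.tzname(None)))
    naive = datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=tz)


if __name__ == "__main__":
    line = "Wed Oct 28 15:14:44 CST 2015"
    for tz in (CST_CHINA, CST_US_CENTRAL):
        print(parse_date_output(line, tz).astimezone(timezone.utc))
```

The same line lands on two different UTC instants, 14 hours apart, depending on which "CST" is meant.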

xiyoulaoyuanjia commented 9 years ago
#!/usr/bin/python
#-*-coding:utf-8-*-

import random

try:
    # Scrapy >= 1.0 location
    from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware
except ImportError:
    # older Scrapy releases kept it under scrapy.contrib
    from scrapy.contrib.downloadermiddleware.useragent import UserAgentMiddleware

class RotateUserAgentMiddleware(UserAgentMiddleware):
    """
        A downloader middleware that rotates the User-Agent header while crawling websites.

        If USER_AGENT_LIST is set in settings, rotate through it; if not, use the
        default user_agent_list attribute instead.
    """

    # the default user_agent_list covers Chrome, IE, Firefox, Mozilla, Opera and Netscape
    # for more user agent strings, see http://www.useragentstring.com/pages/useragentstring.php
    user_agent_list = [
        'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.43 Safari/537.31',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.60 Safari/537.17',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1309.0 Safari/537.17',

        'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.2; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)',
        'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)',
        'Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)',

        'Mozilla/6.0 (Windows NT 6.2; WOW64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1',
        'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1',
        'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:15.0) Gecko/20120910144328 Firefox/15.0.2',

        'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a3pre) Gecko/20070330',
        'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13; ) Gecko/20101203',

        'Opera/9.80 (Windows NT 6.0) Presto/2.12.388 Version/12.14',
        'Opera/9.80 (X11; Linux x86_64; U; fr) Presto/2.9.168 Version/11.50',
        'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; de) Presto/2.9.168 Version/11.52',

        'Mozilla/5.0 (Windows; U; Win 9x 4.90; SG; rv:1.9.2.4) Gecko/20101104 Netscape/9.1.0285',
        'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.7pre) Gecko/20070815 Firefox/2.0.0.6 Navigator/9.0b3',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6',
    ]

    def __init__(self, user_agent='', user_agent_list=None):
        self.user_agent = user_agent
        if user_agent_list:
            # an instance attribute shadows the class-level default list
            self.user_agent_list = list(user_agent_list)

    @classmethod
    def from_crawler(cls, crawler):
        # The inherited from_crawler passes settings['USER_AGENT'], which Scrapy always
        # fills with its own default, so _user_agent() would never reach the random pick.
        # Read USER_AGENT_LIST instead (an empty list when the setting is absent).
        return cls(user_agent_list=crawler.settings.getlist('USER_AGENT_LIST'))

    def _user_agent(self, spider):
        # priority: a user_agent on the spider, then an explicit one, then a random pick
        if getattr(spider, 'user_agent', None):
            return spider.user_agent
        elif self.user_agent:
            return self.user_agent

        return random.choice(self.user_agent_list)

    def process_request(self, request, spider):
        ua = self._user_agent(spider)
        if ua:
            # setdefault keeps a User-Agent that is already set on the request
            request.headers.setdefault('User-Agent', ua)
  1. Pages can also be crawled from Google's cache.
  2. Rotate the User-Agent to get around bans.
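To try the middleware, it has to be registered in the project's `settings.py`. A sketch, assuming the class lives in a hypothetical `myproject/middlewares.py` and Scrapy 1.0 or later: priority 400 is an arbitrary choice that runs before Scrapy's built-in `UserAgentMiddleware` (default priority 500), and mapping the built-in to `None` disables it so only the rotating middleware touches the header.

```python
# settings.py (fragment); "myproject.middlewares" is a hypothetical module path
DOWNLOADER_MIDDLEWARES = {
    # lower numbers have their process_request() called earlier
    "myproject.middlewares.RotateUserAgentMiddleware": 400,
    # None disables Scrapy's built-in User-Agent middleware
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}
```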
xiyoulaoyuanjia commented 9 years ago

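Regarding the Amazon kindlegen method below: the official download link didn't work for some reason, but today I found it can be downloaded from the help page instead. Not sure what to make of that... (embarrassed) Here.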

xiyoulaoyuanjia commented 9 years ago

Today I noticed that some Python code likes to put "Mixin" in class names. I didn't quite understand why; there is an explanation on Stack Overflow.
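In short, a mixin is a small class that supplies one extra capability and is meant to be combined with other classes through multiple inheritance rather than used on its own; the standard library's `socketserver.ThreadingMixIn` works this way. A toy sketch, where `JSONMixin`, `Point` and `JSONPoint` are made-up names:

```python
import json


class JSONMixin:
    """Adds a to_json() method; relies on the host class keeping its state in __dict__."""

    def to_json(self):
        return json.dumps(self.__dict__, sort_keys=True)


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# The mixin is listed first so its methods come before the base class's in the MRO.
class JSONPoint(JSONMixin, Point):
    pass


if __name__ == "__main__":
    print(JSONPoint(1, 2).to_json())  # {"x": 1, "y": 2}
```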

xiyoulaoyuanjia commented 8 years ago

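Since Python lets you call a parent-class method directly through the parent class name, why does the super function exist at all? An author on Zhihu explains it in great detail, especially with the examples at the end. Thumbs up. Reference 1, Reference 2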
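The difference mostly shows up with multiple inheritance. A sketch of a diamond (`Base` with two subclasses and a child inheriting from both; all class names are made up): calling parents by name runs `Base.__init__` twice, while `super()` follows the method resolution order (MRO), so every class in the diamond runs exactly once.

```python
class Base:
    def __init__(self, log):
        log.append("Base")


# Calling the parent by name: each path through the diamond runs Base again.
class DirectA(Base):
    def __init__(self, log):
        log.append("A")
        Base.__init__(self, log)


class DirectB(Base):
    def __init__(self, log):
        log.append("B")
        Base.__init__(self, log)


class DirectChild(DirectA, DirectB):
    def __init__(self, log):
        DirectA.__init__(self, log)
        DirectB.__init__(self, log)


# Using super(): each call moves on to the next class in type(self).__mro__.
class SuperA(Base):
    def __init__(self, log):
        log.append("A")
        super().__init__(log)


class SuperB(Base):
    def __init__(self, log):
        log.append("B")
        super().__init__(log)


class SuperChild(SuperA, SuperB):
    def __init__(self, log):
        super().__init__(log)


def run(cls):
    """Instantiate cls and return the order in which the __init__ methods ran."""
    log = []
    cls(log)
    return log


if __name__ == "__main__":
    print(run(DirectChild))  # ['A', 'Base', 'B', 'Base']
    print(run(SuperChild))   # ['A', 'B', 'Base']
```

Note that inside `SuperA`, `super()` resolves to `SuperB`, not `Base`, because the lookup follows the MRO of the actual instance (`SuperChild`).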