scalingexcellence / scrapybook

Scrapy Book Code
http://scrapybook.com/

I hit a problem in my pipeline: MySQL OperationalError: (1241, 'Operand should contain 1 column(s)') #6

Closed xiaowenjie21 closed 8 years ago

xiaowenjie21 commented 8 years ago

Here is the code (pipelines.py):

```python
def process_item(self, item, spider):
    db = MySQLdb.connect("localhost", "root", "2955112", "properties", charset="utf8")
    cursor = db.cursor()

    # sql = """CREATE TABLE IF NOT EXISTS baidu_data2 (
    #              title VARCHAR(250),
    #              content VARCHAR(550),
    #              contenthref VARCHAR(450)
    #          ) ENGINE=MyISAM DEFAULT CHARSET=utf8"""
    # cursor.execute(sql)

    sql2 = """INSERT INTO baidu_data2 (title, content, contenthref) VALUES (%s, %s, %s)"""
    args = (item["title"], item["content"], item["contenthref"])
    cursor.execute(sql2, args)
```

Here is the code (spiders):

```python
def parse_item(self, response):
    # l = ItemLoader(item=SosoItem(), response=response)
    # l.add_xpath('title', "//h3[@class='vrTitle']/a")
    # l.add_xpath('content', "//div[@class='str_info_div']/p")
    # l.add_xpath('contenthref', "//h3[@class='vrTitle']/a/@href")
    # return l.load_item()

    items = []
    select = response.xpath("//div[@class='vrwrap']")
    for i in select:
        item = SosoItem()
        item["title"] = i.xpath("h3[@class='vrTitle']/a").extract()
        item["content"] = i.xpath("div[1]/p").extract()
        item['contenthref'] = i.xpath("h3[@class='vrTitle']/a/@href").extract()
        items.append(item)

    return items
```

ERROR information:

```
ERROR: Error processing {'content': [u'\nABC\u7ae5\u978b\u54c1\u724c\u4e3a\u5168\u56fd\u7684\u7ae5\u978b\u52a0\u76df\u4ee3\u7406\u6279\u53d1\u5546\u63d0\u4f9bABC\u7ae5\u978b2015\u65b0\u6b3e\uff0cABC\u7ae5\u978b\u54c1\u724c\u52a0\u76df\u8d39\uff0cABC\u7ae5\u978b\u4e13\u5356\u5e97\u52a0\u76df\u6761\u4ef6\uff0cABC\u7ae5\u978b\u52a0\u76df\u7535\u8bdd\u7b49\u4fe1\u606f\uff0c\u54a8\u8be2\u70ed\u7ebf\uff1a400\u20146929\u2014...'],
 'contenthref': [u'http://www.sogou.com/link?url=DSOYnZeCC_rNvR6aXaV4WJFzyG5FAO1LDz_NR33A47Q.&query=abc'],
 'title': [u'\u7ae5\u978b\u52a0\u76df', u'\u7ae5\u978b\u52a0\u76df\u4ee3\u7406', u'\u7ae5\u978b\u5b98\u7f51 -\u4e2d\u56fd\u978b\u7f51']}
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/soso/soso/pipelines.py", line 28, in process_item
    cursor.execute(sql2, args)
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
OperationalError: (1241, 'Operand should contain 1 column(s)')
```

I think the problem is in the code above, but I don't know how to solve it.

lookfwd commented 8 years ago

Hey, @xiaowenjie21, is this related to the book or is it a general Stack Overflow-type question?

From a quick look, replacing .extract() with .extract_first('N/A') should fix the problem. item["title"] is a list with potentially more than one value; extract_first() (see here) takes just the first element.
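To illustrate the difference without needing Scrapy installed, here is a minimal plain-Python sketch: the two helper functions below only mimic the behavior of SelectorList.extract() and SelectorList.extract_first(), they are not Scrapy's actual implementation. The point is that extract() always hands back a list, and binding a list to a single %s makes MySQLdb render it as a multi-column row value, which appears to be exactly what MySQL's error 1241 is complaining about.

```python
def extract(matches):
    """Mimics SelectorList.extract(): returns ALL matches as a list."""
    return list(matches)

def extract_first(matches, default=None):
    """Mimics SelectorList.extract_first(): first match, or the default
    when nothing matched."""
    return matches[0] if matches else default

# Hypothetical values standing in for the three <a> texts matched above.
titles = [u'title one', u'title two', u'title three']

print(extract(titles))               # a whole list -> bad INSERT parameter
print(extract_first(titles, 'N/A'))  # u'title one' -> a single scalar, OK
print(extract_first([], 'N/A'))      # 'N/A' when the XPath matched nothing
```

After the change, each item field is a single string that binds cleanly to one %s placeholder in the INSERT statement.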