search5 / solrpy

Automatically exported from code.google.com/p/solrpy

SolrConnection can post malformed XML #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
It's easy to create malformed XML posts to Solr, and difficult to create
an efficient (single-POST) multi-document add or delete:

  conn = solr.SolrConnection('http://solr.example.net/')
  conn.begin_batch()
  conn.delete_many(['one', 'two'])
  conn.end_batch(commit=True)

will cause this XML to be POSTed:

  <delete><id>one</id></delete><delete><id>two</id></delete><commit/>

Instead, this should produce two POSTs:

  <delete><id>one</id><id>two</id></delete>

and:

  <commit/>

I'm using solrpy 0.5 (installed from PyPI using zc.buildout).
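
For illustration only, here is a minimal sketch of how the single multi-id delete body shown above could be built with the Python standard library. It does not use solrpy, and the helper name build_delete_xml is hypothetical, not part of solrpy's API.

```python
import xml.etree.ElementTree as ET


def build_delete_xml(ids):
    """Return one <delete> element with an <id> child per document id.

    Hypothetical helper for illustration; not a solrpy function.
    """
    delete = ET.Element("delete")
    for doc_id in ids:
        # ElementTree escapes special characters in the text for us.
        ET.SubElement(delete, "id").text = str(doc_id)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(delete, encoding="unicode")


payload = build_delete_xml(["one", "two"])
print(payload)  # <delete><id>one</id><id>two</id></delete>
```

Building the element tree first, rather than concatenating strings, guarantees a single root element per request body, which is what keeps the payload well-formed.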

Original issue reported on code.google.com by fdrake on 6 May 2009 at 5:47

GoogleCodeExporter commented 9 years ago
Can you post an example of solrpy creating malformed XML?

Original comment by benliles on 25 Aug 2009 at 4:15

GoogleCodeExporter commented 9 years ago
My original report contains the exact series of calls and the resulting XML 
document.

Here's an interactive session showing exactly what gets passed into
SolrConnection._post():

>>> import pdb
>>> import solr
>>> 
>>> c = solr.SolrConnection("http://solr.example.net/solr")
>>> c.begin_batch()
1
>>> c.delete_many(["one", "two"])
>>> c.end_batch(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "solr/core.py", line 546, in end_batch
    return self._update("".join(self.__batch_queue))
  File "solr/core.py", line 663, in _update
    request, self.xmlheaders)
  File "solr/core.py", line 735, in _post
    self._reconnect()
  File "solr/core.py", line 716, in _reconnect
    self.conn.connect()
  File "/Users/fdrake/local/python-2.4.5/lib/python2.4/httplib.py", line 614, in connect
    socket.SOCK_STREAM):
socket.gaierror: (8, 'nodename nor servname provided, or not known')
>>> 
>>> pdb.pm()
> /Users/fdrake/local/python-2.4.5/lib/python2.4/httplib.py(614)connect()
-> socket.SOCK_STREAM):
(Pdb) u
> /Users/fdrake/projects/solrpy/build/lib/solr/core.py(716)_reconnect()
-> self.conn.connect()
(Pdb) u
> /Users/fdrake/projects/solrpy/build/lib/solr/core.py(735)_post()
-> self._reconnect()
(Pdb) p url
'/solr/update'
(Pdb) p body
u'<delete><id>one</id></delete><delete><id>two</id></delete><commit/>'
(Pdb) p headers
{'Content-Type': 'text/xml; charset=utf-8'}

Note that the XML shown is not well-formed:

>>> import xml.parsers.expat
>>> 
>>> p = xml.parsers.expat.ParserCreate()
>>> p.Parse('<delete><id>one</id></delete><delete><id>two</id></delete><commit/>')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
xml.parsers.expat.ExpatError: junk after document element: line 1, column 29
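
The same check can be wrapped in a small function to compare the batched body against the bodies proposed in the original report. This is a sketch for Python 3 (the session above is Python 2.4), and the name is_well_formed is hypothetical.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return True if xml_text parses as a single well-formed XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Body produced by the batch calls above: three root elements in a row.
batched = "<delete><id>one</id></delete><delete><id>two</id></delete><commit/>"
# Bodies proposed in the original report, one per POST.
combined = "<delete><id>one</id><id>two</id></delete>"
commit = "<commit/>"

print(is_well_formed(batched))   # False
print(is_well_formed(combined))  # True
print(is_well_formed(commit))    # True
```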

Original comment by fdrake on 28 Aug 2009 at 9:57

GoogleCodeExporter commented 9 years ago
For the record, that interactive session is with the trunk of solrpy.

Original comment by fdrake on 28 Aug 2009 at 9:59

GoogleCodeExporter commented 9 years ago
It looks like there has been further discussion about removing batch
updates, since they no longer work with newer versions of Solr.

Merging with Issue 13 to reflect this change in status.

Original comment by benliles on 14 Sep 2009 at 5:17