Open GoogleCodeExporter opened 9 years ago
sorry ignore the connection.py debug statements :)
Original comment by matthewh...@gmail.com
on 21 Apr 2009 at 11:01
Thanks for the patch. The one question I have is in the update method. You are
using the batch put but I don't think there is really any need to do that. A
normal put_attributes call will work just fine since you are only updating a
single item.
Or am I missing something?
Original comment by Mitch.Ga...@gmail.com
on 22 Apr 2009 at 12:15
Most likely I'm missing something. Yeah, put_attributes would probably work
too :)
On that note, I'm going to have some domains with millions of entries. What's
the best way to load that data? (I'm assuming batching up many (10K or so)
items and sending them, perhaps with many batches split among threads.) If
that doesn't exist, I'll probably want it within the week ;) So if you have
any hints, that'd be great; otherwise I'll implement my own and send the patch.
Original comment by matthewh...@gmail.com
on 22 Apr 2009 at 12:34
I just checked with a friend who has done a lot of bulk uploads to SDB. He
confirmed that the best approach is firing up multiple threads (the right
number depends on your upload bandwidth, among other things), each performing
batch_put_attributes commands.
There is still a threaded query method in connection.py but I haven't done a
threaded uploader. I would probably actually use the subprocess module if you
are on a Linux platform and using Python 2.5 or better. I've never been a big
fan of threads but YMMV. I might have some sample code from another project
that would help. I'll check.
Original comment by Mitch.Ga...@gmail.com
on 22 Apr 2009 at 1:29
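A minimal sketch of the threaded approach described above, for anyone landing here later. It is written for current Python 3 (concurrent.futures did not exist in the Python 2.5 of this thread), and put_batch is a hypothetical callable, not a boto API; with boto it could be a thin wrapper around domain.batch_put_attributes. The worker count is a placeholder to tune against your bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_batches(put_batch, batches, workers=4):
    """Call put_batch(batch) for each batch on a pool of worker threads.

    put_batch is a hypothetical callable that uploads one batch, given as a
    dict of {item_name: attributes}. Results come back in the same order
    as the input batches.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the result iterator, so an exception raised in any
        # worker is re-raised here instead of being silently dropped.
        return list(pool.map(put_batch, batches))
```

Usage would look something like upload_batches(domain.batch_put_attributes, batches), assuming each batch is already small enough for a single request.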
batch_put is a bit broken at the moment:
    def batch_put_attributes(self, items, replace=True):
        return self.connection.put_attributes(self, item_name, attributes, replace)
item_name and attributes are not readily available (should be .keys() and
.values() ?)
Original comment by attila.c...@gmail.com
on 28 Apr 2009 at 3:15
attila is correct
here's my patch:
--- boto/sdb/domain.py (revision 1125)
+++ boto/sdb/domain.py (working copy)
@@ -91,7 +91,7 @@
         @rtype: bool
         @return: True if successful
         """
-        return self.connection.put_attributes(self, item_name, attributes, replace)
+        return self.connection.batch_put_attributes(self, items, replace)

     def get_attributes(self, item_name, attribute_name=None, item=None):
         """
Original comment by matthewh...@gmail.com
on 28 Apr 2009 at 9:52
thanks. Stupid copy/paste error. Hurrying too much. Fixed in r1126.
Original comment by Mitch.Ga...@gmail.com
on 28 Apr 2009 at 10:01
It's still wrong.
You need to invoke .batch_put_attributes
instead of .put_attributes
Original comment by matthewh...@gmail.com
on 28 Apr 2009 at 10:29
good grief.
okay, I slowed down a little and added a simple test to
tests/test_sdbconnection.py that will at least make sure the
domain.batch_put_attributes method is functional.
Original comment by Mitch.Ga...@gmail.com
on 28 Apr 2009 at 10:44
;)
Now I'm running into a problem where calling .batch_put_attributes gives me a
broken pipe. Uploading one at a time works, but is slow for an 80K test data
set....
Original comment by matthewh...@gmail.com
on 28 Apr 2009 at 10:46
Figured out my problem. There is a limit of 25 items per batch_put_attributes
call:
http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/DeveloperGuide/index.html?SDB_API_BatchPutAttributes.html
Sometimes I got a 'Broken pipe' error; then I lowered the number of items and
Amazon told me I was trying to insert too many.
Here's a patch to help client developers figure this out faster:
+++ boto/sdb/connection.py (working copy)
@@ -261,6 +261,9 @@
         @rtype: bool
         @return: True if successful
         """
+        if len(items) > 25:
+            raise boto.BotoClientError("Can only insert 25 items in BatchPutAttributes (trying to insert %d)" % len(items))
+
         domain, domain_name = self.get_domain_and_name(domain_or_name)
         params = {'DomainName' : domain_name}
         self.build_batch_list(params, items, replace)
Original comment by matthewh...@gmail.com
on 28 Apr 2009 at 11:04
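Given the 25-item limit, a caller with a large dict of items has to split it up before calling batch_put_attributes. A rough sketch of one way to do that follows; chunk_items is a hypothetical helper name, not part of boto, and it assumes items is a dict of {item_name: attributes} as used elsewhere in this thread.

```python
from itertools import islice

MAX_BATCH_PUT_ITEMS = 25  # per-request item limit discussed in this thread


def chunk_items(items, size=MAX_BATCH_PUT_ITEMS):
    """Yield dicts holding at most `size` entries taken from `items`."""
    iterator = iter(items.items())
    while True:
        # islice pulls the next `size` (name, attributes) pairs, or fewer
        # once the iterator is nearly exhausted.
        chunk = dict(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each yielded dict can then be passed to a single batch_put_attributes call, e.g. for batch in chunk_items(items): domain.batch_put_attributes(batch).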
Hmmmm, can anyone else confirm that batch_put_attributes works?
I can call it and it appears to return successfully yet my domain reports that
it is empty....
BTW here's a better version of the last patch
--- boto/sdb/connection.py (revision 1128)
+++ boto/sdb/connection.py (working copy)
@@ -30,6 +30,9 @@
 from boto.exception import SDBResponseError
 from boto.resultset import ResultSet

+MAX_PUT_ATTRIBUTES = 256
+MAX_BATCH_PUT_ITEMS = 25
+
 class ItemThread(threading.Thread):
     def __init__(self, name, domain_name, item_names):
@@ -233,6 +236,8 @@
         @rtype: bool
         @return: True if successful
         """
+        if len(attributes) > MAX_PUT_ATTRIBUTES:
+            raise boto.BotoClientError("Can only insert %d attributes in PutAttributes (trying to insert %d)" % (MAX_PUT_ATTRIBUTES, len(attributes)))
         domain, domain_name = self.get_domain_and_name(domain_or_name)
         params = {'DomainName' : domain_name,
                   'ItemName' : item_name}
@@ -261,6 +266,9 @@
         @rtype: bool
         @return: True if successful
         """
+        if len(items) > MAX_BATCH_PUT_ITEMS:
+            raise boto.BotoClientError("Can only insert %d items in BatchPutAttributes (trying to insert %d)" % (MAX_BATCH_PUT_ITEMS, len(items)))
+
         domain, domain_name = self.get_domain_and_name(domain_or_name)
         params = {'DomainName' : domain_name}
         self.build_batch_list(params, items, replace)
Original comment by matthewh...@gmail.com
on 29 Apr 2009 at 2:24
Hmm. Seems to be working for me.
In [1]: import boto
In [2]: c = boto.connect_sdb()
In [3]: d = c.lookup('my_domain')
send: 'GET
/?AWSAccessKeyId=0CZQCKRS3J69PZ6QQQR2&Action=Query&DomainName=my_domain&MaxNumbe
rOfItems=1&QueryExpression=&SignatureMethod=HmacSHA256&SignatureVersion=2&Timest
amp=2009-04-29T02%3A31%3A36&Version=2007-11-07&Signature=uI9sYKluAC3h0raZpqiaSm5
ZILb4tp9Ht/fIOPV%2B20E%3D
HTTP/1.1\r\nHost: sdb.amazonaws.com:443\r\nAccept-Encoding:
identity\r\nUser-Agent:
Boto/1.7a (darwin)\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/xml
header: Transfer-Encoding: chunked
header: Date: Wed, 29 Apr 2009 02:31:36 GMT
header: Server: Amazon SimpleDB
In [4]: item3 = {'name3_1' : 'value3_1',
...: 'name3_2' : 'value3_2',
...: 'name3_3' : ['value3_3_1', 'value3_3_2']}
In [5]: item4 = {'name4_1' : 'value4_1',
...: 'name4_2' : ['value4_2_1', 'value4_2_2'],
...: 'name4_3' : 'value4_3'}
In [6]: items = {'item3' : item3, 'item4' : item4}
In [7]: d.batch_put_attributes(items)
send: 'GET
/?AWSAccessKeyId=0CZQCKRS3J69PZ6QQQR2&Action=BatchPutAttributes&DomainName=my_do
main&Item.0.Attribute.0.Name=name3_2&Item.0.Attribute.0.Replace=true&Item.0.Attr
ibute.0.Value=value3_2&Item.0.Attribute.1.Name=name3_3&Item.0.Attribute.1.Replac
e=true&Item.0.Attribute.1.Value=value3_3_1&Item.0.Attribute.2.Name=name3_3&Item.
0.Attribute.2.Replace=true&Item.0.Attribute.2.Value=value3_3_2&Item.0.Attribute.
3.Name=name3_1&Item.0.Attribute.3.Replace=true&Item.0.Attribute.3.Value=value3_1
&Item.0.ItemName=item3&Item.1.Attribute.0.Name=name4_1&Item.1.Attribute.0.Replac
e=true&Item.1.Attribute.0.Value=value4_1&Item.1.Attribute.1.Name=name4_3&Item.1.
Attribute.1.Replace=true&Item.1.Attribute.1.Value=value4_3&Item.1.Attribute.2.Na
me=name4_2&Item.1.Attribute.2.Replace=true&Item.1.Attribute.2.Value=value4_2_1&I
tem.1.Attribute.3.Name=name4_2&Item.1.Attribute.3.Replace=true&Item.1.Attribute.
3.Value=value4_2_2&Item.1.ItemName=item4&SignatureMethod=HmacSHA256&SignatureVer
sion=2&Timestamp=2009-04-29T02%3A33%3A32&Version=2007-11-07&Signature=OJDB41JOo%
2BLClB21Gf7sRcdQf50UBKhTsZhHPZTCYaI%3D
HTTP/1.1\r\nHost: sdb.amazonaws.com:443\r\nAccept-Encoding:
identity\r\nUser-Agent:
Boto/1.7a (darwin)\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/xml
header: Transfer-Encoding: chunked
header: Date: Wed, 29 Apr 2009 02:33:31 GMT
header: Server: Amazon SimpleDB
Out[7]: True
Original comment by Mitch.Ga...@gmail.com
on 29 Apr 2009 at 2:35
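For readers trying to decode the BatchPutAttributes request in the session above: the dict of items is flattened into numbered Item.N.Attribute.M parameters, and a list value becomes several Attribute entries that share one Name. The sketch below reproduces that naming scheme as it appears in the log. flatten_batch is a hypothetical name and this is not boto's actual build_batch_list; the attribute numbering simply follows dict iteration order, so it may differ from the order in the log.

```python
def flatten_batch(items, replace=True):
    """Flatten {item_name: {attr_name: value_or_list}} into query parameters."""
    params = {}
    for i, (item_name, attributes) in enumerate(items.items()):
        params['Item.%d.ItemName' % i] = item_name
        j = 0  # running attribute index within this item
        for attr_name, value in attributes.items():
            # A list value becomes several Attribute entries sharing one Name.
            values = value if isinstance(value, list) else [value]
            for single in values:
                prefix = 'Item.%d.Attribute.%d' % (i, j)
                params[prefix + '.Name'] = attr_name
                params[prefix + '.Value'] = single
                if replace:
                    params[prefix + '.Replace'] = 'true'
                j += 1
    return params
```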
hmmmm, can you try running my attached testcase? It appears to run. But when
I query the domain (the testcase does this), it says it's empty....
Original comment by matthewh...@gmail.com
on 29 Apr 2009 at 3:07
Attachments:
Hi Mitch,
How did you set up your IPython to display what has been sent? Sorry, not a
very boto-specific comment.
Thanks
Original comment by norman.k...@gmail.com
on 6 May 2009 at 9:42
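The send:/reply:/header: lines in the session above have the format that Python's standard HTTP client prints once its debug level is raised, so they most likely come from the HTTP layer and not from IPython. The sketch below shows only that standard-library mechanism (http.client in Python 3, httplib in the Python 2 of this thread). How boto switches it on is not shown in the thread; presumably it is driven by boto's own debug setting, but treat that as an assumption.

```python
import http.client

# Assumption: a plain standard-library connection, used only to show the
# mechanism. Creating the object does not open a network connection.
conn = http.client.HTTPConnection('sdb.amazonaws.com')

# Any level above 0 makes later requests on this connection print
# send:/reply:/header: lines to stdout.
conn.set_debuglevel(1)
```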
The Item.update() method was introduced here and its "inventor" wanted "to be
able to use the dict method update() to update item attributes".
Currently, it's implemented as follows (boto 1.8.d):
    def update(self, other_dict):
        if self._dict == None:
            self.load()
        if self.active:
            self.domain.put_attributes(self.name, self, replace)
        self._dict.update(other_dict)
Hence, when an Item is updated with an other_dict via update(), the change is
not applied remotely, even when active=True. I don't see the sense of calling
`self.domain.put_attributes(self.name, self, replace)` (which could be
shortened to self.save()) *before* the actual dictionary update. So I would
suggest doing it this way:
    def update(self, other_dict):
        if self._dict == None:
            self.load()
        self._dict.update(other_dict)
        if self.active:
            self.save()
Does this make sense / did I overlook something?
Thank you,
Jan-Philip Gehrcke
Original comment by jgehr...@googlemail.com
on 21 Jul 2009 at 8:50
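To make the ordering issue concrete, here is a small self-contained sketch. RecordingDomain and this simplified Item are stand-ins invented for illustration; they are not boto's classes. The real Item keeps its data in self._dict and lazily calls load(), both of which are left out here. The point is only that saving after the local update sends the new values.

```python
class RecordingDomain(object):
    """Hypothetical stand-in for a domain; records what would be sent."""

    def __init__(self):
        self.sent = []

    def put_attributes(self, item_name, attributes, replace=True):
        # Copy the attributes so later local changes do not alter the record.
        self.sent.append((item_name, dict(attributes)))
        return True


class Item(dict):
    """Simplified item: a dict that writes itself to its domain when active."""

    def __init__(self, domain, name, active=False):
        dict.__init__(self)
        self.domain = domain
        self.name = name
        self.active = active

    def save(self, replace=True):
        self.domain.put_attributes(self.name, self, replace)

    def update(self, other_dict):
        # Change the local state first, then persist it, so the remote
        # write contains the values that were just added.
        dict.update(self, other_dict)
        if self.active:
            self.save()
```

With the two steps in the opposite order, the write would go out before other_dict had been merged, which is the behaviour described in the comment above.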
Original issue reported on code.google.com by
matthewh...@gmail.com
on 21 Apr 2009 at 11:00