peterknife / boto

Automatically exported from code.google.com/p/boto
0 stars 0 forks source link

Unexpected Key behavior #380

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The boto.Key objects' public version_id does not dictate the version of the
key to be downloaded. The version_id is required as a parameter to get_*
functions. Consider the following code:

# Assume there are 5 versions

for k in b.get_all_versions():
    print k.get_contents_as_string(), k.version_id

for k in b.get_all_versions():
    print k.get_contents_as_string(version_id=k.version_id), k.version_id

The first for loop will print the same contents and same version 5 times in
a row. The second will actually print the individual versions. It seems
that version_id is set upon the file download, but its presence doesn't
dictate operation of the class. This is inconsistent with setting k.key to
the desired S3 key to download the contents. If k.key says something about
what k represents and how it operates as an object, then k.version_id
should as well.

It looks like this problem can be solved with setting the default value for
version_id in the get_* functions equal to self.version_id. They currently
default to None, ignoring the value of the class. Was this a design
decision or an oversight?

Original issue reported on code.google.com by brimcfad...@gmail.com on 3 Jun 2010 at 3:12

GoogleCodeExporter commented 9 years ago
In the same vein, I feel that the reduced_redundancy parameter should work the 
same
way, though in its case, it's not even a class object. Why not have version_id 
and
reduced_redundancy represent the object?

I realize that there are some other problems that arise from changes like 
these. If
you change k.key, you need to remember to change k.version_id and
k.reduced_redundancy. I just don't feel like the current setup is consistent; 
even
requiring the key as a parameter would be better for consistency's sake (and 
avoids
breaking compatibility with users of legacy versions of boto that have code that
adjusts k.key, something that would occur if these atomic sets of data changes
occurred using functions).

Original comment by brimcfad...@gmail.com on 3 Jun 2010 at 7:27

GoogleCodeExporter commented 9 years ago
I believe r1604 fixes the original issue described.  Could you verify when you 
get a chance?

Original comment by Mitch.Ga...@gmail.com on 19 Jun 2010 at 12:16

GoogleCodeExporter commented 9 years ago
Having just hit this issue independently, no, it does not.

bucket.get_all_versions and bucket.list_versions work as expected.  r1604 may 
have fixed them, but did not resolve this issue.

Consider a key "k" with three versions, like so:

version 1: version_id="v1", contents="c1"
version 2: version_id="v2", contents="c2"
version 3: version_id="v3", contents="c3"

We can then extract three versioned keys:

k3, k2, k1 = b.get_all_versions(prefix="k")
print k1.version_id # "v1"
print k2.version_id # "v2"
print k3.version_id # "v3"
print k3.get_contents_as_string() # "c3"
print k2.get_contents_as_string(version_id=k2.version_id) # "c2"
print k1.get_contents_as_string(version_id=k1.version_id) # "c1"
print k2.get_contents_as_string(version_id=k1.version_id) # "c1"

So far, so good, but here's where things start to go wrong:

print k2.version_id # "v1"
print k1.get_contents_as_string() # "c3"
print k1.version_id # "v3"

There are two errors happening in .get_contents_*:

1. If no version is specified, .version_id should be used to give the correct 
version's contents.
2. Retrieving the contents of a specific version should not set .version_id to 
that version.

Combined, this means that naïvely attempting to retrieve the contents of any 
versioned key will retrieve the most recent version, and will silently 
transform the versioned key into that version.

Original comment by funnyman3595 on 17 Sep 2010 at 9:33

GoogleCodeExporter commented 9 years ago
Thanks for the details analysis.  I completely agree on #1 and that's easy to 
fix.  I think you are right on #2 as well but the semantics are less clear.  It 
leads me to suspect that the current model for versions in boto is not quite 
right but that's a bigger topic and a much more complicated fix.  For now, I 
think it makes sense to ensure that if a particular Key object already has a 
non-None value for version_id, that get_contents_* should not overwrite that 
value.

Fix should be committed today.

Original comment by Mitch.Ga...@gmail.com on 17 Sep 2010 at 4:18

GoogleCodeExporter commented 9 years ago
What about if you did k1.set_contents_from_string('c4')?  What should 
k1.version_id be now?

Original comment by Mitch.Ga...@gmail.com on 17 Sep 2010 at 4:44

GoogleCodeExporter commented 9 years ago
.set_contents_from_string should fail if .version_id is not current.  If it is 
current, it should update to the new version.

Original comment by funnyman3595 on 20 Sep 2010 at 9:23

GoogleCodeExporter commented 9 years ago
Determining if the version_id of a key is current or not would require an extra 
round trip to S3, wouldn't it?  Not crazy about that idea although I do agree 
that after calling set_contents_* the key's version_id should be updated to the 
new version.

Original comment by Mitch.Ga...@gmail.com on 20 Sep 2010 at 9:27

GoogleCodeExporter commented 9 years ago
There are two reasons that .version_id would be set:

1. It was explicitly set to request a specific version (including by using 
.get_all_versions).
2. It was implicitly set by boto based on information received.

Assuming 2, you are right that adding an extra trip to S3 is a bad idea, and it 
isn't necessary in any case.

Assuming 1, the operation only makes sense on the current version, so it should 
fail on any other version.

The right solution is probably to separate those two use cases.  The 
implementation that comes to mind would be creating a KeyVersion subclass of 
Key.  Key.version_id would then attempt to reflect the current version, while 
KeyVersion.version_id would be nailed to the specific version given.

This also suggests a third possibility for the 
"k1.set_contents_from_string('c4'); print k1.version_id" question: 
.set_contents_from_string returns a new KeyVersion representing the version 
created, leaving k1's .version_id unchanged.

Naturally, other implementations of the same basic idea exist; a boolean 
.static_version would have the same effect.

Original comment by funnyman3595 on 20 Sep 2010 at 9:55