okfn / ckanclient-deprecated

DEPRECATED - please see https://github.com/ckan/ckanapi. [Python client library for CKAN]
http://pypi.python.org/pypi/ckanclient
25 stars 17 forks source link

Bugfix s3 upload #19

Closed glance- closed 11 years ago

glance- commented 11 years ago

Bugfix and restructure some code so ckanclient can be used to upload data-files to s3.

This is a resubmit of #14, that only got closed and not merged.

hbunke commented 11 years ago

Thanks a lot for this commit. It's generally a good idea to switch to urrlib2. Unfortunately this breaks the filesubmission. urllib2 did not accept the submitted URL. I've submitted a fix in pullreq #20. In the long run _post_multipart should also use APIRequest, so that we don't need two different URL openers.

glance- commented 11 years ago

@hbunke you are running ckanclient against a ckan installation with "ofs.impl = pairtree" ?

The breakage in the switch from httplib to urllib2 has nothing to do with why with why the pairtree impl. broke. pairtree implementation uses relative urls but when you are posting your upload to eg. s3, the url returned from the api call is a absolute url to s3.

Allso, it would be quite missleading and might break things if you would use a APIRequest object instead of a plain request object, due to that the ckan-api and s3 api might have mutualy exclusive parameters and condition. I don't know if s3 upload works if you send a CKAN-Authorization header to s3, that how since 0e32bd1 gets sent.

The change done in 7604c5e will once again break everything for s3 upload.

hbunke commented 11 years ago

Sorry for the confusion. @rgp is right, we need tests for file upload methods. It might also make sense to implement different upload methods for s3 or other cloud storage providers?

rufuspollock commented 11 years ago

@glance- @hbunke can we get a summary of the immediate fixes we need to get both local and s3 working? Also seem essential we get tests (we can just test the header and url info with some refactoring probably)

hbunke commented 11 years ago

As far as I can see there's only one problem left. CKAN seems to return different URLs for local and S3 storage. So the best solutions in this case would be, that the API generally returns fully qualified URLs. I'm not too deep into CKAN code and haven't found the point, where this URL is generated. In controllers/storage.py StorageController.success() defines c.file_url correctly. Somewhere (pairtree?) after that http: and domain get lost and only /storage... is returned.

As a very simple (and probably error-prune) fix for this in ckanclient we could test the returning for 'http' (or/and 'storage') and add 'http://domain/" (self.base_netloc as suggested in #18 ) if necessary.

Authorization for S3 is done in CKAN ini file, so the authorization header sent by the client should not break anything here.

Tests: I've just tried to run ckanclient tests and keep getting database related errors, e.g. debug views with "OperationalError: (OperationalError) no such table: user". CKAN tests are running fine. Any hints on that?

glance- commented 11 years ago

Authorization for S3 is done in a multipart field, encoded in the multipart/form-data. A additional "Authorization"-header might break in more ways then i can imagine.