wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense

Rackspace Cloudfiles UK configuration #128

Open jamestombs opened 11 years ago

jamestombs commented 11 years ago

I have set up everything that I think needs to be set and arbitrator.py runs and seems to successfully connect but fails to transfer files. I get the following for every file:

2012-11-05 08:39:03,387 - Arbitrator.Transporter    - ERROR    - The transporter 'Cloud Files' has failed while transporting the file '/var/www/themes/garland/images/bg-tab.png' (action: 1). Error: ''.
2012-11-05 08:39:04,352 - Arbitrator.Transporter    - ERROR    - The transporter 'Cloud Files' has failed while transporting the file '/var/www/themes/garland/images/bg-content-left.png' (action: 1). Error: ''.

Are there issues with using Cloudfiles?

chrisivens commented 11 years ago

I think that it needs a bit more debugging info to know what's going on. Could you turn the callback debugging on in the settings.py file and paste any relevant info here or as a gist?
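
One plausible reason the log shows Error: '' is that the exception raised inside the transporter carries no message, so formatting it as a string yields nothing. The sketch below is not File Conveyor's code: `AuthenticationFailed`, `describe` and `describe_verbose` are hypothetical names used only to show how logging the exception's repr keeps its type visible.

```python
class AuthenticationFailed(Exception):
    """Hypothetical stand-in for an exception raised without a message."""


def describe(exc):
    # Formatting with %s calls str(exc), which is empty when no message was given.
    return "Error: '%s'." % exc


def describe_verbose(exc):
    # Formatting with %r calls repr(exc), which always includes the class name.
    return "Error: %r." % exc


if __name__ == "__main__":
    exc = AuthenticationFailed()
    print(describe(exc))
    print(describe_verbose(exc))
```

If the empty string comes from a case like this, logging the repr (or the exception's class name) alongside the message would make the failure much easier to identify.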

jamestombs commented 11 years ago

Just have a lot of these in the log:

2012-11-05 09:08:05,557 - Arbitrator                - INFO     - Pipeline queue -> filter queue: '/var/www/newsletters/creator/js/fckeditor/editor/dialog/common/fck_dialog_common.css'.
2012-11-05 09:08:05,561 - Arbitrator                - INFO     - Pipeline queue -> filter queue: '/var/www/newsletters/creator/js/fckeditor/editor/dialog/common/fck_dialog_common.js'.

Running arbitrator.py now just stays at 'fully running' state but has 58000 items in the 'pipeline' persistent queue.

2012-11-05 09:08:27,056 - Arbitrator                - INFO     - Final sync of discover queue to pipeline queue made.
2012-11-05 09:08:27,056 - Arbitrator                - WARNING  - 'pipeline' persistent queue contains 58795 items.
2012-11-05 09:08:27,056 - Arbitrator                - WARNING  - 'files_in_pipeline' persistent list contains 0 items.
2012-11-05 09:08:27,057 - Arbitrator                - WARNING  - 'failed_files' persistent list contains 0 items.
2012-11-05 09:08:27,057 - Arbitrator                - WARNING  - 'files_to_delete' persistent list contains 0 items.
2012-11-05 09:08:27,057 - Arbitrator                - WARNING  - synced files DB contains metadata for 0 synced files.
2012-11-05 09:08:27,057 - Arbitrator                - INFO     - Cleaned up the working directory '/tmp/fileconveyor'.
2012-11-05 09:08:27,057 - Arbitrator                - WARNING  - File Conveyor has shut down.

Is there an easy way to clear the 'cache' so that the sync starts again from scratch, without causing issues?

chrisivens commented 11 years ago

Just to be clear, you are using the code from this (wimleers) repo of fileconveyor and not my one? There may be bugs in my one.

Could you set these options in settings.py and restart the process? You should then have a lot more in your log file.

CALLBACKS_CONSOLE_OUTPUT = True
CONSOLE_LOGGER_LEVEL = logging.DEBUG
FILE_LOGGER_LEVEL = logging.DEBUG

jamestombs commented 11 years ago

Pretty sure it is Wim's; I used the pip installation from the instructions.

Sample sections of log:

2012-11-05 09:20:10,689 - Arbitrator                - DEBUG    - Process queue: started the '' processor chain for the file '/var/www/sites/default/files/start-finish-0285.jpg'.
2012-11-05 09:20:10,690 - Arbitrator                - INFO     - Process queue -> transport queue: '/var/www/sites/default/files/ben_swift.jpg'.
2012-11-05 09:20:10,690 - Arbitrator                - DEBUG    - Process queue: started the '' processor chain for the file '/var/www/sites/default/files/ben_swift.jpg'.
2012-11-05 09:20:10,690 - Arbitrator.Transporter    - ERROR    - The transporter 'Cloud Files' has failed while transporting the file '/var/www/sites/default/files/plod-3.jpg' (action: 1). Error: ''.

2012-11-05 09:20:17,134 - Arbitrator                - INFO     - Transport queue: '/var/www/sites/default/files/e2e-profile_0.jpg' to transfer to server 'cloudfiles' with transporter #3 (of 10), place 2 in the queue.
2012-11-05 09:20:17,134 - Arbitrator                - INFO     - Transport queue: '/var/www/sites/default/files/end2end_-_d9-177.jpg' to transfer to server 'cloudfiles' with transporter #4 (of 10), place 1 in the queue.
2012-11-05 09:20:17,134 - Arbitrator                - INFO     - Transport queue: '/var/www/sites/default/files/champsmotor700x280-2.png' to transfer to server 'cloudfiles' with transporter #4 (of 10), place 2 in the queue.

2012-11-05 09:20:47,210 - Arbitrator.Transporter    - DEBUG    - Running the transporter 'Cloud Files' to sync '/var/www/sites/default/files/conversation-a.jpg'.

2012-11-05 09:20:47,711 - Arbitrator.Transporter    - ERROR    - The transporter 'Cloud Files' has failed while transporting the file '/var/www/sites/default/files/conversation-a.jpg' (action: 1). Error: ''.

chrisivens commented 11 years ago

Have you done the nasty little hack to get it to use UK cloudfiles address? I intend to get that fixed very soon and incorporated into this repo. I'll try and get the link to my commit which had the changes in it.

jamestombs commented 11 years ago

I made this change to django_settings.py if that is what you are referring to:

CUMULUS = {
  'AUTH_URL' : 'https://lon.auth.api.rackspacecloud.com'
}

Have had the same result with and without it.

chrisivens commented 11 years ago

Yeah, it's an ugly little hack that I had to add for my setup for speed's sake. Commit was here: chrisivens/fileconveyor@b4bd2f5

EDIT: I think I've done something different on my one to use cloudfiles and UK url. Bear with me, I'll just take a look.

chrisivens commented 11 years ago

On line 35 of the TransporterCumulus class (transporters/transporter_cumulus.py), I set the storage auth URL like this:

self.storage.authurl = uk_authurl

So the try block should look like this:

self.storage = CloudFilesStorage(
    self.settings["username"],
    self.settings["api_key"],
    self.settings["container"],
)
self.storage.authurl = uk_authurl

jamestombs commented 11 years ago

I've made those changes to the transporter_cloudfiles.py file, what should be in the django_settings.py?

I assume the uk_authurl needs to be defined somewhere?

chrisivens commented 11 years ago

I'm writing this on my phone and getting the right links to paste is proving tricky.

I edited my django_settings.py to be something like this:

SECRET_KEY='change me for something better'
# Dummy settings for `django-storages`.
MEDIA_URL=''
MEDIA_ROOT=''
# `backends/ftp.py`
FTP_STORAGE_LOCATION=''
# `backends/sftp.py`
SFTP_STORAGE_HOST=''
# django-cumulus
#CUMULUS['USERNAME'] = '';
#CUMULUS['CUMULUS_API_KEY'] = '';
#CUMULUS['CONTAINER'] = '';
CUMULUS_API_KEY = '';
CUMULUS_USERNAME = '';
CUMULUS_CONTAINER = '';
CUMULUS = {
  'AUTH_URL' : 'uk_authurl'
}

It's on my no-delete branch in commit chrisivens/fileconveyor@5c8bbaae

I'll fix that link in a bit if it's wrong.

jamestombs commented 11 years ago

I have downloaded your branch and made the changes from your commit link and am now getting a proper error rather than an empty string.

2012-11-06 04:20:39,104 - Arbitrator.Transporter    - ERROR    - The transporter 'cumulus' has failed while transporting the file '/var/www/modules/color/color.css' (action: 1). Error: 'a float is required'.
2012-11-06 04:20:39,267 - Arbitrator.Transporter    - ERROR    - The transporter 'cumulus' has failed while transporting the file '/var/www/modules/locale/locale.css' (action: 1). Error: 'a float is required'.

chrisivens commented 11 years ago

Interesting, I've not had that error before. Trying to think what would need a float value in the transporter. There is a chance that there's a bug in my code. I've not run any tests on it.

jamestombs commented 11 years ago

I decided to try it again because things like this randomly start working and I still got the same error from your branch.

So I tried the original and it is working. So I'm not sure if running setup.py from your fork had an effect on the original install?

chrisivens commented 11 years ago

Sorry I haven't been a huge amount of help; it's been a stupidly busy week. New job next week and I'll start to combine my changes with wimleers' version to get a decent pull request for UK support. It'd be good to get some idea of a roadmap too or even a list of nice-to-have items.

jamestombs commented 11 years ago

You've been extremely helpful.

If you need any testing with the UK CloudFiles, let me know.

jamestombs commented 11 years ago

There is one thing I can't get my head around and that is where the fileconveyor.pid file is stored.

In settings.py it is set to ~/.fileconveyor.pid but I can't find that file. I have fileconveyor running under virtualenv. Do I need to run it outside of virtualenv? I tried updating the location to a user's home directory and ran arbitrator.py but a new file wasn't created.

What steps need to be taken to get access to this file or to move it elsewhere?

chrisivens commented 11 years ago

I moved my pid file to /var/run because I couldn't get it right either. Where the $HOME directory is depends on the user that starts the process anyway. I use upstart for fileconveyor.

jamestombs commented 11 years ago

I set it to /var/run/fileconveyor and the pid file still isn't being created. I set the log file to be created in the same folder to make sure the write permissions are working and the log file is created and written successfully.

Are there issues with moving the pid file once arbitrator.py has been run or shouldn't it matter?

chrisivens commented 11 years ago

If you're not running as root, /var/run is probably set to 0755 permissions which prevents you writing the pid file. Perhaps add a sub-directory with permissions that your user can write to.
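
A quick way to test that theory is to ask the operating system whether the current user may create files in the target directory. This is a minimal sketch, not part of File Conveyor: `can_write_pid_file` is a hypothetical helper, and /var/run/fileconveyor is simply the directory mentioned in this thread.

```python
import os


def can_write_pid_file(directory):
    """Return True if the current user can create files in `directory`."""
    # Creating a file requires both write and search (execute) permission
    # on the directory itself.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Directory taken from this thread; adjust to your own setup.
    print(can_write_pid_file("/var/run/fileconveyor"))
```

Run it as the same user that starts arbitrator.py, since the answer depends on who is asking. Note that root passes this check for any existing directory.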

jamestombs commented 11 years ago

I was running as root. The fileconveyor subdirectory was created by root so I assume the permissions are all OK especially as the log file was created.

chrisivens commented 11 years ago

If you have the file named .fileconveyor.pid it won't show unless you run ls -a, because ls hides files whose names begin with a dot.

jamestombs commented 11 years ago

I renamed it to just fileconveyor.pid, without the leading dot. Does it require the dot at the beginning? I'm pretty sure I tried it with the dot before, with no luck.

chrisivens commented 11 years ago

It doesn't require it at all. It's just a file with a number in it at the end of the day.

jamestombs commented 11 years ago

Hmm, wonder why it isn't being created. I tried doing a search on the server for the file using find with no luck.

Could this be a problem with running fileconveyor under virtualenv?

chrisivens commented 11 years ago

I really can't be sure as I don't use virtualenv. I use virtual machines instead.

jamestombs commented 11 years ago

I think I know what the problem was. It was me.

I think the pid file is deleted once arbitrator.py is stopped? I was running it in the console, then exiting it then looking for the pid file.

Have now got it running in the background and can access the pid file.
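
That behaviour matches a common daemon pattern: the pid file is written at startup and removed again when the process exits, so it only exists while the daemon is running. The sketch below illustrates the pattern in general terms and is not File Conveyor's actual implementation; `write_pid_file` and `remove_pid_file` are hypothetical names.

```python
import atexit
import os
import tempfile


def remove_pid_file(path):
    """Delete the pid file if it is still present."""
    if os.path.exists(path):
        os.remove(path)


def write_pid_file(path):
    """Record the current process ID in `path` and clean it up at exit."""
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    # Registered handlers run on normal interpreter shutdown, which is why
    # the file is gone once the process has stopped.
    atexit.register(remove_pid_file, path)


if __name__ == "__main__":
    pid_path = os.path.join(tempfile.gettempdir(), "example-daemon.pid")
    write_pid_file(pid_path)
    print("pid file exists while running:", os.path.exists(pid_path))
```

Looking for the file after stopping the process in the console will therefore find nothing, even when everything is configured correctly.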

jamestombs commented 11 years ago

Still not having much luck. All the files appear to have copied over to the CDN but the CDN Drupal module is reporting that '14572 files are waiting to be synced.'

I have manually used the public link from Rackspace Cloud manager to get the URL, and the images do exist on the CDN and come back correctly.

So I have set up fileconveyor on a test rig and am still having trouble.

When running your uk-support branch I get the following:

Traceback (most recent call last):
  File "arbitrator.py", line 1185, in <module>
    run_file_conveyor()
  File "arbitrator.py", line 1168, in run_file_conveyor
    arbitrator = Arbitrator(os.path.join(FILE_CONVEYOR_PATH, "config.xml"), restart)
  File "arbitrator.py", line 151, in __init__
    transporter = self.__create_transporter(server)
  File "arbitrator.py", line 928, in __create_transporter
    transporter = transporter_class(settings, self.transporter_callback, self.transporter_error_callback, "Arbitrator")
  File "/home/mint/src/fcuk/fileconveyor/transporters/transporter_cloudfiles.py", line 36, in __init__
    if e.__class__ == cloudfiles.errors.AuthenticationFailed:
NameError: global name 'cloudfiles' is not defined

Have checked the config file and the api key etc are all correct.

As I had no luck with this, I tried your no-delete branch. Although I get no errors like the one above, the files fail to transport with the same Error: '' as before.

Not sure where to go from here. I guess I must be doing something wrong as none of the branches appear to be working or there is something odd with Rackspace.
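
The NameError in the traceback above suggests that the error handler in transporter_cloudfiles.py refers to a `cloudfiles` module that is not imported in that file, so the handler itself fails and hides whatever error triggered it. The sketch below reproduces that situation in isolation, assuming Python 3; `create_transporter` and the RuntimeError are hypothetical stand-ins, not the project's code.

```python
def create_transporter():
    """Mimic an error handler that refers to a module that was never imported."""
    try:
        # Stand-in for whatever the storage backend actually raised.
        raise RuntimeError("original storage error")
    except Exception as e:
        # `cloudfiles` is not imported in this file, so evaluating this
        # condition raises NameError and the original error is never reported.
        if e.__class__ == cloudfiles.errors.AuthenticationFailed:
            return "authentication failed"
        raise


if __name__ == "__main__":
    try:
        create_transporter()
    except NameError as err:
        print("handler failed:", err)
```

If this is what is happening, importing the module the handler refers to should let the underlying error surface instead of the NameError.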

wimleers commented 11 years ago

I think the error message you're running into is in fact just a slightly different manifestation of the one reported at #97 (AuthenticationFailed) — it's probably different because you've messed around a bit with File Conveyor's code (if I read the thread well).

If @chrisivens has some time to get a pull request going to add CloudFiles UK support, I'd be happy to merge it and solve all those troubles forever :)