wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense

'no more transporters are available' loop #145

Open · mrsippy opened this issue 11 years ago

mrsippy commented 11 years ago

Hi,

I recently implemented fileconveyor to sync static content for a number of sites to Rackspace Cloud Files.

I am not doing any processing on any of the files, just syncing them as they are found. As such, my config file has a source entry for each site, a server entry for each site, and 1 rule for each site.

When I run fileconveyor it works up to a point: it will usually run successfully for 10-15 minutes and will then stop syncing for no apparent reason. I contacted Wim, who suggested I increase the logging level to "DEBUG". I have done so, and I can now see that fileconveyor stops syncing because it gets into a loop where it logs the following message roughly 5 times a second:

Transporting: no more transporters are available for server 'xxxxxxx'

Where xxxxxxx is one of the aforementioned server entries from my config file.

This morning I watched it stay stuck in this loop for 20 minutes before I stopped it.

Also, if I start fileconveyor again, it inevitably gets into a loop again, but not necessarily on the same server.

My settings.py looks like this:

RESTART_AFTER_UNHANDLED_EXCEPTION = True
RESTART_INTERVAL = 5
LOG_FILE = '/var/log/fileconveyor.log'
PID_FILE = '/var/run/fileconveyor.pid'
PERSISTENT_DATA_DB = '/usr/local/src/fileconveyor/fileconveyor/persistent_data.db'
SYNCED_FILES_DB = '/usr/local/src/fileconveyor/fileconveyor/synced_files.db'
WORKING_DIR = '/tmp/fileconveyor'
MAX_FILES_IN_PIPELINE = 100
MAX_SIMULTANEOUS_PROCESSORCHAINS = 2
MAX_SIMULTANEOUS_TRANSPORTERS = 20
MAX_TRANSPORTER_QUEUE_SIZE = 3
QUEUE_PROCESS_BATCH_SIZE = 40
CALLBACKS_CONSOLE_OUTPUT = False
CONSOLE_LOGGER_LEVEL = logging.WARNING
FILE_LOGGER_LEVEL = logging.DEBUG
RETRY_INTERVAL = 5

My config.xml file is too lengthy to paste in full, but is essentially structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Sources -->
  <sources ignoredDirs="">
    <source name="website_1" scanPath="/var/www/website_1/htdocs/wp-content/uploads" />
    <source name="website_2" scanPath="/var/www/website_2/htdocs/wp-content/uploads" />
    <!-- etc. -->
  </sources>

  <!-- Servers -->
  <servers>
    <server name="server_website_1" transporter="cloudfiles">
      <username>xxxxxxxxxxxx</username>
      <api_key>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</api_key>
      <container>server_1</container>
    </server>
    <server name="server_website_2" transporter="cloudfiles">
      <username>xxxxxxxxxxxx</username>
      <api_key>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</api_key>
      <container>server_2</container>
    </server>
  </servers>

 <!-- Rules -->
  <rules>
    <rule for="website_1" label="Website_1">
      <destinations>
        <destination server="server_website_1" path="/wp-content/uploads" />
      </destinations>
    </rule>
    <rule for="website_2" label="Website_2">
      <destinations>
        <destination server="server_website_2" path="/wp-content/uploads" />
      </destinations>
    </rule>
  </rules>
</config>

Any ideas?

Many thanks in anticipation -

chris

mrsippy commented 11 years ago

Any ideas? Anyone? I would try to fix it myself, but I'm not a Python developer and am not sure where to start.

wimleers commented 11 years ago

It's these two settings that determine how many simultaneous transporters there can be, and how many files can be queued for each:

MAX_SIMULTANEOUS_TRANSPORTERS = 20
MAX_TRANSPORTER_QUEUE_SIZE = 3

The message Transporting: no more transporters are available for server 'xxxxxxx' doesn't mean File Conveyor is stuck; it means that all 20 transporters are A) already transporting files and B) each already have 3 files queued.

File Conveyor will just retry a bit later :)

Probably either or both of these things are true:

  1. (Some of) your files are rather large and hence take a long time to transport.
  2. Rackspace Cloud Files is being rather slow.
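
To make the capacity arithmetic concrete, here is a minimal sketch of the condition behind that log message. This is not File Conveyor's actual arbitrator code; the function and variable names are illustrative, and it assumes the limits apply per server:

# Illustrative sketch only -- not File Conveyor's actual code. It shows why
# "no more transporters are available" is a back-off condition, not a crash.

MAX_SIMULTANEOUS_TRANSPORTERS = 20  # from settings.py above
MAX_TRANSPORTER_QUEUE_SIZE = 3      # files queued per transporter

def can_accept_file(queue_lengths):
    """Return True if some transporter for this server can take another file.

    `queue_lengths` is a hypothetical list with one entry per active
    transporter, holding the number of files currently queued on it. With
    20 transporters x 3 queued files each, at most 60 files can be in
    flight for one server before new files have to wait.
    """
    if len(queue_lengths) < MAX_SIMULTANEOUS_TRANSPORTERS:
        return True  # room to spin up another transporter
    return any(q < MAX_TRANSPORTER_QUEUE_SIZE for q in queue_lengths)

# Every transporter saturated: the arbitrator logs the message and retries
# on its next pass instead of giving up.
saturated = [MAX_TRANSPORTER_QUEUE_SIZE] * MAX_SIMULTANEOUS_TRANSPORTERS
print(can_accept_file(saturated))  # False -> wait and retry

Under that reading, seeing the message occasionally is normal back-pressure; seeing it continuously for many hours, as reported above, suggests the transporters are never freeing up.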
mrsippy commented 11 years ago

Thanks for getting back to me, Wim. I have my doubts about the possible causes you suggest, because I've seen fileconveyor in this state for 24+ hours when there has been little to sync. I will try increasing the number of transporters and the queue size and test further.

chris

mrsippy commented 11 years ago

Hi Wim,

I started fileconveyor again shortly after leaving my last comment. Incidentally, in case it has any bearing, I'm running fileconveyor using nohup, i.e.

nohup python /usr/local/src/fileconveyor/fileconveyor/arbitrator.py > /var/log/nohup.log 2>&1 &

The last file that fileconveyor synced was at 11:06am yesterday, some 30+ hours ago, and there have been many files added to my sites since then which should have been synced.

Any ideas?

mrsippy commented 11 years ago

This is still an issue for me I'm afraid.

wimleers commented 10 years ago

Can you enable debug logging and then analyze your log to check if something bizarre/interesting is happening? Alternatively, upload the log here.

leesolway commented 10 years ago

Same issue for me, unfortunately. I can't see anything unusual in the log.

wimleers commented 10 years ago

Then can you please post your log somewhere so I can take a look at it? (You can post it here, though I'm not sure how big a file GitHub will accept.)

trolleycrash commented 10 years ago

We were also experiencing this same issue. After much hair-pulling, I zeroed in on the problem. We had an empty processor chain, which seemed to result in the processor callback not always being fired. The effect was that the processor queue would fill right up, we would exceed MAX_FILES_IN_PIPELINE, and then everything would stall.

What we did to solve it was just add an innocuous processor to the processor chain in config.xml:

<processorChain>
        <processor name="unique_filename.Mtime" />
</processorChain>

For reference, I believe this may also be what was causing Issue 129.
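
To illustrate the stall trolleycrash describes, here is a minimal sketch, with hypothetical names, of how a processor-chain callback that never fires can pin the pipeline at MAX_FILES_IN_PIPELINE. It is not File Conveyor's actual pipeline code:

# Illustrative sketch only -- not File Conveyor's actual pipeline code.

MAX_FILES_IN_PIPELINE = 100  # from settings.py above

files_in_pipeline = 0

def enqueue_file(path, run_processor_chain):
    """Admit a file into the (hypothetical) pipeline if there is room."""
    global files_in_pipeline
    if files_in_pipeline >= MAX_FILES_IN_PIPELINE:
        return False  # pipeline full: new files are not picked up
    files_in_pipeline += 1
    run_processor_chain(path, callback=processing_done)
    return True

def processing_done(path):
    """Only this callback releases a pipeline slot."""
    global files_in_pipeline
    files_in_pipeline -= 1

def broken_chain(path, callback):
    pass  # callback never fires -> the slot is never released

for i in range(150):
    enqueue_file("/tmp/file_%d" % i, broken_chain)

print(files_in_pipeline)  # 100: the pipeline is saturated and syncing stalls

Adding an innocuous processor such as unique_filename.Mtime makes the chain actually run, so the callback fires and slots are released again, which matches the fix above.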