wimleers / fileconveyor

File Conveyor is a daemon written in Python to detect, process and sync files. In particular, it's designed to sync files to CDNs. Amazon S3 and Rackspace Cloud Files, as well as any Origin Pull or (S)FTP Push CDN, are supported. Originally written for my bachelor thesis at Hasselt University in Belgium.
https://wimleers.com/fileconveyor
The Unlicense

Mapping fileconveyor to several static assets #39

Closed peterbowey closed 13 years ago

peterbowey commented 13 years ago

Question: How can you achieve this concept:

I tried several combinations that seemed logical (after reading the code, docs and thesis), but got NO good result on parallel asset divisions.

The various notes given do not speak clearly of the inner code design or code methods. I have therefore resorted to hours of 'guess + trial work' to find methods that work.

Additionally, I note that CSS assets that need to be re-queued for processing miss out on compression via the yuicompressor processor. My testing indicates that all CSS assets that have survived the re-queue process of fileconveyor are eventually finalized as 'uncompressed' assets.

Please advise - as time permits.

wimleers commented 13 years ago

That can be achieved by looking at https://github.com/wimleers/fileconveyor/blob/master/code/config.sample.xml for inspiration and then:

1) Creating 1 source: your Drupal root directory
2) Creating 4 servers (that can then later on be used as the destination): static0, static1, static2, static3
3) Creating 4 rules, one for CSS, one for JS, one for images, one for Flash, and each with their corresponding destination
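The one-rule-per-asset-type idea in the steps above can be sketched in Python. This is only an illustration of the routing logic, not File Conveyor's configuration format or API: the `RULES` table and the `destination_for` helper are hypothetical, and the extension lists are borrowed from the config values quoted later in this thread.

```python
import posixpath

# Hypothetical rule table: label -> (file extensions, destination server).
# Server names follow the static0..static3 naming suggested above; the
# structure itself is illustrative and is NOT File Conveyor's config schema.
RULES = {
    "CSS": ({"css"}, "static0"),
    "JS": ({"js"}, "static1"),
    "images": ({"ico", "gif", "png", "jpg", "jpeg", "svg"}, "static2"),
    "Flash": ({"swf"}, "static3"),
}


def destination_for(path):
    """Return (rule label, server name) for a path, or None if no rule matches."""
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    for label, (extensions, server) in RULES.items():
        if ext in extensions:
            return label, server
    return None
```

Each file matches at most one rule here because the extension sets do not overlap, which is what keeps the four destinations cleanly separated.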

peterbowey commented 13 years ago

Thanks, that combination was attempted: It fails for any CSS that was flagged for the re-queue event:

Error: "Exception class: <type 'exceptions.TypeError'>. Message: 'NoneType' object is not subscriptable"

Actual config.xml used (below):

//-------------------------------------------------------------------------------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>

/var/www/virtual/computerdocs.com.au http://static0.computerdocs.com.au

/var/www/virtual/computerdocs.com.au http://static1.computerdocs.com.au

/var/www/virtual/computerdocs.com.au http://static1.computerdocs.com.au

misc:profiles:modules:themes:sites/all:sites/default ico:gif:png:jpg:jpeg:svg:swf

misc:profiles:modules:themes:sites/all:sites/default js

misc:profiles:modules:themes:sites/all:sites/default css

//-------------------------------------------------------------------------------------------------------------------------

Notes 2:

I am still curious why, even with the default single-asset setup, fileconveyor fails to 'yuicompress' those few CSS assets that have been held back (re-queued).

Reading the source code, it is not apparent how the re-queue process deals with this CSS 're-run'.

peterbowey commented 13 years ago

Part of actual [error] log for above event:

// --------------------------------------------------------------------------------------------------------------------------
2011-01-29 16:42:28,402 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/virtual/peterbowey.com.au/sites/all/modules/lightbox2/css/lightbox_lite-rtl.css'. Retrying later.
2011-01-29 16:42:28,842 - Arbitrator - INFO - Pipeline queue -> filter queue: '/var/www/virtual/peterbowey.com.au/modules/system/system.css'.
2011-01-29 16:42:28,843 - Arbitrator - INFO - Filtering: '/var/www/virtual/peterbowey.com.au/modules/system/system.css' matches the 'CSS' rule for the 'peterbowey' source!
2011-01-29 16:42:28,844 - Arbitrator - INFO - Filter queue -> process queue: '/var/www/virtual/peterbowey.com.au/modules/system/system.css' for server 'cdn3' (rule: 'CSS').
2011-01-29 16:42:29,208 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/virtual/peterbowey.com.au/sites/all/modules/lightbox2/css/lightbox_alt.css'. Exception class: <type 'exceptions.TypeError'>. Message: 'NoneType' object is not subscriptable.
2011-01-29 16:42:29,473 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/virtual/peterbowey.com.au/sites/all/modules/lightbox2/css/lightbox_alt.css'. Retrying later.
2011-01-29 16:42:30,041 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/virtual/peterbowey.com.au/sites/all/modules/quicktabs/css/quicktabs-admin.css'. Exception class: <type 'exceptions.TypeError'>. Message: 'NoneType' object is not subscriptable.
2011-01-29 16:42:30,075 - Arbitrator - INFO - Pipeline queue -> filter queue: '/var/www/virtual/peterbowey.com.au/modules/color/color.css'.
2011-01-29 16:42:30,076 - Arbitrator - INFO - Filtering: '/var/www/virtual/peterbowey.com.au/modules/color/color.css' matches the 'CSS' rule for the 'peterbowey' source!
2011-01-29 16:42:30,077 - Arbitrator - INFO - Filter queue -> process queue: '/var/www/virtual/peterbowey.com.au/modules/color/color.css' for server 'cdn3' (rule: 'CSS').
2011-01-29 16:42:30,140 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/virtual/peterbowey.com.au/sites/all/modules/quicktabs/css/quicktabs-admin.css'. Retrying later.
2011-01-29 16:42:30,620 - Arbitrator - INFO - Pipeline queue -> filter queue: '/var/www/virtual/peterbowey.com.au/modules/system/system-rtl.css'.
2011-01-29 16:42:30,687 - Arbitrator - INFO - Filtering: '/var/www/virtual/peterbowey.com.au/modules/system/system-rtl.css' matches the 'CSS' rule for the 'peterbowey' source!
2011-01-29 16:42:30,711 - Arbitrator - INFO - Filter queue -> process queue: '/var/www/virtual/peterbowey.com.au/modules/system/system-rtl.css' for server 'cdn3' (rule: 'CSS').
2011-01-29 16:42:30,714 - Arbitrator.ProcessorChain - ERROR - The processsor 'link_updater.CSSURLUpdater' has failed while processing the file '/var/www/virtual/peterbowey.com.au/sites/all/modules/quicktabs/tabstyles/excel/excel.css'. Exception class: <type 'exceptions.TypeError'>. Message: 'NoneType' object is not subscriptable.
2011-01-29 16:42:30,965 - Arbitrator - WARNING - Retry queue -> 'failed_files' persistent list: '/var/www/virtual/peterbowey.com.au/sites/all/modules/quicktabs/tabstyles/excel/excel.css'. Retrying later.
//---------------------------------------------------------------------------------------------------------------------------

wimleers commented 13 years ago

Remarks:

1) Servers "cdn2" and "cdn3" are identical.
2) You should change the processorChains to suit the rule: it doesn't make sense to have an image optimizer for CSS files, for example.
3) CSS files are requeued to be retried later if they contain images that are not yet on the CDN. However, that "Message: 'NoneType' object is not subscriptable." error message is very weird and is probably caused by a bug.
4) Try removing the "link_updater.CSSURLUpdater" processor everywhere. Most likely, it'll then work fine.
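To make remark 3 concrete, here is a minimal sketch of how a "not synced yet" lookup can produce that exact TypeError, and how a guard can turn it into a clean requeue instead. This is a guess at the class of bug, not File Conveyor's actual code: `SYNCED_ROWS`, `fetch_row`, `cdn_url` and `RequeueLater` are all hypothetical names.

```python
class RequeueLater(Exception):
    """Hypothetical signal: retry this file once its dependencies are synced."""


# Hypothetical stand-in for a "synced files" lookup: local path -> (CDN URL,).
# Rows are tuples to mimic a database row; a missing row is represented by None.
SYNCED_ROWS = {
    "/site/images/bg.png": ("http://static2.example.com/images/bg.png",),
}


def fetch_row(path):
    """Return the row for a synced file, or None if it has not been synced."""
    return SYNCED_ROWS.get(path)


def cdn_url_unguarded(path):
    """Index the row without checking it; raises TypeError when the row is None."""
    return fetch_row(path)[0]


def cdn_url(path):
    """Return the CDN URL, or request a retry if the file is not synced yet."""
    row = fetch_row(path)
    if row is None:
        raise RequeueLater(path)
    return row[0]
```

With the guard in place, a CSS file that references an image not yet on the CDN would be requeued deliberately rather than failing with "'NoneType' object is not subscriptable".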

peterbowey commented 13 years ago

Thanks Wim,

1) I corrected the "cdn2" and "cdn3" are identical issue (my fault, late hours).
2) I modified the processorChains to match the given rules (this had already been tried before).
3) I clearly do understand the CSS requeue for images (I have designed and worked with many CDN designs).
4) The fileconveyor daemon DOES work if I remove the "link_updater.CSSURLUpdater" processor, but that defeats the entire point of using this CDN process.
5) I even tried downgrading from http://cssutils.googlecode.com/files/cssutils-0.9.8a1-py2.7.egg, which I was previously using, to the older http://cssutils.googlecode.com/files/cssutils-0.9.7-py2.7.egg.

For the moment I have gone back to a previous project: re-designing http://code.google.com/p/web-optimizator/source/browse/trunk/ for Drupal. I had great success doing this for WordPress 3+.

It does amaze me how [very] few dedicated CDN asset managers are actually being written for Drupal!

Current config.xml:

//------------------------------------------------------------------------------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>

/var/www/virtual/computerdocs.com.au http://static0.computerdocs.com.au

/var/www/virtual/computerdocs.com.au http://static1.computerdocs.com.au

/var/www/virtual/computerdocs.com.au http://static2.computerdocs.com.au

misc:profiles:modules:themes:sites/all:sites/default ico:gif:png:jpg:jpeg:svg:swf

misc:profiles:modules:themes:sites/all:sites/default js

misc:profiles:modules:themes:sites/all:sites/default css

//------------------------------------------------------------------------------------------------------------------------

peterbowey commented 13 years ago

Please note that the above daemon fileconveyor DOES work if I restore the config.xml settings to the SINGLE asset CDN given in your example - [BUT] again I must point out that the several CSS items that are re-queued do NOT get passed through the yuicompressor; they [the CSS assets] simply retain the same CSS full text content - spaces and all. The CSS that does not need the re-queue process is nicely processed via the expected yuicompressor.
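One rough way to confirm which synced CSS files skipped compression is to check how many line breaks survive in the output. The sketch below is a heuristic only: it assumes the compressor strips most line breaks, the `looks_minified` name and the 1% threshold are arbitrary illustrations, and nothing here comes from File Conveyor itself.

```python
def looks_minified(css_text, max_newline_ratio=0.01):
    """Heuristic: treat CSS as minified if newlines are rare relative to length.

    The threshold is an arbitrary assumption. Empty input is reported as
    minified because there is nothing left to compress.
    """
    if not css_text:
        return True
    newlines = css_text.count("\n")
    return newlines / len(css_text) <= max_newline_ratio
```

Running this over the files on the destination server and listing those for which it returns False would give a quick inventory of the assets that "retain the same CSS full text content".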

I now suspect that neither of us gets much 'joy' out of using this memory-leaky elephant known as "cssutils".

I suspect this processor is currently the weak (broken) point for the success of fileconveyor.

Like you, I have tried hard to find a better Python means of rewriting CSS, but I have not found one.

I would be more inclined to spend this time on other code projects.

wimleers commented 13 years ago

Hi Peter,

I know that File Conveyor still needs work. And the CSS URL updater is the most problematic one, performance-wise, but now apparently also feature-wise.

That being said, it needs more contributors than just me. If multiple people are contributing to it, it can be sustainable. If I'm the only one working on it, it can't be. That's why not much has been happening with it. Nobody in the Drupal world seems to be paying attention to it. Which is very unfortunate. I think I'm going to start to try and rally people around this :)

But I definitely understand if you're inclined to pick another solution. Thanks for that link by the way — while I've been following WPO for about 3 years now, this is the first time I've seen that WEBO company!

peterbowey commented 13 years ago

Hi Wim,

As I like the event methods used by File Conveyor, I will keep this as a second project!

I came across this related site, which may add some energy to ideas and solutions:

http://pypi.python.org/pypi/django-mediagenerator/1.5.1#downloads

In the archive, there is an interesting file, cssurl.py (dumped below):

//--------------------------------------------------------------------------------------------------------------------------

    from django.conf import settings
    from mediagenerator.generators.bundles.base import Filter, FileFilter
    from mediagenerator.utils import media_url
    import logging
    import posixpath
    import re

    url_re = re.compile(r'url\s*\(["\']?([\w.][^:]*?)["\']?\)', re.UNICODE)

    # Whether to rewrite CSS URLs, at all
    REWRITE_CSS_URLS = getattr(settings, 'REWRITE_CSS_URLS', True)
    # Whether to rewrite CSS URLs relative to the respective source file
    # or whether to use "absolute" URL rewriting (i.e., relative URLs are
    # considered absolute with regards to STATICFILES_URL)
    REWRITE_CSS_URLS_RELATIVE_TO_SOURCE = getattr(settings, 'REWRITE_CSS_URLS_RELATIVE_TO_SOURCE', True)

    class URLRewriter(object):
        def __init__(self, base_path='./'):
            if not base_path:
                base_path = './'
            self.base_path = base_path

        def rewrite_urls(self, content):
            if not REWRITE_CSS_URLS:
                return content
            return url_re.sub(self.fixurls, content)

        def fixurls(self, match):
            url = match.group(1)
            hashid = ''
            if '#' in url:
                url, hashid = url.split('#', 1)
                hashid = '#' + hashid
            if ':' not in url and not url.startswith('/'):
                rebased_url = posixpath.join(self.base_path, url)
                rebased_url = posixpath.normpath(rebased_url)
                try:
                    url = media_url(rebased_url, refresh=False)
                except:
                    logging.error('URL not found: %s' % url)
            return 'url(%s%s)' % (url, hashid)

    class CSSURL(Filter):
        """Rewrites URLs relative to media folder ("absolute" rewriting)."""
        def __init__(self, **kwargs):
            super(CSSURL, self).__init__(**kwargs)
            assert self.filetype == 'css', (
                'CSSURL only supports CSS output. '
                'The parent filter expects "%s".' % self.filetype)

        def get_output(self, variation):
            rewriter = URLRewriter()
            for input in self.get_input(variation):
                yield rewriter.rewrite_urls(input)

        def get_dev_output(self, name, variation):
            rewriter = URLRewriter()
            content = super(CSSURL, self).get_dev_output(name, variation)
            return rewriter.rewrite_urls(content)

    class CSSURLFileFilter(FileFilter):
        """Rewrites URLs relative to input file's location."""
        def get_dev_output(self, name, variation):
            content = super(CSSURLFileFilter, self).get_dev_output(name, variation)
            if not REWRITE_CSS_URLS_RELATIVE_TO_SOURCE:
                return content
            rewriter = URLRewriter(posixpath.dirname(name))
            return rewriter.rewrite_urls(content)

//--------------------------------------------------------------------------------------------------------------------------
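The same regex-based approach can be tried without Django or cssutils. The following is a minimal standalone sketch, not a drop-in replacement for File Conveyor's "link_updater.CSSURLUpdater": the `rewrite_css_urls` function, the `cdn_urls` mapping and the example.com hostnames are hypothetical, and the regular expression is a simplified variant of the one in the dumped file.

```python
import posixpath
import re

# Matches url(...) with optional quotes and captures the URL itself.
URL_RE = re.compile(r'url\s*\(\s*["\']?([^"\')]+?)["\']?\s*\)')


def rewrite_css_urls(css_text, css_path, cdn_urls):
    """Rewrite relative url(...) references in css_text to CDN URLs.

    css_path is the site-relative path of the CSS file; cdn_urls is a
    hypothetical dict mapping site-relative file paths to their CDN URLs.
    References that are absolute, carry a scheme, or are not in cdn_urls
    are left exactly as they were.
    """
    base_dir = posixpath.dirname(css_path)

    def fix(match):
        url = match.group(1)
        if ":" in url or url.startswith("/"):
            return match.group(0)
        resolved = posixpath.normpath(posixpath.join(base_dir, url))
        cdn = cdn_urls.get(resolved)
        if cdn is None:
            return match.group(0)
        return "url(%s)" % cdn

    return URL_RE.sub(fix, css_text)
```

Leaving unknown references untouched, rather than failing, means a caller could decide separately whether a missing image should trigger a requeue.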