pascalin / pycoon

Automatically exported from code.google.com/p/pycoon
GNU General Public License v2.0

Caching Mechanism #7

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I spent last weekend (2006-09-02 to 2006-09-03) working on a caching mechanism 
for Pycoon. It wasn't very successful. I've committed the results of this 
effort to the "caching_original" branch in the repository.

There were several problems with my attempted implementation. The first was 
that I tried to use Python's "pickle" module to serialize the 
ElementTree at certain points in the pipeline and store it on disk. 
However, Python will not pickle instances of ElementTree's Element class. 
(It's possible that the ElementTree implementation built into Python 2.5 
may be pickleable.)
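
One way around the pickling problem would be to cache the serialized XML rather than the Element objects themselves. This is only a minimal sketch, written for Python 3 and assuming the standard-library xml.etree.ElementTree module; "store_tree" and "retrieve_tree" are hypothetical names, not functions from the Pycoon code:

```python
import xml.etree.ElementTree as ET


def store_tree(root, path):
    """Write an Element (and its children) to disk as UTF-8 XML bytes."""
    with open(path, "wb") as f:
        f.write(ET.tostring(root, encoding="utf-8"))


def retrieve_tree(path):
    """Parse the cached XML bytes back into an Element."""
    with open(path, "rb") as f:
        return ET.fromstring(f.read())
```

The trade-off is that every cache hit pays for a re-parse, so this only helps when parsing is cheaper than re-running the pipeline stages that produced the tree.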

The next problem arose when assessing whether the cached data can be used, 
based on the modified dates of the sources used in a pipeline. Examining 
each external resource used in the pipeline (generally with a stat call) 
to compare its modified date with the request time seemed to be so 
expensive that there was likely to be no performance advantage over just 
executing the pipeline anyway.
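
To make the cost concrete, the check amounts to one stat call per source on every request. A rough sketch, assuming all sources are local files; the signature is hypothetical, and "reference_time" is a POSIX timestamp standing in for whatever time the modified dates are compared against:

```python
import os


def is_modified(source_paths, reference_time):
    """Return True if any source file changed after reference_time.

    Costs one stat call per path, which is the overhead in question.
    """
    for path in source_paths:
        try:
            if os.stat(path).st_mtime > reference_time:
                return True
        except OSError:
            # Assumption: a missing or unreadable source counts as modified.
            return True
    return False
```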

I couldn't think of a viable method of generalising the "is_modified" 
predicate function, so it would have been necessary for each component 
class to define its own. (Actually, this isn't necessarily a bad thing.)

Also, I couldn't think of a way of generalising the method to access cached 
data: the "requires_reload", "retrieve" and "store" methods were 
generalised, but each component, in its "_result" method, would have to 
repeat the algorithm to use them:

if requires_reload(request_context):
  result = execute(request_context)
  store(result)
  return result
else:
  result = retrieve()
  return result

and it was more complex in real code than it is in this pseudo-code.
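
One possible way to avoid repeating that algorithm in every component is to put it in a shared base class and let each component supply only the four hooks. This is a sketch of that idea, not what the branch does; "CachedComponent" and "UppercaseComponent" are hypothetical names, and the in-memory cache is a stand-in for real storage:

```python
class CachedComponent:
    """Hypothetical base class: the caching algorithm lives here once."""

    def _result(self, request_context):
        if self.requires_reload(request_context):
            result = self.execute(request_context)
            self.store(result)
            return result
        return self.retrieve()


class UppercaseComponent(CachedComponent):
    """Toy component with an in-memory cache, for illustration only."""

    def __init__(self):
        self._cache = None
        self.executions = 0

    def requires_reload(self, request_context):
        return self._cache is None

    def execute(self, request_context):
        self.executions += 1
        return request_context.upper()

    def store(self, result):
        self._cache = result

    def retrieve(self):
        return self._cache
```

Calling "_result" twice on the same UppercaseComponent executes the component once and serves the second call from the cache.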

Some good points about it, though, were that it could cache any 
stream_component (potentially), including a pipeline itself. Also, the 
alternative reload policies may work better than the "is_modified" reload 
policy. They were: wait for a certain amount of time between requests; 
reload after repeated requests.
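
The two alternative policies could look roughly like this. It is a sketch under assumptions: the class names, the parameterless "requires_reload" signature and the injectable clock are all hypothetical, not taken from "resources.py":

```python
import time


class TimeoutPolicy:
    """Reload once at least max_age seconds have passed since the last reload."""

    def __init__(self, max_age, clock=time.time):
        self.max_age = max_age
        self.clock = clock  # injectable so the policy can be tested
        self.last_reload = None

    def requires_reload(self):
        now = self.clock()
        if self.last_reload is None or now - self.last_reload >= self.max_age:
            self.last_reload = now
            return True
        return False


class RequestCountPolicy:
    """Reload on the first request, then again every max_requests requests."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.count = 0

    def requires_reload(self):
        reload_now = self.count % self.max_requests == 0
        self.count += 1
        return reload_now
```

Neither policy touches the file system, which is why they may be cheaper than the "is_modified" check, at the price of sometimes serving stale data.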

In the /svn/branches/caching_original directory, see "resources.py" for 
the main caching mechanism code and "pipeline.py" 
and "sources/swishe_source.py" for uses.

Original issue reported on code.google.com by pyc...@gmail.com on 4 Sep 2006 at 1:19

GoogleCodeExporter commented 9 years ago
In r120 I use the Source.getLastModified() method to check whether a 
sitemap.xmap or a <map:read> source was modified. The client-side caching 
mechanism based on HTTP 304 and the If-Modified-Since header is now 
implemented this way.

But there is neither client-side nor server-side caching of XML pipeline 
results.
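
For reference, the If-Modified-Since comparison behind the 304 response can be sketched as follows. This is an illustration only, written for Python 3; "not_modified" is a hypothetical helper, not the r120 code, and it assumes the last-modified value is available as a POSIX timestamp:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified(if_modified_since, last_modified_ts):
    """Return True if the server may answer 304 Not Modified.

    if_modified_since: raw header value, or None if the header is absent.
    last_modified_ts: POSIX timestamp of the source's last modification.
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # Unparseable header: fall back to sending the full response.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = datetime.fromtimestamp(int(last_modified_ts), tz=timezone.utc)
    return last_modified <= since
```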

Original comment by anrien...@gmail.com on 26 Feb 2007 at 9:08