grhbit / mirrorrr

Automatically exported from code.google.com/p/mirrorrr

memcache.set('latest_urls') instead of memcache.add('latest_urls') #2

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Great program - it works fine as long as the original site answers fast
enough :-)

One issue I had when deploying this on App Engine was that the latest_urls
were not getting added to memcache - maybe because there were no
entry points to start with, and it stored an empty list?

Anyway, replacing memcache.add('latest_urls') with
memcache.set('latest_urls') fixed the issue...
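
The difference between the two calls would explain this: memcache.add only stores a value when the key is not already present, while memcache.set overwrites unconditionally. Below is a minimal sketch of that behaviour. FakeMemcache is a hypothetical in-memory stand-in, not the App Engine API; it only imitates the add/set semantics, and the URL value is made up for illustration.

```python
class FakeMemcache(object):
    """Hypothetical in-memory stand-in imitating memcache add/set semantics."""

    def __init__(self):
        self._store = {}

    def add(self, key, value):
        # Like memcache.add: store only if the key is not already present.
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def set(self, key, value):
        # Like memcache.set: store unconditionally, replacing any old value.
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)


cache = FakeMemcache()
cache.add('latest_urls', [])                         # an empty list is cached first
added = cache.add('latest_urls', ['example.com/'])   # rejected: key already exists
after_add = cache.get('latest_urls')                 # still the empty list
cache.set('latest_urls', ['example.com/'])           # overwrites the empty list
after_set = cache.get('latest_urls')
```

If an empty list was indeed cached first, every later add would be rejected until the entry expired, which matches the symptom described above.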

Some suggestions for the future:
1. ability to limit the mirroring to certain sites (personal proxy)

2. ability to place this in a subdirectory /mirror/ (or whatever)
For this to work, transform_content.py needs to be adapted. Since I have
sites using base href="..." and relative URLs relating to that base, the
easiest way I found for this to work was to:
a. add %(host_url)s/mirror/ for ABSOLUTE_URL_REGEX urls - complete URL so
that the base href is taken into account and the browser can do its job
b. add /mirror/%(base)s/ for BASE_RELATIVE_URL_REGEX urls
c. don't add %(accessed_dir)s for SAME_DIR_URL_REGEX and
TRAVERSAL_URL_REGEX urls
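
A rough sketch of rules a to c above, for illustration only. The two regexes here are simplified stand-ins that reuse the names from the list; they are not the actual patterns in transform_content.py. MIRROR_PREFIX, rewrite() and the example hosts are assumptions as well.

```python
import re

MIRROR_PREFIX = "/mirror"  # assumed mount point of the mirror handler

# Simplified stand-ins (assumptions), handling only double-quoted href/src.
ABSOLUTE_URL_REGEX = re.compile(
    r'(?P<attr>(?:href|src)=")https?://(?P<url>[^"]+)')
BASE_RELATIVE_URL_REGEX = re.compile(
    r'(?P<attr>(?:href|src)=")/(?!/)(?P<url>[^"]*)')


def rewrite(html, host_url, base):
    """Rewrite URLs so the mirror can live under MIRROR_PREFIX."""
    # (a) Absolute URLs become complete mirror URLs, so a rewritten
    #     <base href> still gives the browser a full URL to resolve against.
    html = ABSOLUTE_URL_REGEX.sub(
        lambda m: '%s%s%s/%s' % (
            m.group('attr'), host_url, MIRROR_PREFIX, m.group('url')),
        html)
    # (b) Root-relative URLs get /mirror/<base>/ prepended.
    html = BASE_RELATIVE_URL_REGEX.sub(
        lambda m: '%s%s/%s/%s' % (
            m.group('attr'), MIRROR_PREFIX, base, m.group('url')),
        html)
    # (c) Same-directory and traversal URLs are left untouched; the browser
    #     resolves them against the (rewritten) base href or page URL.
    return html


page = ('<base href="http://example.com/dir/">'
        '<a href="/a.html">A</a>'
        '<img src="pic.png">'
        '<a href="../up.html">U</a>')
result = rewrite(page, "http://myapp.example", "example.com")
```

With this input, the base href and the root-relative link are rewritten to point under /mirror/, while pic.png and ../up.html are left as they are.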

Original issue reported on code.google.com by mikes...@gmail.com on 7 Apr 2009 at 9:00