alirf81 closed this issue 2 years ago.
Thanks for raising this.
Is this a general problem or a problem with a specific site? I notice that if I visit https://www.wikipedia.org/ using Selenium Wire and Chrome, the assets seem to be retrieved from the cache without any problem.
Do you see the same behaviour if you visit Wikipedia with Selenium Wire?
I'm using ChromeDriver 99.0.4844.51 and Selenium Wire 4.6.3.
Thank you for your reply. Could you please open https://m.facebook.com/reg and check whether it uses the disk cache? If it does, could you send me the Python script you use to initialize the Selenium driver? If the site only uses the memory cache, the cached files are discarded when the script finishes, and when I start it again I have to download them all over again.
https://m.facebook.com/reg seems to use a combination of both the disk cache and the memory cache.
However, if I don't use Selenium Wire, I see very similar caching behaviour in Chrome.
So Chrome seems to use caching (disk and memory) whether or not I use Selenium Wire.
Do you see the same issue if you use pure Selenium (no Selenium Wire)?
Pure Selenium uses the disk cache, but Selenium Wire doesn't. While debugging, I found that Selenium Wire adds a Chrome proxy option (I guess this is how it captures requests), and that makes Chrome use the memory cache only. As a result, the files are not stored in the user-data-dir, and when I run the script again, it downloads the files from the web again.
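One rough way to check whether anything lands in the user-data-dir is to measure Chrome's on-disk cache folder after a plain Selenium run and after a Selenium Wire run. This is a minimal sketch: `disk_cache_size` is a hypothetical helper, and it assumes the default profile keeps its HTTP cache under `<user-data-dir>/Default/Cache` (the exact layout varies between Chrome versions, so verify it on your machine).

```python
import os


def disk_cache_size(user_data_dir, profile="Default"):
    """Return (file_count, total_bytes) found under the profile's Cache folder.

    Assumption: Chrome stores its HTTP disk cache somewhere below
    <user_data_dir>/<profile>/Cache; the whole tree is walked, so nested
    folders are included. A missing folder simply yields (0, 0).
    """
    cache_dir = os.path.join(user_data_dir, profile, "Cache")
    count = 0
    total = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count, total
```

Calling it after `driver.quit()` for each variant makes the comparison concrete: if the numbers grow with plain Selenium but not with Selenium Wire, that matches the behaviour described above.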
Is it possible to use a Selenium Wire interceptor instead of the cache?
If I create a mock response using an interceptor, the request won't be passed to the website's server, right?
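Yes, that's correct. As for your other question (whether a Selenium Wire interceptor can be used instead of the cache), you could try using an interceptor to set the Cache-Control header on the responses to certain requests. That header tells the browser it can reuse those responses from its cache, e.g.:

```python
def interceptor(request, response):  # a response interceptor takes two arguments
    if request.path == '/some/path/of/request/i/want/to/cache':
        # Remove any existing header first, otherwise a duplicate is added
        del response.headers['Cache-Control']
        # Let the browser reuse this response for up to 7 days (604800 seconds)
        response.headers['Cache-Control'] = 'max-age=604800'

driver.response_interceptor = interceptor
```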
That might require a bit of experimentation.
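On the earlier mock-response question: here is a minimal sketch of a request interceptor that answers one URL itself, using Selenium Wire's `request.create_response()` method. Once an interceptor has created a response, Selenium Wire sends it straight back to the browser instead of contacting the server. `MOCKED_URL`, `MOCKED_BODY` and `mock_interceptor` are hypothetical names used only for illustration.

```python
# Hypothetical URL and payload, for illustration only
MOCKED_URL = "https://cdn.example.com/static/app.js"
MOCKED_BODY = b"console.log('served by the interceptor');"


def mock_interceptor(request):
    """Answer MOCKED_URL locally so that request never reaches the server."""
    if request.url == MOCKED_URL:
        request.create_response(
            status_code=200,
            headers={"Content-Type": "application/javascript"},
            body=MOCKED_BODY,
        )
    # Any other request is left untouched and forwarded as usual


# Usage: driver.request_interceptor = mock_interceptor
```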
By combining the CacheControl library (https://pypi.org/project/CacheControl/) with an interceptor, I managed to implement the cache. Thank you for your help.
Hi @alirf81,
are you able to share your solution?
I have a similar problem: selenium-wire isn't using the disk cache.
For anyone else searching for how to do this, here's how I (sort of) solved it. It's not a great solution and it's pretty messy, but it works, and I have bigger fish to fry, so I'm moving on.
I was looking for this because the browser kept downloading the same static content from a CDN instead of loading it from a cache. So I check whether the request URL is a CDN URL, and if it is, I try to serve the response from my own cache.
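```python
# Cache of CDN responses keyed by URL (in memory only, so it is emptied when the script exits)
response_cache = {}

def request_interceptor(request):
    # Serve a previously captured response instead of contacting the CDN again
    if "cdn" in request.url and request.url in response_cache:
        request.response = response_cache[request.url]

def response_interceptor(request, response):
    # Only cache successful responses from the CDN
    if "cdn" in request.url and response.status_code == 200:
        if request.url not in response_cache:
            response_cache[request.url] = response
        else:
            response.body = response_cache[request.url].body
```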
You may also need to add the CDN URL to the driver's scopes:
```python
from seleniumwire import webdriver

driver = webdriver.Chrome(executable_path=executable_path, chrome_options=chrome_options, seleniumwire_options=options, desired_capabilities=capabilities)
# Restrict capture to URLs matching these regexes; make sure the CDN is covered
driver.scopes = [
    ".*cdn.com.*"
]
driver.request_interceptor = request_interceptor
driver.response_interceptor = response_interceptor
```
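The dict-based cache above lives only in memory, so it is emptied every time the script exits, which is the situation described at the start of this thread. Here is a minimal sketch of a persistent variant, assuming it is acceptable to store captured responses in a local `shelve` file. `make_disk_cache_interceptors` and the cache path are hypothetical, the `"cdn"` URL check mirrors the snippet above, and `request.create_response()` is Selenium Wire's documented way to answer a request from an interceptor. The lock is there on the assumption that interceptors may be called from several proxy threads at once.

```python
import shelve
import threading


def make_disk_cache_interceptors(cache_path, url_marker="cdn"):
    """Build a (request_interceptor, response_interceptor) pair backed by a shelve file.

    cache_path: hypothetical location of the on-disk cache (any writable path).
    url_marker: only URLs containing this substring are cached.
    """
    lock = threading.Lock()  # serialise access to the shelve file

    def request_interceptor(request):
        if url_marker not in request.url:
            return
        with lock, shelve.open(cache_path) as cache:
            entry = cache.get(request.url)
        if entry is not None:
            # Answer from the disk cache; the request is never sent upstream
            request.create_response(
                status_code=entry["status_code"],
                headers=entry["headers"],
                body=entry["body"],
            )

    def response_interceptor(request, response):
        # Only persist successful responses for matching URLs
        if url_marker not in request.url or response.status_code != 200:
            return
        with lock, shelve.open(cache_path) as cache:
            if request.url not in cache:
                cache[request.url] = {
                    "status_code": response.status_code,
                    "headers": dict(response.headers.items()),
                    "body": response.body,
                }

    return request_interceptor, response_interceptor
```

For example: `driver.request_interceptor, driver.response_interceptor = make_disk_cache_interceptors("cdn_cache")`. Entries never expire in this sketch, so delete the cache file to force a fresh download. Headers are stored and replayed as-is alongside the raw body, so a compressed body keeps its matching `Content-Encoding` header.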
Hi, thank you for your contribution. I found an issue: when I use selenium-wire with Chrome, the files are not stored in the disk cache. I need to store a web page's static files in the disk cache so I can reuse them when I run the script later. Could you give me some advice?