Open drjimmyjiang opened 4 months ago
@drjimmyjiang
It seems like you might be running this function frequently. If so, it's possible that previous executions have filled up the space, causing the current execution to fail. The `/tmp` directory can indeed be shared between invocations: Lambda reuses warm execution environments, and `/tmp` persists from one invocation to the next within the same environment.
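One way to check whether earlier invocations are filling the shared `/tmp` is to log its usage at the start of each invocation. Below is a minimal sketch using the standard library's `shutil.disk_usage`. The helper name `log_tmp_usage` is hypothetical, and printing is just one option (Lambda forwards stdout to CloudWatch Logs).

```python
import shutil

MB = 1024 * 1024


def log_tmp_usage(path="/tmp"):
    """Print and return (used_mb, free_mb) for the filesystem holding `path`.

    Hypothetical helper: call it at the top of the Lambda handler and watch
    whether "used" keeps growing across warm invocations.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    used_mb = usage.used / MB
    free_mb = usage.free / MB
    print(f"{path}: {used_mb:.1f} MB used, {free_mb:.1f} MB free of {usage.total / MB:.1f} MB")
    return used_mb, free_mb
```

If the "used" figure climbs on every warm invocation, something is writing to `/tmp` without cleaning up after itself.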
One solution is to clear the `/tmp` directory when your Lambda function starts, using `shutil.rmtree("/tmp")`, for example. However, please be cautious with this approach, as it will remove all files in the `/tmp` directory, which might include files from other processes.
Alternatively, you can ensure that the temporary directory is cleaned up after your Lambda function finishes executing. Here's an example using the tempfile module with a context manager:
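```python
import tempfile
from selenium import webdriver

# Use the tempfile module to create a temporary directory;
# it is deleted automatically when the with-block exits
with tempfile.TemporaryDirectory() as temp_dir:
    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-data-dir={temp_dir}")
    driver = webdriver.Chrome(options=options)
    try:
        ...  # scraping logic goes here
    finally:
        driver.quit()  # stop Chrome before its profile directory is removed
```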
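To see the cleanup behaviour without launching Chrome, the sketch below replaces the driver with a hypothetical `fake_chrome` function that writes a file into the profile directory, and wraps the `with` block in a hypothetical `run_with_profile_dir` helper. Once the block exits, even via `return`, the directory and everything in it are gone.

```python
import os
import tempfile


def run_with_profile_dir(work):
    """Call work(profile_dir) with a fresh directory that is deleted afterwards (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as profile_dir:
        work(profile_dir)
        return profile_dir  # returned only so the caller can check it was removed


def fake_chrome(profile_dir):
    # Stand-in for Chrome writing its profile into --user-data-dir
    with open(os.path.join(profile_dir, "Preferences"), "w") as f:
        f.write("{}")


path = run_with_profile_dir(fake_chrome)
print(os.path.exists(path))  # False: the directory was removed on exit
```

Because each invocation gets its own directory and removes it on exit, nothing is left behind for the next warm invocation.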
I'll keep this issue open for a little while in case someone else encounters the same problem.
Thank you so much for your response, umihico. I do run this function very frequently, with multiple concurrent invocations. Ideally, I'd like each invocation to be independent of the others and not share any files in the `/tmp` directory, so that it can handle a high volume of concurrent invocations.
Is it possible to omit the use of tempfile altogether? If so, what are the drawbacks?
Does it really matter whether the /tmp directory is cleared before or after executing the Lambda function if there are shared files from other concurrent invocations? (Wouldn't it cause problems either way?)
If using the tempfile module to create a temporary directory is the best solution for my use case, may I ask how I would modify the existing main.py file and where I would insert the code snippet? Thank you for your help.
```diff
+import shutil
 def handler(event=None, context=None):
+    shutil.rmtree("/tmp")
     options = webdriver.ChromeOptions()
     service = webdriver.ChromeService("/opt/chromedriver")
```
Maybe like this? I hope this works.
Thank you so much, I'll give it a try.
I tried removing `/tmp` with `shutil.rmtree` and got the following error:
```
  File "/var/task/src/functions/scraper_engine/scraper.py", line 64, in config_webdriver_with_retry
    return webdriver.Chrome(
           ^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
    super().__init__(
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/chromium/webdriver.py", line 61, in __init__
    super().__init__(command_executor=executor, options=options)
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 208, in __init__
    self.start_session(capabilities)
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 292, in start_session
    response = self.execute(Command.NEW_SESSION, caps)["value"]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 347, in execute
    self.error_handler.check_response(response)
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 229, in check_response
```
To remove the contents of `/tmp` but not the folder itself, I created the following function in utils.py:
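```python
import os
import shutil


def clear_directory_contents(dir_path):
    """Delete everything inside dir_path but keep dir_path itself."""
    for item in os.listdir(dir_path):
        item_path = os.path.join(dir_path, item)
        # shutil.rmtree refuses symlinks, so only recurse into real directories
        if os.path.isdir(item_path) and not os.path.islink(item_path):
            shutil.rmtree(item_path)
        else:
            os.remove(item_path)  # regular files and symlinks
```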
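Since emptying all of `/tmp` can also delete files that are still needed, a narrower variant removes only the entries whose names begin with the prefix the `tempfile` module uses by default (`tempfile.gettempprefix()`, normally `"tmp"`), which matches directories such as `/tmp/tmp_ovipfqh` in the error reported in this thread. This is a sketch under that assumption: the name `clear_tempfile_leftovers` is hypothetical, and any file of your own whose name happens to start with `tmp` would be removed too.

```python
import os
import shutil
import tempfile


def clear_tempfile_leftovers(dir_path="/tmp"):
    """Delete entries in dir_path whose names start with tempfile's prefix.

    Hypothetical helper: targets leftovers from mkdtemp()/TemporaryDirectory()
    while leaving other files in dir_path alone.
    Returns the sorted names of the entries it matched.
    """
    prefix = tempfile.gettempprefix()  # "tmp" by default
    with os.scandir(dir_path) as it:
        entries = list(it)  # snapshot before deleting anything
    matched = []
    for entry in entries:
        if not entry.name.startswith(prefix):
            continue  # unrelated file or folder: keep it
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path, ignore_errors=True)  # skip undeletable items
        else:
            os.remove(entry.path)  # regular file or symlink
        matched.append(entry.name)
    return sorted(matched)
```

Calling it once at the top of the handler, before any new temporary directory is created, clears what earlier warm invocations left behind.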
Also, @drjimmyjiang, check how large the temporary files and folders your code generates are. I needed to increase my ephemeral storage through serverless.yml:

```yaml
ephemeralStorageSize: 2048  # size of /tmp in MB
```
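To act on the advice above and see how much each item in `/tmp` actually takes up, a rough `du`-style sketch can sum apparent file sizes per top-level entry. The helper name `tmp_entry_sizes` is hypothetical; it defaults to `tempfile.gettempdir()` (which resolved to `/tmp` in the Lambda traceback in this thread), does not follow symlinks, and skips files that disappear while it is scanning.

```python
import os
import tempfile


def _size_or_zero(path):
    # lstat: measure symlinks themselves rather than their targets
    try:
        return os.lstat(path).st_size
    except FileNotFoundError:
        return 0  # removed while scanning


def tmp_entry_sizes(dir_path=None):
    """Return {entry_name: bytes} for each top-level entry, largest first (hypothetical helper)."""
    dir_path = dir_path or tempfile.gettempdir()
    sizes = {}
    with os.scandir(dir_path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                # os.walk does not descend into symlinked directories by default
                sizes[entry.name] = sum(
                    _size_or_zero(os.path.join(root, name))
                    for root, _dirs, files in os.walk(entry.path)
                    for name in files
                )
            else:
                sizes[entry.name] = _size_or_zero(entry.path)
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
```

For example, `print(tmp_entry_sizes())` at the end of a handler shows which entries (such as leftover Chrome profiles) dominate the usage, which helps decide whether raising `ephemeralStorageSize` or cleaning up is the right fix.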
I'm getting the following error message. It doesn't happen all the time, and it had never happened before today. Any ideas would be much appreciated. Do I need to change anything in main.py?
```
[ERROR] OSError: [Errno 28] No space left on device: '/tmp/tmp_ovipfqh'
Traceback (most recent call last):
  File "/var/task/bot.py", line 31, in handler
    options.add_argument(f"--user-data-dir={mkdtemp()}")
  File "/var/lang/lib/python3.12/tempfile.py", line 368, in mkdtemp
    _os.mkdir(file, 0o700)
```
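In this traceback, `mkdtemp()` is called inline on line 31 of `bot.py`, so a new directory is created under `/tmp` on every invocation and its path is not stored anywhere, which suggests nothing ever deletes it. In a warm execution environment these Chrome profiles can pile up until the disk is full. One possible fix, sketched here, keeps `mkdtemp()` but stores the path and removes it in a `finally` block. `with_user_data_dir` is a hypothetical helper, and `run` stands in for whatever code in `bot.py` builds the Chrome options and driver.

```python
import shutil
import tempfile


def with_user_data_dir(run):
    """Call run(profile_dir) with a fresh directory and always delete it afterwards.

    Hypothetical helper. In bot.py, run() would do something like
    options.add_argument(f"--user-data-dir={profile_dir}"), start the driver,
    do the scraping, and call driver.quit() before returning.
    """
    profile_dir = tempfile.mkdtemp()  # e.g. /tmp/tmp_ovipfqh on Lambda
    try:
        return run(profile_dir)
    finally:
        # Runs even when run() raises, so failed invocations don't leak space either
        shutil.rmtree(profile_dir, ignore_errors=True)
```

`tempfile.TemporaryDirectory()` used as a context manager achieves the same cleanup; the `try`/`finally` form just stays closer to the existing `mkdtemp()` call.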