Closed falibabaei closed 2 months ago
@falibabaei Yes, it's possible to remove the input and output files from the temporary directory automatically. You can use the tempfile module in Python to create temporary files and directories, and set them to delete automatically when the program ends.
I use tempfile to save the input files and I know I can program it to remove the files after the request is finished, but what I mean is that Gradio removes the files themselves. I have swagger UI and there I do not need any code to remove the input and output files. The output file is just a stream and after the user is done with the prediction everything is deleted without me having to enter any code. I had a memory problem, and when I investigated it, I found that the input files remain in the temporary directory and cause this problem. Otherwise, I did not know anything about it, and I could not find anything in the documentation about this problem. I think it is a serious problem and should be mentioned somewhere.
Hi @falibabaei the basic reason for this is that Gradio doesn't "know" when a user has stopped using the application and it is safe to clear the temporary files. I'll open this issue up for brainstorming if anyone has any suggestions
There is an ugly solution in the Gradio interface. I haven't started tracking the code of blocks yet, so I'm not sure if there is a similar solution.
import gradio as gr
import os

def ret(name):
    return name

demo = gr.Interface(
    ret,
    "video",
    "video",
)

if __name__ == "__main__":
    try:
        demo.launch()
    finally:
        print(len(demo.input_components[0].temp_files))
        print(len(demo.input_components))
        print(len(demo.output_components[0].temp_files))
        print(len(demo.output_components))
        # Remove every temp file tracked by the first input component
        for x in demo.input_components[0].temp_files:
            try:
                os.remove(x)
            except OSError as e:
                print(f"Error deleting file: {e}")
            finally:
                print(x)
        # Remove every temp file tracked by the first output component
        for x in demo.output_components[0].temp_files:
            try:
                os.remove(x)
            except OSError as e:
                print(f"Error deleting file: {e}")
            finally:
                print(x)
But there are two issues with it:
gr.Image(type='filepath')
because there are two temporary files.
Thank you very much for the quick reply. @tomchang25 I have the second problem with my input files, which are not images. They are just text files, but I still have two temporary files. I have the path of one file and can delete it, but I do not know about the second one. The same is true for the output files. Also, I use the block
@falibabaei, Yes, it seems that there are some leaking resources in temp_files. Some temporary files are redundant or not added to temp_files, so they remain in the temporary directory indefinitely.
So, if we want to solve this problem, we may need to go through all the IOComponent to ensure that no temporary files are missed
Anyway, I have packaged all this code so that users can use the deconstruct method to clear the temporary files after finishing the process.
# IOComponent
...
    def deconstruct(self):
        # Delete every temp file this component has tracked
        while self.temp_files:
            temp_file = self.temp_files.pop()
            os.remove(temp_file)
...
# Interface
...
    def deconstruct(self):
        for x in self.input_components:
            if isinstance(x, IOComponent):
                x.deconstruct()
        for x in self.output_components:
            if isinstance(x, IOComponent):
                x.deconstruct()
...
import gradio as gr
import os

def ret(name):
    return name

demo = gr.Interface(
    ret,
    gr.Image(type="filepath"),
    "image",
)

if __name__ == "__main__":
    try:
        demo.launch()
    finally:
        demo.deconstruct()
However, I'm not sure where the component is stored in Blocks. I would like to hear your opinion on this, @abidlabs
For output files, it's hard to know when a usage has expired, since we return a link to the output files and not the file itself. I think the best approach would be to
Hello @abidlabs, just confirming if this is still open? Do you know if users can still access temp files, and if yes, is there any documentation on how to prevent this and protect the privacy of the user?
Hi @cowanAI yes this is still open. You can take a look at the current security policy here: https://gradio.app/sharing-your-app/#security-and-file-access
@abidlabs so basically the highest grade of security I can get is by creating the most random custom temporary directory, is that correct? Even if I'm deploying from a Docker container, hackers could get access to the files of my users? What if I use the 'chmod' command to set the appropriate permissions, for example making the folder accessible only by the owner using chmod 700 /path/to/temp/folder?
Also, don't you think this exposes all Gradio users to an incredible degree of legal liability?
Oh, I also forgot something very important: I actually encrypted the EBS volume storage of my EC2 instance. Do you think that helps against random people accessing and eavesdropping on the files of my users?
It seems that this doesn't apply to Docker containers on EC2 instances; Docker containers don't expose their temp files to users.
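For the chmod 700 idea above, a minimal sketch using only the Python standard library. The helper names and the directory prefix are hypothetical, and nothing here is Gradio API. It only restricts access for other local users on a POSIX system; it does not change what the app's own server process can read and serve, so it is not a substitute for the file-access policy linked above.

```python
import os
import stat
import tempfile


def make_private_temp_dir(prefix: str = "app_uploads_") -> str:
    """Create a temp directory that only the owning user can access (hypothetical helper)."""
    # mkdtemp already creates the directory with owner-only permissions on POSIX;
    # the explicit chmod mirrors `chmod 700 /path/to/temp/folder`.
    path = tempfile.mkdtemp(prefix=prefix)
    os.chmod(path, 0o700)
    return path


def is_owner_only(path: str) -> bool:
    """Return True if neither group nor others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

The application would then need to be pointed at the returned directory; how to do that depends on the Gradio version and is not covered here.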
Why not use tempfile within a context manager:
import tempfile

# audio_file: bytes
with tempfile.NamedTemporaryFile(suffix=".gradio") as temp:
    temp.write(audio_file)
    temp.flush()
    # do the business here
# The file is automatically removed now
It is safe to suppose that once the callback returns, the file isn't needed anymore and can be removed.
I'm thinking of a decorator maybe that can be added to callbacks that will use large files:
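import os
import tempfile
from functools import wraps

def temp_file_decorator(func):
    @wraps(func)
    def wrapper(large_file: bytes, *args, **kwargs):
        # delete=False so the callback can reopen the file by name; we remove it ourselves
        with tempfile.NamedTemporaryFile(suffix=".tmp", delete=False) as temp_file:
            temp_file.write(large_file)
            temp_file.flush()
        try:
            return func(temp_file.name, *args, **kwargs)
        finally:
            os.remove(temp_file.name)
    return wrapper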
Then we can use it like the example in #4620 :
from pydub.utils import mediainfo  # assumed source of mediainfo, as in #4620

@temp_file_decorator
def get_duration_ms(audio_file):
    duration = mediainfo(audio_file)["duration"]  # a string in seconds
    duration_ms = int(float(duration) * 1000)
    print(f"{audio_file}: {duration_ms} ms")
    return duration_ms
@msis did this work for you?
Hello @abidlabs @freddyaboulton, do you have a solution by any chance for when we want to generate bigger files and keep the app from crashing due to RAM leakage in production?
Sorry @cowanAI we don't have a workaround at the moment, it's something we're going to look into.
Hi @falibabaei the basic reason for this is that Gradio doesn't "know" when a user has stopped using the application and it is safe to clear the temporary files.
Sounds like a misconception in your application...
I'm currently messing around with the UploadButton component. I've discovered that uploading a file actually creates two /tmp directories, each with a copy of the file. The .upload() event handler method only passes one of those files into the handler function, so the other directory and file are not easily deleted. I just figured that this should get a look when the tmpfile code gets an overhaul.
Thanks @jcheroske - I believe this is fixed in the v4 branch which will turn into gradio 4.0
I think you need to update the documentation to make it explicit that the temporary files are available to "all (authenticated) users", not just "users".
Here's my uneducated pitch: If auth is set in .launch(), generate a random key on startup, and encrypt and decrypt temporary files with a hash of that key and the uploading user's authentication token. Then even if you have a working password, and the filename, you would need the session key (which unlike the filename is never displayed on the screen) and the random key (which is server-side only, and lost when the server terminates), in order to overcome the encryption.
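The key-derivation half of that pitch could look roughly like the sketch below. It is an illustration only, not anything Gradio implements: SERVER_KEY, derive_file_key, and the session_token argument are hypothetical names, and HMAC-SHA256 is swapped in as the "hash of that key and the token".

```python
import hashlib
import hmac
import secrets

# Hypothetical: generated once per server start and never written to disk,
# so it is lost when the server terminates.
SERVER_KEY = secrets.token_bytes(32)


def derive_file_key(session_token: str, server_key: bytes = SERVER_KEY) -> bytes:
    """Derive a 32-byte per-session key from the server key and a user's session token."""
    return hmac.new(server_key, session_token.encode("utf-8"), hashlib.sha256).digest()
```

The derived key would still have to be fed to a symmetric cipher from a third-party library, since the Python standard library does not ship one.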
Gradio temp files grow without bound. This is problematic, because every time I generate an image and pass it to an image component it generates another temp file, even though I'm passing it as a numpy array. Gradio should manage the total size of the temp files it generates. As it stands, it seems like Gradio is eating up lots and lots of space on lots and lots of people's hard drives, for no good reason.
Hi @TashaSkyUp ! Working on cleaning up the temp directory here #7447
This will be possible in the next release of gradio with the delete_cache parameter. It is a tuple of integers (frequency, age), both in seconds. Set it to (3600, 3600) to delete all files older than an hour every hour.
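To make the (frequency, age) semantics concrete, here is a minimal standard-library sketch of the "age" half: deleting files whose modification time is older than a threshold. It is not Gradio's implementation; delete_older_than is a hypothetical helper, and the "frequency" half would simply mean calling it on a timer.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def delete_older_than(directory: str, max_age_seconds: float,
                      now: Optional[float] = None) -> List[str]:
    """Delete regular files under `directory` last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).rglob("*"):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            try:
                path.unlink()
                removed.append(str(path))
            except OSError:
                # The file may be in use or already gone; skip it.
                pass
    return removed
```

With an age of 3600, a file last modified two hours ago would be removed, while one written a minute ago would be kept.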
I think you should add a button which can control whether to delete the cache.
Hello, I am using gradio to create a UI for my API. The problem is that after prediction, the input and output files remain in the temporary directory. This causes two problems