richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Allow setting the directory for temporary files #243

Closed (max-moser closed 4 months ago)

max-moser commented 4 months ago

We're using siegfried to get an overview of the formats of the files uploaded to our research data repository. In our setup, we're using a VM that has limited RAM and limited "primary" disk space, but ample space on the network-based storage where the uploaded files are stored.

To speed things up, we're running a few instances of siegfried in parallel (~4 seems to work well enough). However, these runs seem to generate temporary files that fill up our entire primary disk for brief periods of time (we have a few archive files that are individually larger than our root partition).

Since the large storage volume is network-based, I don't think it'd be a great idea to mount parts of it under /tmp – instead, I think it would be nice to have a way of telling siegfried to use a different directory for its temporary files.

richardlehane commented 4 months ago

Thanks for these issues Max! I think this one may already be possible and just need a documentation update.

Under the hood, sf just uses the Go standard library to create its temp files. These calls use the os.TempDir function, which lets end users set the directory through environment variables ($TMPDIR on Unix systems; %TMP%, %TEMP%, or %USERPROFILE% on Windows). See the docs here: https://pkg.go.dev/os#TempDir

max-moser commented 4 months ago

Setting $TMPDIR works like a charm, thanks!
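
For setups that launch several sf runs from a wrapper program, the same variable can be set per child process. The following is only a sketch: the helper sfCommand and both paths are hypothetical, and it assumes an sf binary on the PATH that takes the target directory as its argument.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// sfCommand is a hypothetical helper that prepares (but does not start) an
// sf run whose temporary files should end up in tmpDir.
func sfCommand(tmpDir, target string) *exec.Cmd {
	cmd := exec.Command("sf", target)
	// Inherit the parent's environment, then override TMPDIR for this child
	// only. When a key appears twice, os/exec uses the last value.
	cmd.Env = append(os.Environ(), "TMPDIR="+tmpDir)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Both paths are placeholders; substitute directories that exist locally.
	cmd := sfCommand("/mnt/scratch/sf-tmp", "/mnt/storage/uploads")
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "sf failed:", err)
		os.Exit(1)
	}
}
```

Because the variable is set on the child's environment only, each parallel run could be given its own temporary directory without changing the environment of the wrapper itself.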