tomchang25 / whisper-auto-transcribe

Auto transcribe tool based on whisper
MIT License

0.3.1 temporal files in tmp\ folder are not deleted after finishing each job #29

Closed. Milincho closed this issue 1 year ago

Milincho commented 1 year ago

Temporary files pile up in the tmp\ folder and are not deleted after each job.

So if you run a batch of 100 conversions, you end up with 100+ GB of temporary files filling up your disk...

tomchang25 commented 1 year ago

I will add an option to choose between the tmp dir and the Windows temp cache.

Milincho commented 1 year ago

> Will add an option to allow choose tmp dir or windows cache temp

The problem isn't where the files are saved, but that they keep occupying disk space after the job is finished, so during a long batch dozens of GB of temporary files pile up. It needs an option to delete (not recycle) those temporary files after each individual job finishes.

Milincho commented 1 year ago

How does that option work for CLI?

amerodeh commented 1 year ago

Looking at the linked PR, the relevant bit of code is:

    parser.add_argument(
        "--remain-tempfile",
        action="store_true",
        help="Keep temporary file after processing. Default is False.",
        required=False,
        default=False,
    )

So for the CLI the flag is --remain-tempfile. It defaults to False, so unless you pass the flag, the temporary files should be deleted.
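To illustrate how a `store_true` flag like this behaves, here is a minimal sketch. It assumes a stand-alone `argparse` parser built only for this example; the `--remain-tempfile` argument is copied from the snippet above, and everything else (the parser itself, the variable names) is hypothetical rather than the project's actual code.

```python
import argparse

# Hypothetical stand-alone parser; only the --remain-tempfile argument
# is taken from the snippet quoted above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--remain-tempfile",
    action="store_true",
    help="Keep temporary file after processing. Default is False.",
    required=False,
    default=False,
)

# Flag omitted: remain_tempfile is False, so temp files would be deleted.
default_args = parser.parse_args([])

# Flag present: remain_tempfile is True, so temp files would be kept.
keep_args = parser.parse_args(["--remain-tempfile"])

print(default_args.remain_tempfile)  # False
print(keep_args.remain_tempfile)     # True
```

Note that argparse exposes the value as `remain_tempfile`, with the hyphen turned into an underscore.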

Milincho commented 1 year ago

Got this error:

[screenshot of the error; image not preserved]

And after that it left a 3.7 GB main.mkv file, which is a copy of the original file... in the same folder as the original file. What's the purpose of that? Why does it need to make a copy of the original file in the same folder? And why doesn't it delete the copy at the end if there is an error?

There are also 2 .wav files at c:\Temp\htdemucs\: vocals.wav and no_vocals.wav. They weren't deleted either.

amerodeh commented 1 year ago

Yeah, that seems like an oversight. I presume the cleanup/deletion code runs last, so if there's an error during processing, the cleanup never gets a chance to run. The dev would need to make the cleanup more resilient so that it runs regardless of success or failure.
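One common way to get cleanup that runs regardless of success or failure is a `try`/`finally` block. The sketch below is only an illustration of that pattern, not the project's code: `run_job`, `failing_work`, and the `remain_tempfile` parameter name are hypothetical, and the file written is a dummy stand-in for real intermediate output.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(work, remain_tempfile=False):
    """Run work(tmp_dir) and remove tmp_dir afterwards, even on error."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="job_"))
    try:
        return work(tmp_dir)
    finally:
        # The finally clause runs on success and on failure alike.
        if not remain_tempfile:
            shutil.rmtree(tmp_dir, ignore_errors=True)


def failing_work(tmp_dir):
    # Stand-in for a processing step that writes a file, then crashes.
    (tmp_dir / "vocals.wav").write_bytes(b"\0" * 1024)
    failing_work.last_dir = tmp_dir  # remembered so we can inspect it later
    raise RuntimeError("simulated failure mid-job")


try:
    run_job(failing_work)
except RuntimeError as exc:
    print(f"job failed: {exc}")

# The temporary directory is gone even though the job raised.
print(failing_work.last_dir.exists())  # False
```

`shutil.rmtree` deletes the directory outright rather than moving it to a recycle bin, and because the error still propagates after the cleanup, a failed job is reported without leaving its temporary files behind.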