sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Makes Wav2Vec2 transcription robust to new versions of python libraries #1945

Closed lfcnassif closed 10 months ago

lfcnassif commented 10 months ago

While working on #1944, I noticed that the Python transcription processes were failing to start. Running them manually, I saw new console messages printed by some Python library, and these messages broke the communication protocol between the Java and Python processes. I had anticipated this and tried to redirect stdout to stderr at the beginning of the Python script, but my approach didn't work. The new console messages are:

Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize
Ignored unknown kwarg option normalize

Commit 4a0a4f866fa606277917d10b8bc973d1b74a1458 fixes this.

lfcnassif commented 10 months ago

In the future, new versions of Python libraries could print other messages, so we should correctly redirect stdout output from Python libraries to stderr so that it can't break the communication again. @fmpfeifer, do you have any recommendation on how to do this?

PS: Another approach, which I used in the past to communicate with the Sleuthkit evidence reading processes, is to use sockets instead of stdout/stdin. It is a bit slower, though, and using stdout/stdin is simpler and cleaner when possible.

lfcnassif commented 10 months ago

PS2: The redirect approach that didn't work was to put this at the beginning of the Python script:

import sys
stdout = sys.stdout
sys.stdout = sys.stderr
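
One possible reason the snippet above had no effect (an assumption, not confirmed in this thread) is that the messages are written by native code directly to file descriptor 1, which reassigning `sys.stdout` does not touch. A minimal sketch of a descriptor-level redirect follows. The helper name `reserve_protocol_channel` is hypothetical and is not part of IPED or of the fix in the commit mentioned above.

```python
import os
import sys


def reserve_protocol_channel():
    """Hypothetical helper: keep the real stdout for the Java<->Python protocol
    and send everything else that targets stdout to stderr instead."""
    # Flush anything already buffered so it is not lost or reordered.
    sys.stdout.flush()
    # Duplicate the original stdout descriptor; only the protocol will use it.
    protocol_fd = os.dup(1)
    # Make descriptor 1 point to the same target as stderr (descriptor 2).
    # Native libraries writing straight to descriptor 1 now reach stderr.
    os.dup2(2, 1)
    # Python-level prints from libraries go to stderr as well.
    sys.stdout = sys.stderr
    # Line-buffered text stream wrapping the preserved original stdout.
    return os.fdopen(protocol_fd, "w", buffering=1)
```

Under these assumptions, the script would call `channel = reserve_protocol_channel()` before importing the transcription libraries and then write protocol messages with `channel.write(...)`, while any stray library output lands on stderr.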