Open tig3rmast3r opened 1 year ago
Just as a note, I don't know if it's related: I had to install madmom manually by cloning it from its git repository (otherwise it gives an error on Win10). I also had to upgrade numpy to version 23, because with 22 I was getting errors on startup.
Hmm, I feel like this could be a version problem with either madmom or sndfile, since onset detection happens through madmom. What's the full call stack? Is this libsndfile error happening inside madmom?
Here's the full stack printed from PowerShell; it looks like "file not found":
Traceback (most recent call last):
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\routes.py", line 442, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\blocks.py", line 1389, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\blocks.py", line 1094, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\gradio\utils.py", line 703, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\vampnet\app.py", line 219, in vamp
    return _vamp(data, return_mask=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\vampnet\app.py", line 136, in _vamp
    mask, pmask.onset_mask(sig, z, interface, width=data[onset_mask_width])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\vampnet\vampnet\mask.py", line 201, in onset_mask
    sig.write(f.name)
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\audiotools\core\audio_signal.py", line 602, in write
    soundfile.write(str(audio_path), self.audio_data[0].numpy().T, self.sample_rate)
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 343, in write
    with SoundFile(file, 'w', samplerate, channels,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxxxx\AppData\Roaming\Python\Python311\site-packages\soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'C:\Users\xxxxx\AppData\Local\Temp\tmpcfwf7po5.wav': System error.
INFO:httpx:HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 500 Internal Server Error"
INFO:httpx:HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
Hmm, I don't have a windows machine to debug on atm, but it looks like it's failing to write the input audio file to a temp directory for onset processing:
File "C:\Users\xxxxx\vampnet\vampnet\mask.py", line 201, in onset_mask
sig.write(f.name)
The way f.name is created is here:
https://github.com/hugofloresgarcia/vampnet/blob/a66dc9cb8aa8494f8d8ed53ac1e5bf99a6d6483e/vampnet/mask.py#L199
this could be it: https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file
It looks like we're trying to open the file twice: once when NamedTemporaryFile() is created, and again in sig.write.
This solution from stackoverflow could work, you could give it a try! I'm happy to accept a PR!
import os
import tempfile


class CustomNamedTemporaryFile:
    """
    This custom implementation is needed because of the following limitation of tempfile.NamedTemporaryFile:

    > Whether the name can be used to open the file a second time, while the named temporary file is still open,
    > varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
    """

    def __init__(self, mode='wb', delete=True):
        self._mode = mode
        self._delete = delete

    def __enter__(self):
        # Generate a random temporary file name
        file_name = os.path.join(tempfile.gettempdir(), os.urandom(24).hex())
        # Ensure the file is created
        open(file_name, "x").close()
        # Open the file in the given mode
        self._tempFile = open(file_name, self._mode)
        return self._tempFile

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._tempFile.close()
        if self._delete:
            os.remove(self._tempFile.name)
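Another way to sidestep the double-open limitation, sketched here with only the standard library, is to create a temporary directory and write to a path inside it, so nothing holds the file open when a second writer opens it by name. This is an untested-on-Windows sketch, not vampnet's code: `write_wav` is a hypothetical stand-in for sig.write (it uses the stdlib `wave` module), and `process_via_temp_dir` is likewise a made-up name.

```python
import os
import tempfile
import wave


def write_wav(path, n_frames=1000, sample_rate=44100):
    """Hypothetical stand-in for sig.write: writes silent 16-bit mono audio."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)


def process_via_temp_dir():
    # The directory and everything in it are removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "input.wav")
        write_wav(path)  # opens the path itself; no other handle is open on it
        # Stand-in for whatever reads the file next (e.g. onset detection).
        with wave.open(path, "rb") as r:
            n_frames = r.getnframes()
    return path, n_frames
```

The trade-off compared with the custom class is that cleanup is tied to the directory rather than to a single file, which keeps the temp folder from growing.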
Hi there,
I've just modified this line
https://github.com/hugofloresgarcia/vampnet/blob/a66dc9cb8aa8494f8d8ed53ac1e5bf99a6d6483e/vampnet/mask.py#L199
to
with tempfile.NamedTemporaryFile(suffix='.wav',delete=False) as f:
and it works!
It's probably going to grow the temp folder over time, so it's not that clever a solution, but given my near-zero Python knowledge I'm OK with this for now :)
Thanks for the hint!
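If the growing temp folder becomes a concern, the delete=False workaround can be paired with manual cleanup. The following is a minimal sketch under assumptions: `run_with_temp_wav`, `write_fn`, and `use_fn` are hypothetical names, with `write_fn` standing in for sig.write and `use_fn` for whatever reads the file afterwards. It has not been tested against vampnet itself.

```python
import os
import tempfile


def run_with_temp_wav(write_fn, use_fn, suffix=".wav"):
    """Create a temp path, release our handle, let others open it, then delete it."""
    f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    f.close()  # close our handle so the path can be reopened by name on Windows
    try:
        write_fn(f.name)       # e.g. sig.write(f.name)
        return use_fn(f.name)  # e.g. run onset detection on the written file
    finally:
        os.remove(f.name)      # manual cleanup, since delete=False skips it
```

The try/finally ensures the file is removed even if writing or processing raises.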
I have a question: assuming I want to create my own mask, I would like it so that when I click generate, instead of creating the mask file, it loads a mask.wav file from the vampnet\assets folder. Would you be so kind as to point me to roughly where in the code I should make changes? Is that even possible? I mean, is the mask just the input audio with muted parts that will be inpainted, or is it a more complex operation?
The mask is not the audio with muted parts (though we can represent the mask as that).
A better way to think of the mask is an array with 1s in the timesteps where we want to generate audio and 0s in the timesteps where we want conditioning. Note that the "width" of these time steps depends on the tokenizer's hop length.
You can get the tokenizer hop_length using interface.codec.hop_length, which will give you the tokenizer's hop size in samples.
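To make the relationship between timesteps and hop length concrete, here is a small framework-free sketch that turns a time range in seconds into a per-timestep mask (1 = generate, 0 = keep as conditioning). The function name and its parameters are assumptions for illustration only; in practice you would read the hop size from interface.codec.hop_length, take the sample rate from your signal, and build the mask as a tensor shaped like z.

```python
def time_range_mask(n_steps, hop_length, sample_rate, start_sec, end_sec):
    """Return a list of 0/1 values, one per token timestep.

    Timestep i covers samples [i * hop_length, (i + 1) * hop_length).
    It is set to 1 when that span overlaps [start_sec, end_sec).
    """
    start_sample = int(start_sec * sample_rate)
    end_sample = int(end_sec * sample_rate)
    mask = []
    for i in range(n_steps):
        hop_start = i * hop_length
        hop_end = hop_start + hop_length
        overlaps = hop_start < end_sample and hop_end > start_sample
        mask.append(1 if overlaps else 0)
    return mask
```

Because a timestep is marked as soon as any part of its hop overlaps the range, the masked region can extend up to one hop beyond the requested boundaries on each side.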
You could try something like checking whether most (or all) samples in a given chunk of hop_length samples are equal to 0.
this could be a good starting point, though it's not tested:
import torch
from audiotools import AudioSignal
from vampnet.interface import Interface


def audio_file_mask(
    sig: AudioSignal,
    z: torch.Tensor,
    interface,
):
    """
    create a mask from an audio file.
    where muted sections (where samples == 0 on a given hop length) equal 1s in the mask,
    and nonmuted sections equal 0s in the mask.
    """
    # get the number of samples in a hop
    hop_length = interface.codec.hop_length

    # get the number of timesteps in the z array
    n_steps = z.shape[-1]

    # create a mask, set muted sections to 1
    mask = torch.zeros_like(z)
    for i in range(n_steps):
        # get the start and end indices for the hop
        start = i * hop_length
        end = (i + 1) * hop_length

        # if all samples in the hop are 0, then we have a muted section
        # checking the first channel only!
        if torch.all(sig.samples[0, 0, start:end] == 0):
            mask[:, :, i] = 1

    return mask


if __name__ == "__main__":
    sig = AudioSignal("mask.wav")
    interface: Interface  # initialize an interface here
    sig = interface.preprocess(sig)
    z = interface.encode(sig)
    mask = audio_file_mask(sig, z, interface=interface)
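The "most (or all)" variant mentioned above could look roughly like the sketch below. It is written against a plain Python sequence of samples for a single channel rather than a tensor, so it can be tried without torch or vampnet. The function name, `min_muted_fraction`, and `eps` are hypothetical parameters introduced for illustration, and treating a timestep with no audio as unmasked is a design choice of this sketch rather than behaviour of the code above.

```python
def muted_hop_mask(samples, hop_length, n_steps, min_muted_fraction=1.0, eps=0.0):
    """Return one 0/1 value per timestep: 1 where the hop counts as muted.

    samples: flat sequence of floats for one channel.
    min_muted_fraction: fraction of samples in a hop that must satisfy
        abs(x) <= eps for the hop to count as muted.
    """
    mask = []
    for i in range(n_steps):
        chunk = samples[i * hop_length:(i + 1) * hop_length]
        if len(chunk) == 0:
            mask.append(0)  # no audio for this timestep: leave it as conditioning
            continue
        muted = sum(1 for x in chunk if abs(x) <= eps)
        mask.append(1 if muted / len(chunk) >= min_muted_fraction else 0)
    return mask
```

A small nonzero `eps` may help when the muted sections of the file are near-silent rather than exactly zero, for example after lossy encoding or resampling.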
I've managed to get it working on Windows 10, but if I try to generate audio with the onset mask slider higher than 0 I get this error: soundfile.LibsndfileError: Error opening 'C:\Users\xxxx\AppData\Local\Temp\tmptlcxs7uq.wav': System error.
With the slider at 0 everything works and is amazing. I tested all the presets, and I only have to move that slider to 0 if it isn't already.
Any clue? Thanks.