Closed zanussbaum closed 1 year ago
Hi @zanussbaum
This looks like a data-dependent issue. This code path is tested in many places, and I have never seen this failure. Any chance you can share the data?
@mthrok Hm, that's weird. I'm able to load the same file in torchaudio when using 1.13, but not on this latest version.
I've tested it by running the example in ImageBind, and I can get a forward pass to work with 1.13 but not with 2.1, using the following code:
import data
import torch
from models import imagebind_model
from models.imagebind_model import ModalityType
text_list=["A dog.", "A car", "A bird"]
image_paths=[".assets/dog_image.jpg", ".assets/car_image.jpg", ".assets/bird_image.jpg"]
audio_paths=[".assets/dog_audio.wav", ".assets/car_audio.wav", ".assets/bird_audio.wav"]
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Instantiate model
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)
# Load data
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}
with torch.no_grad():
    embeddings = model(inputs)
print(
    "Vision x Text: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1),
)
print(
    "Audio x Text: ",
    torch.softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T, dim=-1),
)
print(
    "Vision x Audio: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1),
)
I should note that to get this working with torchaudio 1.13, I had to switch machines, as the H100s (IIUC) use a different instruction set and require torch >= 2.0.0 to use the GPUs effectively. I was using the same audio file from the repo above, though.
Hm, this now seems to work with:
Versions of relevant libraries:
[pip3] numpy==1.24.1
[pip3] pytorch-triton==2.1.0+440fd1bf20
[pip3] torch==2.1.0.dev20230709+cu121
[pip3] torchaudio==2.1.0.dev20230709+cu121
[pip3] torchvision==0.16.0.dev20230709+cu121
[conda] Could not collect
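For anyone comparing environments in this thread, a minimal sketch of collecting the same package versions from Python using only the standard library. The helper name `installed_versions` is made up for illustration and is not part of torch or torchaudio.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the packages themselves.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling `installed_versions(["numpy", "torch", "torchaudio", "torchvision"])` should give a dictionary that can be pasted next to the listing above. Because the packages are never imported, this also works in an environment where importing torchaudio itself crashes.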
Closing as this seems to be resolved
This needs to be reopened. I'm facing the same issue on multiple systems, even with the versions mentioned above. Here is one of my files: 00000_mixture.flac.tar.gz
Both torchaudio.load and torchaudio.info cause the segmentation fault:
import torchaudio
torchaudio.info('00000_mixture.flac') # Segmentation fault
I have no problem with earlier nightly versions, e.g. this works fine:
torch==2.1.0.dev20230508+cu121
torchaudio==2.1.0.dev20230508+cu121
torchvision==0.16.0.dev20230508+cu121
I did not try and find which nightly version introduced the bug though.
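When checking several nightly builds like this, it can help to run the crashing call in a child interpreter so that a segfault does not take down the shell or test script. Below is a minimal sketch under that assumption. The helper name `probe` is hypothetical, and the signal reporting relies on POSIX behaviour, where a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def probe(snippet: str) -> str:
    """Run `snippet` in a fresh interpreter and describe how it exited."""
    # -X faulthandler asks CPython to dump the Python-level traceback to
    # stderr when the child receives a fatal signal such as SIGSEGV.
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # POSIX only: the child was terminated by signal number -returncode.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with code {result.returncode}"
```

For the file above, the call would look like `probe("import torchaudio; torchaudio.info('00000_mixture.flac')")`, repeated once per installed nightly.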
Mmh, never mind, the latest nightly build works for me. Weird.
🐛 Describe the bug
returns
Segmentation fault (core dumped)
Running
gdb --args python -c "import torchaudio; torchaudio.load('.assets/bird_audio.wav')"
shows the following stack trace
Versions
Here is my ffmpeg version as well, since #3411 suggests that it needs to be < 5
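Since the ffmpeg major version seems to matter here, this is a rough sketch of reading it programmatically. Both function names are hypothetical, and the check only looks at the `ffmpeg` executable on PATH, which is not necessarily the set of FFmpeg libraries that torchaudio loads at runtime.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from the output of `ffmpeg -version`.

    Handles release strings such as "ffmpeg version 4.4.2-..." and
    "ffmpeg version n5.1.2"; returns None for anything else
    (for example git snapshot builds).
    """
    match = re.search(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major() -> Optional[int]:
    """Return the major version of the ffmpeg on PATH, or None if unavailable."""
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return parse_ffmpeg_major(result.stdout)
```

A result of 5 or higher from `installed_ffmpeg_major()` would be worth mentioning alongside the versions above, given the constraint discussed in #3411.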