Your example:

inference.py --mapper_weights models/vggsound/learned_embeds.pth --audio_path /audio/path

I try the following:

python inference.py --mapper_weights models\vggsound\learned_embeds.pth --audio_path croaking.mp3

which gives this error:

What do I need to specify for the --model parameter? Thanks for any tips.
I appreciate the update. The model parameter you should use is the pre-trained text-to-video model that we fine-tuned. The model compatible with the provided mapper_weights is "cerspense/zeroscope_v2_576w". I've set it as the default parameter, so feel free to give it a try now.
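If you'd rather pass it explicitly, the full command should look something like this (reusing the croaking.mp3 file from your command):

python inference.py --mapper_weights models\vggsound\learned_embeds.pth --audio_path croaking.mp3 --model cerspense/zeroscope_v2_576w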
Getting further, but now...
File "D:\Tests\TempoTokens\inference.py", line 519, in <module>
videos = inference(
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Tests\TempoTokens\inference.py", line 400, in inference
prompt_embeds, negative_prompt_embeds = compel(prompt), compel(negative_prompt) if negative_prompt else None
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\compel.py", line 135, in __call__
output = self.build_conditioning_tensor(text_input)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\compel.py", line 112, in build_conditioning_tensor
conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\compel.py", line 186, in build_conditioning_tensor_for_conjunction
this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\compel.py", line 218, in build_conditioning_tensor_for_prompt_object
return self._get_conditioning_for_flattened_prompt(prompt), {}
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\compel.py", line 282, in _get_conditioning_for_flattened_prompt
return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\embeddings_provider.py", line 120, in get_embeddings_for_weighted_prompt_fragments
base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights, mask, device=device)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\embeddings_provider.py", line 371, in build_weighted_embedding_tensor
z = self._encode_token_ids_to_embeddings(chunk_token_ids, chunk_attention_mask)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\compel\embeddings_provider.py", line 390, in _encode_token_ids_to_embeddings
text_encoder_output = self.text_encoder(token_ids,
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Tests\TempoTokens\modules\text_encoder\modeling_clip_temp_token.py", line 855, in forward
return self.text_model(
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Tests\TempoTokens\modules\text_encoder\modeling_clip_temp_token.py", line 760, in forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, audio_token=audio_token, temp_token=temp_token, local_windows=local_windows)
File "D:\Tests\TempoTokens\voc_tempotokens\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Tests\TempoTokens\modules\text_encoder\modeling_clip_temp_token.py", line 232, in forward
indices = torch.where(input_ids == 49408)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
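For completeness, following the hint above on Windows means setting the variable before rerunning, which forces synchronous kernel launches so the failure surfaces at the real call site:

set CUDA_LAUNCH_BLOCKING=1
python inference.py --mapper_weights models\vggsound\learned_embeds.pth --audio_path croaking.mp3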
Using Torch 2.0.1, i.e.:
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118
The token embeddings weren't being resized properly during inference, which led to the CUDA error. I've fixed this issue, so you can now generate videos without encountering that error. Thanks for reporting it!
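For context on what was going wrong: CLIP's tokenizer has 49408 entries (ids 0 through 49407), so the first added special token receives id 49408, exactly the value being checked in the embeddings forward above. Looking that id up in an embedding table that was never resized is an out-of-range index, which CUDA reports only as a device-side assert. The general resize pattern with Hugging Face transformers looks roughly like this; the checkpoint and token string are illustrative placeholders, not the exact TempoTokens code:

# Rough sketch of the embedding-resize pattern, not the actual TempoTokens fix.
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Adding a special token assigns it the next free id, i.e. 49408.
tokenizer.add_tokens(["<audio>"])
print(tokenizer.convert_tokens_to_ids("<audio>"))  # -> 49408

# Without this resize, any lookup of id 49408 indexes past the end of the
# 49408-row embedding table and triggers the device-side assert on CUDA.
text_encoder.resize_token_embeddings(len(tokenizer))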
Yes, all good now. Thanks.