Closed Tobe2d closed 1 year ago
Hello, thanks for opening this issue. Based on a quick search, it seems that the issue has to do with the Python and torch versions. Some things that come to my mind:
python -V
) as per here. This isn't an issue with this project per se, but I'm happy to help with your setup. Good luck!
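A quick way to collect the versions in question is sketched below. The helper name `environment_report` is made up for illustration; it only reports what is installed and does not judge whether the combination is compatible.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect the interpreter and torch details relevant to setup problems."""
    report = {
        "python": platform.python_version(),  # the number that `python -V` prints
        "executable": sys.executable,         # which interpreter is actually running
        "torch": None,
        "cuda_available": None,
    }
    try:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    except Exception as exc:  # torch missing or failing to load
        report["torch"] = f"import failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the environment that launches storyteller shows whether that environment is the one you expect.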
Yes, Python 3.9 made it work. After installing everything fresh, including PyTorch with CUDA, it works. After running storyteller, it created: 9 .mp4 files, 9 .wav files, 9 .png files, and 9 .srt files.
All the .mp4 files are 0 KB, while the .wav and .png files work. How can I make the .mp4 files work?
I could not find anywhere to edit the story. Where can I edit the story itself?
Currently, there is no feature that allows users to edit stories, but that's a good suggestion! I'll create a to-do for it.
Seems like there's an issue with video creation via ffmpeg. Could you perhaps see if the following command works?
ffmpeg -loop 1 -i {image_path} -i {audio_path} -vf subtitles={subtitle_path} -tune stillimage -shortest {video_path}
The image, audio, subtitle, and video paths are the .png, .wav, .srt, and .mp4 files, respectively. Could you see if the video is playable with the appropriate sound, image, and subtitle?
None of the videos are playable, while the image, audio, and subtitle files each work individually.
I ran the command:
(storyteller) E:\AiProject\storyteller>ffmpeg -loop 1 -i E:\AiProject\storyteller\out -i E:\AiProject\storyteller\out -vf subtitles=E:\AiProject\storyteller\out -tune stillimage -shortest E:\Ai__Project\storyteller\out
ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 57. 21.100 / 57. 21.100
libavcodec 59. 21.100 / 59. 21.100
libavformat 59. 17.102 / 59. 17.102
libavdevice 59. 5.100 / 59. 5.100
libavfilter 8. 27.100 / 8. 27.100
libswscale 6. 5.100 / 6. 5.100
libswresample 4. 4.100 / 4. 4.100
libpostproc 56. 4.100 / 56. 4.100
E:\Ai__Project\storyteller\out: Permission denied
Hey @Tobe2d, it seems like the image/subtitle/audio files were misspecified. Can you try something like
ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles=E:\Ai__Project\storyteller\out\1.srt -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4
I've replaced the file paths from ...\out to ...\out\1.EXTENSION, where EXTENSION is one of png, wav, srt, and mp4.
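The path substitution described above can be sketched in Python. The helper name `clip_paths` is hypothetical, and the output folder is the one quoted in this thread; `PureWindowsPath` is used so the result has Windows separators regardless of where the snippet runs.

```python
from pathlib import PureWindowsPath


def clip_paths(out_dir: str, index: int) -> dict:
    """Return the image, audio, subtitle, and video paths for one clip."""
    base = PureWindowsPath(out_dir)
    # One file per extension, all sharing the same numeric stem.
    return {ext: str(base / f"{index}.{ext}") for ext in ("png", "wav", "srt", "mp4")}


paths = clip_paths(r"E:\Ai__Project\storyteller\out", 1)
for ext, path in paths.items():
    print(ext, path)
```

Each of the four values is a full file path, which is what ffmpeg expects in place of the bare ...\out folder.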
Thanks @jaketae
I just tested:
ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles=E:\Ai__Project\storyteller\out\1.srt -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4
And here is the result:
ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 57. 21.100 / 57. 21.100
libavcodec 59. 21.100 / 59. 21.100
libavformat 59. 17.102 / 59. 17.102
libavdevice 59. 5.100 / 59. 5.100
libavfilter 8. 27.100 / 8. 27.100
libswscale 6. 5.100 / 6. 5.100
libswresample 4. 4.100 / 4. 4.100
libpostproc 56. 4.100 / 56. 4.100
Input #0, png_pipe, from 'E:\Ai__Project\storyteller\out\1.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, rgb24(pc), 768x768, 25 fps, 25 tbr, 25 tbn
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'E:\Ai__Project\storyteller\out\1.wav':
Duration: 00:00:07.00, bitrate: 352 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
File 'E:\Ai__Project\storyteller\out\1.mp4' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[subtitles @ 0000024c1cf42580] Unable to parse option value "Ai__Projectstorytellerout1.srt" as image size
Last message repeated 1 times
[subtitles @ 0000024c1cf42580] Error setting option original_size to value Ai__Projectstorytellerout1.srt.
[Parsed_subtitles_0 @ 0000024c1cf45600] Error applying options to the filter.
[AVFilterGraph @ 0000024c1dfabc80] Error initializing filter 'subtitles' with args 'E:Ai__Projectstorytellerout1.srt'
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!
Here is a screenshot of the output folder:
Hey @Tobe2d, thanks for the report.
I don't have a Windows computer, so it's hard for me to test, but it seems that the issue might have to do with escape characters. For instance, see this SE post.
Could you try
ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles='E\:\\Ai__Project\\storyteller\\out\\1.srt' -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4
I've added a backslash in front of every special character (the colon and each backslash in the subtitle path), as suggested in the linked post.
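The same escaping can be done programmatically. This is a minimal sketch, not the project's own code: the function names are hypothetical, and it assumes the path contains no single quotes, which would need extra handling.

```python
def escape_for_subtitles_filter(path: str) -> str:
    """Escape a Windows path for use in ffmpeg's subtitles filter argument."""
    # Double the backslashes first, then escape the drive-letter colon,
    # so the backslash added for the colon is not doubled again.
    escaped = path.replace("\\", "\\\\").replace(":", "\\:")
    return f"subtitles='{escaped}'"


def build_ffmpeg_command(image: str, audio: str, subtitle: str, video: str) -> list:
    """Assemble the argument list for the command used in this thread."""
    return [
        "ffmpeg", "-loop", "1",
        "-i", image,
        "-i", audio,
        "-vf", escape_for_subtitles_filter(subtitle),
        "-tune", "stillimage", "-shortest",
        video,
    ]


if __name__ == "__main__":
    out = r"E:\Ai__Project\storyteller\out"
    print(build_ffmpeg_command(out + r"\1.png", out + r"\1.wav",
                               out + r"\1.srt", out + r"\1.mp4"))
```

Only the subtitle path is escaped, because it is the only one parsed as a filter option; the input and output paths are passed through unchanged. The resulting list can be handed to `subprocess.run` without a shell.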
Thanks @jaketae
I tested this:
ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles='E\:\\Ai__Project\\storyteller\\out\\1.srt' -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4
and the output is:
ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 57. 21.100 / 57. 21.100
libavcodec 59. 21.100 / 59. 21.100
libavformat 59. 17.102 / 59. 17.102
libavdevice 59. 5.100 / 59. 5.100
libavfilter 8. 27.100 / 8. 27.100
libswscale 6. 5.100 / 6. 5.100
libswresample 4. 4.100 / 4. 4.100
libpostproc 56. 4.100 / 56. 4.100
Input #0, png_pipe, from 'E:\Ai__Project\storyteller\out\1.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, rgb24(pc), 768x768, 25 fps, 25 tbr, 25 tbn
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'E:\Ai__Project\storyteller\out\1.wav':
Duration: 00:00:07.00, bitrate: 352 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
File 'E:\Ai__Project\storyteller\out\1.mp4' already exists. Overwrite? [y/N] y
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[Parsed_subtitles_0 @ 0000018f1eb24900] libass API version: 0x1502001
[Parsed_subtitles_0 @ 0000018f1eb24900] libass source: commit: 0.15.2-62-gba6bcb3a9c2f06272ca1ff1a65f52dc5bc4528b0
[Parsed_subtitles_0 @ 0000018f1eb24900] Shaper: FriBidi 1.0.11 (SIMPLE) HarfBuzz-ng 3.4.0 (COMPLEX)
[Parsed_subtitles_0 @ 0000018f1eb24900] Using font provider directwrite (with GDI)
[Parsed_subtitles_0 @ 0000018f1eb24900] fontselect: (Arial, 400, 0) -> ArialMT, 0, ArialMT
[libx264 @ 0000018f1eb12340] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0000018f1eb12340] profile High 4:4:4 Predictive, level 3.1, 4:4:4, 8-bit
[libx264 @ 0000018f1eb12340] 264 - core 164 r3094 bfc87b7 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:-3:-3 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=2.00:0.70 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=2 threads=24 lookahead_threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.20
Output #0, mp4, to 'E:\Ai__Project\storyteller\out\1.mp4':
Metadata:
encoder : Lavf59.17.102
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(tv, progressive), 768x768, q=2-31, 25 fps, 12800 tbn
Metadata:
encoder : Lavc59.21.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, mono, fltp, 69 kb/s
Metadata:
encoder : Lavc59.21.100 aac
frame= 243 fps=0.0 q=-1.0 Lsize= 300kB time=00:00:09.60 bitrate= 256.2kbits/s speed=16.7x
video:236kB audio:58kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.249609%
[libx264 @ 0000018f1eb12340] frame I:1 Avg QP:17.70 size:115143
[libx264 @ 0000018f1eb12340] frame P:61 Avg QP:16.34 size: 1809
[libx264 @ 0000018f1eb12340] frame B:181 Avg QP:24.34 size: 85
[libx264 @ 0000018f1eb12340] consecutive B-frames: 0.4% 0.8% 0.0% 98.8%
[libx264 @ 0000018f1eb12340] mb I I16..4: 4.3% 63.3% 32.4%
[libx264 @ 0000018f1eb12340] mb P I16..4: 0.0% 0.1% 0.0% P16..4: 5.4% 0.2% 0.5% 0.0% 0.0% skip:93.8%
[libx264 @ 0000018f1eb12340] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 2.8% 0.0% 0.0% direct: 0.0% skip:97.1% L0:68.4% L1:31.5% BI: 0.1%
[libx264 @ 0000018f1eb12340] 8x8 transform intra:65.3% inter:87.7%
[libx264 @ 0000018f1eb12340] coded y,u,v intra: 99.7% 78.6% 80.7% inter: 0.8% 0.2% 0.2%
[libx264 @ 0000018f1eb12340] i16 v,h,dc,p: 0% 19% 20% 62%
[libx264 @ 0000018f1eb12340] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 6% 29% 17% 6% 7% 6% 11% 5% 13%
[libx264 @ 0000018f1eb12340] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 11% 7% 10% 11% 11% 6% 11%
[libx264 @ 0000018f1eb12340] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0000018f1eb12340] ref P L0: 51.8% 0.6% 45.2% 2.4%
[libx264 @ 0000018f1eb12340] ref B L0: 62.8% 36.1% 1.1%
[libx264 @ 0000018f1eb12340] ref B L1: 97.6% 2.4%
[libx264 @ 0000018f1eb12340] kb/s:198.28
[aac @ 0000018f1edbc040] Qavg: 6274.611
And here is the video 1.mp4
https://user-images.githubusercontent.com/4099839/211865115-d38775bf-7047-478d-9354-438891818e84.mp4
Thanks for being patient. From experience, GitHub isn't good at displaying videos. Can you play it locally on your computer?
Thank you so much for your patience and for this cool project. Yes, it plays nicely here ;-) This is the only video generated so far. Now, how do I get it to make the full story ;-) and where do I add my own prompts?
Hello @Tobe2d, sincere apologies for the belated reply.
I don't have a Windows machine, which makes it difficult for me to test and push out a fix. However, I will try to get it working ASAP and let you know when a testable fix is ready.
In the meantime, you can repeat the process above for the rest of the image/audio pairs (2.png, 2.wav, 3.png, 3.wav...), then stitch together the output files via ffmpeg or a video editor of your choice. If you are comfortable writing Python code, you can also try running
from storyteller import StoryTeller
story_teller = StoryTeller.from_default()
# might have to adjust the video path and the index range to match the files in out
video_paths = [f"out\\{i}.mp4" for i in range(10)]
story_teller.concat_videos(video_paths)
Let me know if this makes sense. Thanks!
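For the ffmpeg route, one option is the concat demuxer, which reads a text file listing the clips. The sketch below only builds the list text and the argument list. The helper names, `clips.txt`, and `story.mp4` are hypothetical, and the 1 to 9 numbering is an assumption based on the nine files and the 1.png naming mentioned earlier in this thread.

```python
def concat_list_text(video_paths: list) -> str:
    """Build the contents of a list file for ffmpeg's concat demuxer."""
    return "".join(f"file '{path}'\n" for path in video_paths)


def concat_command(list_file: str, output: str) -> list:
    """Assemble the ffmpeg arguments that join the listed clips."""
    # -f concat reads the list file, -safe 0 permits absolute paths,
    # and -c copy joins the clips without re-encoding them.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]


if __name__ == "__main__":
    # Assumed numbering: nine clips named 1.mp4 through 9.mp4.
    clips = [f"{i}.mp4" for i in range(1, 10)]
    print(concat_list_text(clips))
    print(concat_command("clips.txt", "story.mp4"))
```

Relative paths in the list file are resolved against the list file's location, so saving `clips.txt` next to the clips keeps the entries short. Copying streams without re-encoding generally expects the clips to share the same encoding settings, which should hold if they were all produced by the same command.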
I am getting the error below even though I have installed numpy-1.22.4.
storyteller
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\xxx\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
Fetching 16 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 5335.42it/s]
Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\Scripts\storyteller.exe\__main__.py", line 7, in
File "E:\Ai Project\storyteller\storyteller\__main__.py", line 18, in main
story_teller = StoryTeller.from_default()
File "E:\AiProject\storyteller\storyteller\model.py", line 46, in from_default
return cls(config)
File "E:\AiProject\storyteller\storyteller\model.py", line 34, in __init__
self.painter = StableDiffusionPipeline.from_pretrained(
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipeline_utils.py", line 708, in from_pretrained
loaded_sub_model = load_method(os.path.join(cached_folder, name), loading_kwargs)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\schedulers\scheduling_utils.py", line 124, in from_pretrained
return cls.from_config(config, return_unused_kwargs=return_unused_kwargs, **kwargs)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\configuration_utils.py", line 210, in from_config
model = cls(*init_dict)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\configuration_utils.py", line 567, in inner_init
init(self, args, **init_kwargs)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\schedulers\scheduling_ddim.py", line 170, in __init__
self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))
RuntimeError: Numpy is not available
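One way to narrow this down is to check, from the same interpreter that runs storyteller, whether NumPy imports and whether torch can consume a NumPy array. The traceback above runs under a Python310 install, while Python 3.9 was reported as working earlier in this thread, so it may be worth confirming which interpreter is active. The helper name `numpy_torch_check` is hypothetical, and this sketch only diagnoses; it does not fix anything.

```python
import sys


def numpy_torch_check() -> str:
    """Report whether NumPy imports and whether torch accepts its arrays."""
    lines = [f"interpreter: {sys.executable}"]
    try:
        import numpy as np

        lines.append(f"numpy: {np.__version__}")
    except Exception as exc:
        lines.append(f"numpy import failed: {exc}")
        return "\n".join(lines)
    try:
        import torch

        # Same kind of call as the one that raises in the traceback.
        torch.from_numpy(np.arange(3))
        lines.append(f"torch: {torch.__version__} (from_numpy works)")
    except Exception as exc:
        lines.append(f"torch check failed: {exc}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(numpy_torch_check())
```

If the last line reports a failure even though NumPy imports, the installed torch and NumPy builds are likely mismatched in that environment.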