jaketae / storyteller

Multimodal AI Story Teller, built with Stable Diffusion, GPT, and neural text-to-speech
MIT License
482 stars 64 forks

RuntimeError: Numpy is not available #4

Closed Tobe2d closed 1 year ago

Tobe2d commented 1 year ago

I am getting the error below even though I have installed numpy-1.22.4:

storyteller
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\xxx\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
Fetching 16 files: 100%|████████████████████████████████████████| 16/16 [00:00<00:00, 5335.42it/s]
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\Scripts\storyteller.exe__main.py", line 7, in
  File "E:\AiProject\storyteller\storyteller__main.py", line 18, in main
    story_teller = StoryTeller.from_default()
  File "E:\AiProject\storyteller\storyteller\model.py", line 46, in from_default
    return cls(config)
  File "E:\AiProject\storyteller\storyteller\model.py", line 34, in init
    self.painter = StableDiffusionPipeline.from_pretrained(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\pipeline_utils.py", line 708, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), loading_kwargs)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\schedulers\scheduling_utils.py", line 124, in from_pretrained
    return cls.from_config(config, return_unused_kwargs=return_unused_kwargs, kwargs)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\configuration_utils.py", line 210, in from_config
    model = cls(*init_dict)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\configuration_utils.py", line 567, in inner_init
    init(self, args, **init_kwargs)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\schedulers\scheduling_ddim.py", line 170, in init__
    self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))
RuntimeError: Numpy is not available

jaketae commented 1 year ago

Hello, thanks for opening this issue. Based on a quick search, it seems that the issue has to do with the Python and torch versions. Some things that come to my mind:

  1. Make sure you have the most up-to-date version of numpy installed.
  2. Downgrade your Python version to 3.9 (check python -V) as per here.
  3. Try a different torch version as per here.

This isn't an issue with this project per se, but happy to help with your setup. Good luck!

Tobe2d commented 1 year ago

Yes, Python 3.9 made it work after I installed everything fresh, including PyTorch with CUDA. Running storyteller now works and created: 9 .mp4 files, 9 .wav files, 9 .png files, and 9 .srt files.

All the .mp4 files are 0 KB, while the .wav and .png files work. How can I make the .mp4 files work?

I could not find anywhere to edit the story, where can I edit the story itself?

jaketae commented 1 year ago

Currently, there is no feature that allows users to edit stories, but that's a good suggestion! I'll create a to-do for it.

Seems like there's an issue with video creation via ffmpeg. Could you perhaps see if the following command works?

ffmpeg -loop 1 -i {image_path} -i {audio_path} -vf subtitles={subtitle_path} -tune stillimage -shortest {video_path}

The image, audio, subtitle, and video paths are the .png, .wav, .srt, and .mp4 files, respectively. Could you see if the video is playable with the appropriate sound, image, and subtitle?
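For anyone who prefers to script this check, the command above can be assembled as an argument list in Python. This is only a sketch: the function name `build_ffmpeg_command` and the example file names are illustrative assumptions, not part of storyteller, and the flags are copied from the command above without change.

```python
def build_ffmpeg_command(image_path, audio_path, subtitle_path, video_path):
    """Return the ffmpeg argument list for one still image + audio + subtitles."""
    return [
        "ffmpeg",
        "-loop", "1",                          # repeat the single image as video frames
        "-i", str(image_path),                 # input 0: the .png
        "-i", str(audio_path),                 # input 1: the .wav
        "-vf", f"subtitles={subtitle_path}",   # burn the .srt into the frames
        "-tune", "stillimage",
        "-shortest",
        str(video_path),                       # output: the .mp4
    ]


# Hypothetical file names, used only for illustration.
cmd = build_ffmpeg_command("out/1.png", "out/1.wav", "out/1.srt", "out/1.mp4")
print(" ".join(cmd))

# To actually run it (requires ffmpeg on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting, but the value after `subtitles=` is still parsed by ffmpeg's own filter syntax.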

Tobe2d commented 1 year ago

None of the videos are playable, while the photo, audio, and subtitle files each work individually.

I ran the command: (storyteller) E:\AiProject\storyteller>ffmpeg -loop 1 -i E:\AiProject\storyteller\out -i E:\AiProject\storyteller\out -vf subtitles=E:\AiProject\storyteller\out -tune stillimage -shortest E:\Ai__Project\storyteller\out

ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 21.100 / 57. 21.100
  libavcodec     59. 21.100 / 59. 21.100
  libavformat    59. 17.102 / 59. 17.102
  libavdevice    59.  5.100 / 59.  5.100
  libavfilter     8. 27.100 /  8. 27.100
  libswscale      6.  5.100 /  6.  5.100
  libswresample   4.  4.100 /  4.  4.100
  libpostproc    56.  4.100 / 56.  4.100
E:\Ai__Project\storyteller\out: Permission denied

jaketae commented 1 year ago

Hey @Tobe2d, it seems like the image/subtitle/audio files were misspecified. Can you try something like

ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles=E:\Ai__Project\storyteller\out\1.srt -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4

I've replaced the file paths from ...\out to ...\out\1.EXTENSION, where EXTENSION is one of png, wav, srt, and mp4.

Tobe2d commented 1 year ago

Thanks @jaketae

I just tested: ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles=E:\Ai__Project\storyteller\out\1.srt -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4

And here is the result:

ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 21.100 / 57. 21.100
  libavcodec     59. 21.100 / 59. 21.100
  libavformat    59. 17.102 / 59. 17.102
  libavdevice    59.  5.100 / 59.  5.100
  libavfilter     8. 27.100 /  8. 27.100
  libswscale      6.  5.100 /  6.  5.100
  libswresample   4.  4.100 /  4.  4.100
  libpostproc    56.  4.100 / 56.  4.100
Input #0, png_pipe, from 'E:\Ai__Project\storyteller\out\1.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc), 768x768, 25 fps, 25 tbr, 25 tbn
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'E:\Ai__Project\storyteller\out\1.wav':
  Duration: 00:00:07.00, bitrate: 352 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
File 'E:\Ai__Project\storyteller\out\1.mp4' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[subtitles @ 0000024c1cf42580] Unable to parse option value "Ai__Projectstorytellerout1.srt" as image size
    Last message repeated 1 times
[subtitles @ 0000024c1cf42580] Error setting option original_size to value Ai__Projectstorytellerout1.srt.
[Parsed_subtitles_0 @ 0000024c1cf45600] Error applying options to the filter.
[AVFilterGraph @ 0000024c1dfabc80] Error initializing filter 'subtitles' with args 'E:Ai__Projectstorytellerout1.srt'
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Conversion failed!

Tobe2d commented 1 year ago

Here is a screenshot of the output folder: image

jaketae commented 1 year ago

Hey @Tobe2d, thanks for the report.

I don't have a Windows computer, so it's hard for me to test, but it seems that the issue might have to do with escape characters. For instance, see this SE post.

Could you try

ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles='E\:\\Ai__Project\\storyteller\\out\\1.srt' -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4

I've added a backslash in front of every special character (the colon and each backslash in the subtitle path), and wrapped the path in single quotes, as suggested in the linked post.
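The escaping applied to the subtitle path can be sketched in Python as follows. The helper name `escape_filter_path` is a hypothetical illustration and not part of storyteller or ffmpeg. It reproduces only the backslash and colon escaping used in the command above; the surrounding single quotes in that command are a separate layer of filter quoting that this helper does not add.

```python
def escape_filter_path(path: str) -> str:
    """Escape a Windows path for use as the value of ffmpeg's subtitles filter.

    Backslashes are doubled first, so that the backslash added in front of
    the colon in the second step is not doubled again.
    """
    escaped = path.replace("\\", "\\\\")   # \  ->  \\
    return escaped.replace(":", "\\:")     # :  ->  \:


print(escape_filter_path(r"E:\Ai__Project\storyteller\out\1.srt"))
# prints E\:\\Ai__Project\\storyteller\\out\\1.srt
```

The order of the two replacements matters: escaping the colon first would insert a backslash that the second step then doubles.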

Tobe2d commented 1 year ago

Thanks @jaketae

I tested this: ffmpeg -loop 1 -i E:\Ai__Project\storyteller\out\1.png -i E:\Ai__Project\storyteller\out\1.wav -vf subtitles='E\:\\Ai__Project\\storyteller\\out\\1.srt' -tune stillimage -shortest E:\Ai__Project\storyteller\out\1.mp4

and the output is:

ffmpeg version 2022-02-28-git-7a4840a8ca-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 11.2.0 (Rev7, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 21.100 / 57. 21.100
  libavcodec     59. 21.100 / 59. 21.100
  libavformat    59. 17.102 / 59. 17.102
  libavdevice    59.  5.100 / 59.  5.100
  libavfilter     8. 27.100 /  8. 27.100
  libswscale      6.  5.100 /  6.  5.100
  libswresample   4.  4.100 /  4.  4.100
  libpostproc    56.  4.100 / 56.  4.100
Input #0, png_pipe, from 'E:\Ai__Project\storyteller\out\1.png':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: png, rgb24(pc), 768x768, 25 fps, 25 tbr, 25 tbn
Guessed Channel Layout for Input Stream #1.0 : mono
Input #1, wav, from 'E:\Ai__Project\storyteller\out\1.wav':
  Duration: 00:00:07.00, bitrate: 352 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
File 'E:\Ai__Project\storyteller\out\1.mp4' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[Parsed_subtitles_0 @ 0000018f1eb24900] libass API version: 0x1502001
[Parsed_subtitles_0 @ 0000018f1eb24900] libass source: commit: 0.15.2-62-gba6bcb3a9c2f06272ca1ff1a65f52dc5bc4528b0
[Parsed_subtitles_0 @ 0000018f1eb24900] Shaper: FriBidi 1.0.11 (SIMPLE) HarfBuzz-ng 3.4.0 (COMPLEX)
[Parsed_subtitles_0 @ 0000018f1eb24900] Using font provider directwrite (with GDI)
[Parsed_subtitles_0 @ 0000018f1eb24900] fontselect: (Arial, 400, 0) -> ArialMT, 0, ArialMT
[libx264 @ 0000018f1eb12340] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0000018f1eb12340] profile High 4:4:4 Predictive, level 3.1, 4:4:4, 8-bit
[libx264 @ 0000018f1eb12340] 264 - core 164 r3094 bfc87b7 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:-3:-3 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=2.00:0.70 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=2 threads=24 lookahead_threads=4 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.20
Output #0, mp4, to 'E:\Ai__Project\storyteller\out\1.mp4':
  Metadata:
    encoder         : Lavf59.17.102
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv444p(tv, progressive), 768x768, q=2-31, 25 fps, 12800 tbn
    Metadata:
      encoder         : Lavc59.21.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, mono, fltp, 69 kb/s
    Metadata:
      encoder         : Lavc59.21.100 aac
frame=  243 fps=0.0 q=-1.0 Lsize=     300kB time=00:00:09.60 bitrate= 256.2kbits/s speed=16.7x
video:236kB audio:58kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.249609%
[libx264 @ 0000018f1eb12340] frame I:1     Avg QP:17.70  size:115143
[libx264 @ 0000018f1eb12340] frame P:61    Avg QP:16.34  size:  1809
[libx264 @ 0000018f1eb12340] frame B:181   Avg QP:24.34  size:    85
[libx264 @ 0000018f1eb12340] consecutive B-frames:  0.4%  0.8%  0.0% 98.8%
[libx264 @ 0000018f1eb12340] mb I  I16..4:  4.3% 63.3% 32.4%
[libx264 @ 0000018f1eb12340] mb P  I16..4:  0.0%  0.1%  0.0%  P16..4:  5.4%  0.2%  0.5%  0.0%  0.0%    skip:93.8%
[libx264 @ 0000018f1eb12340] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  2.8%  0.0%  0.0%  direct: 0.0%  skip:97.1%  L0:68.4% L1:31.5% BI: 0.1%
[libx264 @ 0000018f1eb12340] 8x8 transform intra:65.3% inter:87.7%
[libx264 @ 0000018f1eb12340] coded y,u,v intra: 99.7% 78.6% 80.7% inter: 0.8% 0.2% 0.2%
[libx264 @ 0000018f1eb12340] i16 v,h,dc,p:  0% 19% 20% 62%
[libx264 @ 0000018f1eb12340] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu:  6% 29% 17%  6%  7%  6% 11%  5% 13%
[libx264 @ 0000018f1eb12340] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 11%  7% 10% 11% 11%  6% 11%
[libx264 @ 0000018f1eb12340] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0000018f1eb12340] ref P L0: 51.8%  0.6% 45.2%  2.4%
[libx264 @ 0000018f1eb12340] ref B L0: 62.8% 36.1%  1.1%
[libx264 @ 0000018f1eb12340] ref B L1: 97.6%  2.4%
[libx264 @ 0000018f1eb12340] kb/s:198.28
[aac @ 0000018f1edbc040] Qavg: 6274.611

And here is the video 1.mp4

https://user-images.githubusercontent.com/4099839/211865115-d38775bf-7047-478d-9354-438891818e84.mp4

jaketae commented 1 year ago

Thanks for being patient. From experience, GitHub isn't good at displaying videos. Can you play it locally on your computer?

Tobe2d commented 1 year ago

Thank you so much for your patience and for this cool project. Yes, it is playing nicely here ;-) This is the only video generated so far. Now, how do I get it to make the full story ;-) and where do I add my own prompts?

jaketae commented 1 year ago

Hello @Tobe2d, sincere apologies for the belated reply.

I don't have a Windows machine, which makes it difficult for me to test and push out a fix. However, I will try to get it working ASAP and let you know when a testable fix is ready.

In the meantime, you can repeat the process above for the rest of the image/audio pairs (2.png, 2.wav, 3.png, 3.wav...), then stitch together the output files via ffmpeg or a video editor of your choice. If you are comfortable writing Python code, you can also try running

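from storyteller import StoryTeller

story_teller = StoryTeller.from_default()
# Might have to adjust the video paths, and the range, to match the files in the out folder.
# The f prefix is needed so that {i} is replaced by the number; \\ is a literal backslash.
video_paths = [f"out\\{i}.mp4" for i in range(10)]
story_teller.concat_videos(video_paths)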

Let me know if this makes sense. Thanks!
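Before repeating the ffmpeg step by hand, it can help to list which numbered sets are complete and which videos are still missing or empty. The following is a minimal sketch under the assumption that the output folder uses the numeric naming seen in this thread (1.png, 1.wav, 1.srt, 1.mp4). The names `complete_stems` and `needs_video` are hypothetical helpers, not storyteller functions, and the demo runs on a throwaway directory with stand-in files.

```python
import tempfile
from pathlib import Path

REQUIRED = (".png", ".wav", ".srt")  # inputs the ffmpeg command needs


def complete_stems(out_dir):
    """Numeric file stems that have an image, an audio, and a subtitle file."""
    out_dir = Path(out_dir)
    stems = [p.stem for p in out_dir.glob("*.png") if p.stem.isdigit()]
    ready = [s for s in stems
             if all((out_dir / (s + ext)).is_file() for ext in REQUIRED)]
    return sorted(ready, key=int)


def needs_video(out_dir):
    """Stems whose .mp4 is missing or empty (0 bytes)."""
    out_dir = Path(out_dir)
    pending = []
    for stem in complete_stems(out_dir):
        video = out_dir / (stem + ".mp4")
        if not video.is_file() or video.stat().st_size == 0:
            pending.append(stem)
    return pending


# Demo on a temporary directory with stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for stem in ("1", "2"):
        for ext in REQUIRED:
            (out / (stem + ext)).write_bytes(b"x")
    (out / "1.mp4").write_bytes(b"video")  # non-empty stand-in for a good video
    (out / "2.mp4").write_bytes(b"")       # 0-byte file, as reported in the thread
    (out / "10.png").write_bytes(b"x")     # incomplete set: no .wav or .srt
    stems = complete_stems(out)
    pending = needs_video(out)

print(stems, pending)
```

Each stem returned by `needs_video` is a candidate for rerunning the ffmpeg command with that number's files.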

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.