msveshnikov / allchat

AI chat client
https://allchat.online
MIT License
144 stars, 15 forks

Support for other modalities...? #7

Closed (theone2277 closed 3 months ago)

theone2277 commented 3 months ago

Can you add video support by using ffmpeg to split videos into frames, and also support for other files such as PDF and Word documents?

theone2277 commented 3 months ago

Just wanted to say, I really appreciate you sharing the code for images!

msveshnikov commented 3 months ago

Sure, this is in my plans. By the way, PDFs and Word documents are already allowed for upload.

msveshnikov commented 3 months ago

OK, an update from my side: there is no need for ffmpeg and frame extraction, as the Files API already accepts uploads with a video MIME type, and it works perfectly in AI Studio. I added mp4 upload to the FE & BE but still get a 400 from the API. Investigating.

msveshnikov commented 3 months ago

Latest update from Google: https://ai.google.dev/tutorials/prompting_with_media/?utm_source=gais&utm_medium=email&utm_campaign=geminipp

Video formats: You can use video data for prompting with the gemini-1.5-pro model. However, video file formats are not supported as direct inputs by the Gemini API. You can use video data as prompt input by breaking down the video into a series of still frame images and a separate audio file. This approach lets you manage the amount of data, and the level of detail provided by the video, by choosing how many frames per second are included in your prompt from the video file.

So video.mp4 will NOT be supported by the API. Splitting with ffmpeg is in progress.
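
For reference, the split described in the quote (still frames at a chosen rate plus a separate audio file) can be sketched roughly as below. This is only an illustration in Python, not the code that went into allchat: the function names, the `frame_%04d.jpg` naming pattern, the `audio.mp3` output name, and the default of 1 frame per second are all assumptions. It relies only on the standard ffmpeg options `-i`, `-vf fps=...`, `-vn`, and `-y`.

```python
import subprocess
from pathlib import Path


def frame_command(video: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples `fps` still frames per second.

    The output file pattern is a hypothetical naming choice.
    """
    pattern = str(Path(out_dir) / "frame_%04d.jpg")
    return ["ffmpeg", "-y", "-i", video, "-vf", f"fps={fps}", pattern]


def audio_command(video: str, out_dir: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and keeps the audio."""
    return ["ffmpeg", "-y", "-i", video, "-vn", str(Path(out_dir) / "audio.mp3")]


def split_video(video: str, out_dir: str, fps: float = 1.0) -> None:
    """Run both commands. Requires the ffmpeg binary on PATH.

    The audio step fails if the input has no audio stream.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(frame_command(video, out_dir, fps), check=True)
    subprocess.run(audio_command(video, out_dir), check=True)
```

Calling `split_video("video.mp4", "out", fps=0.5)` would keep one frame every two seconds. A lower `fps` value means fewer images in the prompt, which is the data-versus-detail trade-off the quoted documentation mentions.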

msveshnikov commented 3 months ago

Implemented, sorry for the delay (I was in Lisbon 🥇)