superseriousbusiness / gotosocial

Fast, fun, small ActivityPub server.
https://docs.gotosocial.org
GNU Affero General Public License v3.0

[feature] Video Support #1089

Closed theSuess closed 1 year ago

theSuess commented 1 year ago

Since I'd really like to post some videos, I'm working on implementing this right now. I've created this issue as grounds for discussion on the implementation and feature set.

IMHO integrating with ffmpeg is required in some way as there are no standalone go libraries to work with various video formats.

Open Topics:

theSuess commented 1 year ago

As a reference: I've built a small proof-of-concept by executing ffmpeg in a subprocess which can be found here: https://github.com/superseriousbusiness/gotosocial/compare/main...theSuess:gotosocial:video-support

NyaaaWhatsUpDoc commented 1 year ago

Regarding implementation details, we're very hesitant to require CGO and hooking into ffmpeg bindings. It just makes building much more complex, and CGO calls come with somewhat of a performance penalty compared to regular Go. Personally I was hoping to look into transpiling the necessary ffmpeg libraries in a similar manner to modernc.org/sqlite, but obviously if you beat me to a solution then that's wonderful :D.

theSuess commented 1 year ago

I think transpiling ffmpeg is a huge endeavor, even if we only do the calls required (extracting a thumbnail + fetching size information), mainly because of the many codecs/options hidden inside a "simple" video file. But it's just a hunch based on my previous experience with audio/visual stuff, so maybe it's easier than I imagine. Would love to be proven wrong.

hikari-no-yume commented 1 year ago

I'd be interested in this feature. Personally, most of the video I upload is clips that I've already cut down to size, so if GoToSocial had a configuration where it doesn't attempt to do any re-encoding itself and just takes the file as it is, that'd work great for me. There are actually social platforms out there that work like this: so far as I can tell, if you post a video file in a Discord message, it doesn't try to re-encode it.

theSuess commented 1 year ago

We'd still need a video parsing library to extract a thumbnail and get the video dimensions

hikari-no-yume commented 1 year ago

Ah, does the ActivityPub server have to provide a thumbnail?

hikari-no-yume commented 1 year ago

For what it's worth, executing the ffmpeg CLI tool seems like a reasonable way to do things to me. It's easy to get hold of on any platform (in my experience) and I'm sure there's hundreds of tools out there that are wrappers around it.

tsmethurst commented 1 year ago

> Ah, does the ActivityPub server have to provide a thumbnail?

Not necessarily -- it just provides a link. But the Mastodon API we implement has stuff in it like thumbnail, length, frame rate, etc.

tsmethurst commented 1 year ago

I would also like to be able to find a solution to this that doesn't require CGO or calling the ffmpeg binary, if possible, to keep packaging + dependencies simpler. I think it would be interesting to try transpiling ffmpeg with the modernc go transpiling libraries. If there's a go-native tool to just extract data like frame rate + length and that sort of thing without decoding the file, all the better. I think we could even ditch thumbnailing and just provide a generic image for every video, if necessary.

hikari-no-yume commented 1 year ago

The annoying thing is codec support I think. A video file is several formats at the same time: a container format (e.g. MP4 or MKV), a video stream (e.g. h.264, VP9, AV1…), and an audio stream (e.g. AAC, Ogg Vorbis, Opus, raw PCM, MP3…). Each of those formats you need to support is its own library, and that's why ffmpeg is such a huge project and probably a huge amount of effort to port/transpile/whatever. For reference, I think a modern web browser supports at least three video codecs, four audio codecs and two or three container formats, and that's quite restrictive compared to the broader landscape of videos people might want to upload.

I bet there's a simple pure Go library out there that can extract the basic metadata from common container formats (MP4/MOV and MKV/WebM might be enough), but unfortunately, while you could get the FPS, frame size and video length that way, getting a thumbnail requires full video codec implementations.

I hope I'm wrong, but I doubt there's a nice pure Go solution to this. And that's not getting into the legal issues with video codec support: ffmpeg and VLC are infamous for possibly not being fully legal in the US.

tsmethurst commented 1 year ago

Ah thanks for the info :)

So I think my order of preference would be:

  1. native go decoding of video - not feasible for reasons discussed above
  2. native go decoding of just the metadata that we need, and using some generic placeholder for video thumbnail
  3. calling ffmpeg or related tool to decode + get thumbnail

edit: we could also consider taking a progressive enhancement approach: look for ffmpeg in $PATH or provided as config variable to do option 3, and if it's not present then fall back to option 2

and again just to reiterate, this isn't because i'm opposed to ffmpeg or anything like that, it's purely to keep dependencies + packaging as simple as possible; GtS being just one binary and some static assets with no dependencies is a real big plus in my view

doenietzomoeilijk commented 1 year ago

Fallback (or user configurable option, "path to ffmpeg/ffprobe binary") with a default to a simpler native solution would be nice, IMO.

It's also something that allows you to roll out the feature in two parts:

That way you have a path towards extra functionality, without having it depend on a fair amount of now-work which means it'd get pushed back on the to-do list or would come at the cost of some other feature.

theSuess commented 1 year ago

I've managed to extract some mp4 metadata using https://github.com/abema/go-mp4 :tada:

I'll try to put together the progressive enhancement flow in a PR