wpoa / open-access-media-importer

A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons
http://commons.wikimedia.org/wiki/User:Open_Access_Media_Importer_Bot

WebM output #110

Open Daniel-Mietchen opened 11 years ago

Daniel-Mietchen commented 11 years ago

Soon after the bot started operating, Wikimedia Commons started to allow WebM uploads in addition to OGG. I am not too familiar with the new format myself but keep receiving hints that it may be preferred over OGG. So I am opening this one to let us discuss the option.

erlehmann commented 11 years ago

Encoding is easily possible, by replacing theoraenc with vp8enc and oggmux with webmmux. So should Ogg Theora + Vorbis stay as an option or can I just upgrade/replace it?
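
The element swap described above can be sketched as a small helper that builds a gst-launch style pipeline description. Only `theoraenc`, `vp8enc`, `oggmux` and `webmmux` come from this comment; the rest of the pipeline layout (GStreamer 1.x elements such as `decodebin` and `videoconvert`), the function name and the file extensions are assumptions for illustration, not the project's actual code in helpers/media.py.

```python
# Hypothetical sketch: only the encoder and muxer names are taken from the
# discussion. The pipeline layout assumes GStreamer 1.x element names.
ELEMENTS = {
    "ogg": {"video": "theoraenc", "mux": "oggmux"},
    "webm": {"video": "vp8enc", "mux": "webmmux"},
}


def pipeline_description(source, target, container="ogg"):
    """Return a gst-launch style pipeline string for the chosen container.

    Both variants encode audio with Vorbis; only the video encoder and
    the muxer differ.
    """
    elements = ELEMENTS[container]
    return (
        f'filesrc location="{source}" ! decodebin name=demux '
        f'{elements["mux"]} name=mux ! filesink location="{target}" '
        f'demux. ! queue ! videoconvert ! {elements["video"]} ! mux. '
        f'demux. ! queue ! audioconvert ! vorbisenc ! mux.'
    )


if __name__ == "__main__":
    # File names are placeholders.
    print(pipeline_description("input.mov", "output.webm", container="webm"))
```

The helper only assembles a string, so it can be inspected without GStreamer installed; whether the resulting pipeline runs depends on the local GStreamer version and plugins.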

Daniel-Mietchen commented 11 years ago

Let's stick to Ogg as the default, for now, adding an option to do WebM. I would appreciate pointers as to the relative merits of the two.
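
Keeping Ogg as the default while offering WebM as an option could look roughly like the following command-line sketch. The `--container` flag and the `build_parser` function are hypothetical names chosen for illustration; the tool's real option handling may differ.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --container option (default: ogg)."""
    parser = argparse.ArgumentParser(
        description="Sketch of an output container option"
    )
    parser.add_argument(
        "--container",
        choices=("ogg", "webm"),
        default="ogg",  # Ogg stays the default, as proposed in this comment
        help="output container: ogg (Theora + Vorbis) or webm (VP8 + Vorbis)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Selected container: {args.container}")
```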

erlehmann commented 11 years ago

Both variants would use the same audio codec, Vorbis. For video, VP8 is simply more advanced than Theora and yields better compression, at the expense of more processing time. http://en.wikipedia.org/wiki/VP3#Theora http://en.wikipedia.org/wiki/VP8

erlehmann commented 11 years ago

Another difference is the container format: Ogg vs. Matroska. While Ogg is probably better suited to streaming, I do not think it matters much one way or the other. http://en.wikipedia.org/wiki/Ogg http://en.wikipedia.org/wiki/Matroska

Daniel-Mietchen commented 11 years ago

Yeah, I've had a look at these, but don't feel I know enough to really have a preference.

erlehmann commented 11 years ago

VP8 delivers higher quality than Theora at the same bitrate, and browsers that can play Ogg Theora+Vorbis can all play WebM nowadays. In my experience, encoding takes longer, but I do not think this should be a concern for us.

Relevant: http://www.heise.de/open/meldung/tagesschau-de-Googles-Video-Format-WebM-loest-Ogg-Theora-ab-1539536.html

Daniel-Mietchen commented 11 years ago

What about licensing? http://www.heise.de/open/meldung/Open-Source-Aktivist-kritisiert-Googles-WebM-Lizenz-1866779.html http://www.heise.de/open/meldung/Juristen-Googles-VP8-Lizenz-kollidiert-nicht-mit-FOSS-Projekten-1873371.html

erlehmann commented 11 years ago

To be safe, we should probably just ask Wikimedia people about legal ramifications of WebM use.

mco13 commented 10 years ago

I guess if someone sued Google over patent infringement by its VP8 codec, VP3 (i.e. Theora) would also be affected. Related: http://blog.webmproject.org/2013/08/good-news-from-germany.html

As for the format: in my experience, WebM is a well-formed container that is hard to break (i.e. it is hard to create malformed or corrupt files) and easy to fix if there is a problem. As the name suggests, it is optimized for web use, streaming, etc. Not to mention the superior quality in combination with smaller file sizes. (Of course this comes with an encoding-time overhead, which is negligible considering the pros.)

maflcko commented 10 years ago

I don't do Python, but this might help to create a pull request for helpers/media.py.