toolforge / video2commons

Transfer video and audio from external sites to Commons.
https://video2commons.toolforge.org/
GNU General Public License v3.0

Support for subtitles import #148

Open JimKillock opened 2 years ago

JimKillock commented 2 years ago

When using this tool, I could not tell whether it supports preserving and importing subtitles, or how this is done.

royjor2016 commented 2 years ago

Using this search link: https://commons.wikimedia.org/w/index.php?title=Special:Search&limit=500&offset=6500&ns102=1&sort=create_timestamp_desc&search=srt

I found https://commons.wikimedia.org/w/index.php?title=TimedText:Magnum,_Our_Own_Agency.webm.fa.srt&action=history , which is probably the last TimedText file imported by video2commons (with the tag), at 02:39, 23 October 2019.
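To check other uploads the same way, the history URL above can be rebuilt from a file name and a language code. This is a minimal sketch: the `TimedText:<file name>.<language>.srt` pattern is inferred only from the single example above, and the function names `timedtext_title` and `timedtext_history_url` are hypothetical, not part of video2commons.

```python
from urllib.parse import quote


def timedtext_title(file_name, lang, fmt="srt"):
    """Build a TimedText page title (pattern assumed from the example above)."""
    return f"TimedText:{file_name}.{lang}.{fmt}"


def timedtext_history_url(file_name, lang, fmt="srt"):
    """Return the Commons history URL for a TimedText page."""
    # MediaWiki URLs use underscores in place of spaces in page titles.
    title = timedtext_title(file_name, lang, fmt).replace(" ", "_")
    return (
        "https://commons.wikimedia.org/w/index.php?title="
        + quote(title, safe=":,_.")
        + "&action=history"
    )


print(timedtext_history_url("Magnum, Our Own Agency.webm", "fa"))
```

If the history page exists and shows a video2commons-tagged edit, the subtitles were imported by the tool; if the page does not exist, they were not.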

https://github.com/toolforge/video2commons/blob/master/video2commons/backend/subtitles/__init__.py is probably the code for handling subtitles.

Note that the last commit was on Oct 11, 2021, probably made by https://wikimediafoundation.org/profile/andrew-bogott/ .

royjor2016 commented 1 year ago

In July 2019 there were commits updating the subtitle import code.

https://commons.wikimedia.org/wiki/Commons_talk:Video2commons/Archive_1#Not_importing_subtitles_from_youtube

https://github.com/toolforge/video2commons/commit/616a7c5c18b52bd040876b56d0954050ff939f2c

prototyperspective commented 1 week ago

This is a critical issue for this tool and I think a major issue for Wikimedia overall. Video2commons has been the tool used for most video uploads for quite a while, and it shows a selection for subtitles, making people believe it uploads the subtitles when it often, maybe usually, doesn't.

This really needs to be fixed. @zhuyifei1999 @royjor2016 have you looked into it? Here are two examples where subtitles have not been imported: 1 2. One consequence is that videos that have subtitles at the source lack them when embedded in a Wikipedia article of that language or on the Commons front page. The second example may be due to issues with VTT subtitles on WMC, see https://phabricator.wikimedia.org/T368298. V2C could convert subtitles of other formats to SRT (for example see this), but I think the problem is not limited to videos with VTT subtitles. It's strange to see so little activity on this issue.
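To illustrate the conversion idea, here is a rough sketch of turning WebVTT text into SRT using only the Python standard library. It is not the code in video2commons; `vtt_to_srt` is a hypothetical helper, and it assumes simple cues: cue settings after the timestamps are dropped, header, NOTE, STYLE and REGION blocks are skipped, and inline VTT markup in the cue text is passed through unchanged.

```python
import re

# A WebVTT timestamp; the hours part is optional in WebVTT.
_TS = r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
_TIMING = re.compile(rf"^\s*{_TS}\s+-->\s+{_TS}")


def _srt_time(hours, minutes, seconds, millis):
    """Format timestamp parts the way SRT expects: HH:MM:SS,mmm."""
    return f"{int(hours or 0):02d}:{minutes}:{seconds},{millis}"


def vtt_to_srt(vtt_text):
    """Convert WebVTT text to SRT text (hypothetical helper, not V2C code)."""
    # Drop a leading byte order mark, if any, and normalise line endings.
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        # The timing line is either first or preceded by a cue identifier.
        for i, line in enumerate(lines[:2]):
            match = _TIMING.match(line)
            if match:
                break
        else:
            continue  # header, NOTE, STYLE or REGION block: nothing to emit
        payload = [ln for ln in lines[i + 1:] if ln.strip()]
        if not payload:
            continue  # cue without text
        parts = match.groups()
        cues.append((_srt_time(*parts[:4]), _srt_time(*parts[4:]), "\n".join(payload)))
    # SRT cues are numbered from 1 and separated by blank lines.
    return "".join(
        f"{n}\n{start} --> {end}\n{body}\n\n"
        for n, (start, end, body) in enumerate(cues, 1)
    )
```

Whether Commons would accept the result for a given video is a separate question; the sketch only covers the format conversion.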