ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Wikimedia Commons video download #30777

Open JimKillock opened 2 years ago

JimKillock commented 2 years ago

Description

Add the ability to import videos and their associated data from Wikimedia Commons video URLs, e.g. from

https://commons.wikimedia.org/wiki/File:Die_Temperaturkurve_der_Erde_(ZDF,_Terra_X)_720p_HD_50FPS.webm

The associated data would include:

  1. Associated subtitle files, e.g. https://commons.wikimedia.org/wiki/TimedText:Die_Temperaturkurve_der_Erde_(ZDF,_Terra_X)_720p_HD_50FPS.webm.de.srt These are in consistent locations, indicated by "hidden categories" on the video page.
  2. Title and description
  3. Licence information

This would greatly aid import of these videos into Peertube and other tools.
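The "consistent locations" mentioned in item 1 can be illustrated with a minimal sketch. It assumes the only pattern is the one visible in the two example URLs above (`File:<name>` becomes `TimedText:<name>.<lang>.srt`); `timedtext_url` is a hypothetical helper name, not part of youtube-dl or any Wikimedia library.

```python
from urllib.parse import urlsplit, urlunsplit


def timedtext_url(file_page_url, lang, fmt="srt"):
    """Map a Commons File: page URL to the matching TimedText: page URL.

    Assumption: subtitles live at TimedText:<filename>.<lang>.<fmt>,
    as in the example URLs quoted in this issue.
    """
    parts = urlsplit(file_page_url)
    prefix = "/wiki/File:"
    if not parts.path.startswith(prefix):
        raise ValueError("not a Commons File: page URL: %r" % file_page_url)
    # Everything after "File:" is the file name, including its extension.
    filename = parts.path[len(prefix):]
    new_path = "/wiki/TimedText:%s.%s.%s" % (filename, lang, fmt)
    # Drop any query string or fragment from the original page URL.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))
```

Calling it with the video page URL above and `lang="de"` yields the German subtitle URL quoted in item 1. Whether a given language actually has a TimedText page still has to be checked separately.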

dirkf commented 2 years ago

Possibly OP needs a scripting solution?

A shell script using grep or awk could help. If someone wants to propose a yt-dl extractor to help, I'm not completely opposed, but it wouldn't be a priority.
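As a rough idea of what such a scripting solution might look like, here is a sketch in Python rather than shell. It assumes the MediaWiki `action=query` API with `prop=imageinfo` and `iiprop=url|extmetadata`, and the default JSON layout in which `pages` is keyed by page ID. The function names and the User-Agent string are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def build_query_url(file_title):
    """Build an imageinfo query for one page, e.g. 'File:Example.webm'."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "titles": file_title,
    }
    return API + "?" + urlencode(params)


def summarise(response):
    """Pick title, media URL, description and licence out of a response."""
    # Assumption: exactly one page was requested, so take the first entry.
    page = next(iter(response["query"]["pages"].values()))
    info = page["imageinfo"][0]
    meta = info.get("extmetadata", {})

    def value(key):
        # Each extmetadata entry is a dict holding a "value" field.
        return meta.get(key, {}).get("value")

    return {
        "title": page.get("title"),
        "url": info.get("url"),
        "description": value("ImageDescription"),  # may contain HTML
        "license": value("LicenseShortName"),
    }


def fetch(file_title):
    """Query the live API; the User-Agent value is a placeholder."""
    req = Request(build_query_url(file_title),
                  headers={"User-Agent": "commons-metadata-sketch/0.1"})
    with urlopen(req) as resp:
        return summarise(json.load(resp))
```

`fetch("File:...")` would return a small dict that a wrapper script could pass on to whatever tool does the upload. Subtitles are not covered here.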

JimKillock commented 2 years ago

Hi there. My specific use case is this:

  1. I am an ordinary Peertube user, wanting to import a large number of Wikimedia videos

  2. Peertube AIUI uses youtube-dl to automate video imports, including associated srt files and other data

  3. As a result, a YT import on Peertube currently brings in subtitles and descriptions as well as the video, making migration of files very easy

  4. However, Wikimedia » Peertube is very hard, even though in principle the files are more open and more standardised

If I am mistaken about subtitle import depending on youtube-dl, my apologies, but I hope this explains more clearly why I am raising this here, and why a scripting solution would be less elegant.

JimKillock commented 2 years ago

Well, that is a little surprising. The CC licence, for a start, is explicitly there to allow reuse, so long as the licence and attribution are preserved. If it is possible for youtube-dl to include the relevant licence information and the appropriate URL for attribution, that would be even better, but I assumed this would be unlikely (I doubt you are trying to pick that up elsewhere).

In my case I am a contributor to Wiki Commons and have spent a lot of time manually extracting CC-by YT files to import to Wiki Commons; this isn't a simple process at all.

I am now considering republishing that same CC-By content on a Peertube instance. To me this is a valid and useful activity, as it makes the content available outside of YouTube. Of course, I will be encouraging contributors to do this themselves.

In some cases, for instance because there are better subtitle files, this would be better done WM » Peertube than YT » Peertube.

JimKillock commented 2 years ago

I can do, sure. For example, I did a test here. I have a larger test running, but don’t want to advertise the URL until I’ve discussed releasing the content with the original authors.

Further context, my day job is working for the Open Rights Group in the UK. It would threaten not just my personal reputation but my organisation’s to be violating copyright in the way you suggest may take place.

Your point about respecting the licence is valid and important. If youtube-dl could include the licence information and the source in the metadata, so that they carry over to the Peertube republication and would have to be actively removed by a user wanting to falsely claim credit etc., that would be even better.

Note that this feature doesn’t currently exist with YT imports to Peertube (a much more common occurrence); any cc-by licence information has to be entered by hand. Perhaps I should open a separate ticket to see if this information can be included.

JimKillock commented 2 years ago

Thanks, that is a shame. TBH I think not adding this feature would leave users in a position where they are more likely to violate copyright than if it can be made easy to correctly attribute and licence by default (it’s trivial to add WM videos to Peertube; however licence information is not currently set).

(FWIW I’ve made the same suggestions to Peertube.)

JimKillock commented 2 years ago

See below:

Hello,

We use yt-dlp or youtube-dl to grab this information, so the issue on their side should be enough: https://github.com/ytdl-org/youtube-dl/issues/30777

Regarding subtitles, for now PeerTube only supports vtt, but you can create another issue so we support srt imports. But the youtube-dl extractor needs to be fixed first:

    "subtitles": {
      "de": [
        {
          "url": "https://commons.wikimedia.org/w/api.php?action=timedtext&title=File%3ADie_Temperaturkurve_der_Erde_%28ZDF%2C_Terra_X%29_720p_HD_50FPS.webm&lang=de&trackformat=srt",
          "ext": "php"
        }
      ],
      "en-GB": [
        {
          "url": "https://commons.wikimedia.org/w/api.php?action=timedtext&title=File%3ADie_Temperaturkurve_der_Erde_%28ZDF%2C_Terra_X%29_720p_HD_50FPS.webm&lang=en-gb&trackformat=srt",
          "ext": "php"
        }
      ],
      "nl": [
        {
          "url": "https://commons.wikimedia.org/w/api.php?action=timedtext&title=File%3ADie_Temperaturkurve_der_Erde_%28ZDF%2C_Terra_X%29_720p_HD_50FPS.webm&lang=nl&trackformat=srt",
          "ext": "php"
        }
      ]
    }

ext should be srt

Originally posted by @Chocobozzz in https://github.com/Chocobozzz/PeerTube/issues/4879#issuecomment-1079851716
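The `"ext": "php"` values in the quoted output are consistent with the extension being guessed from the URL path (`api.php`) instead of the `trackformat` query parameter. The sketch below shows one possible way to derive the extension, preferring `trackformat` when present. It is an illustration only: `subtitle_ext` is a hypothetical helper, not a youtube-dl function, and this is not a claim about how the extractor is actually implemented.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def subtitle_ext(url):
    """Guess a subtitle file extension from its URL.

    Prefer an explicit trackformat=... query parameter (as used by the
    timedtext API URLs quoted above); otherwise fall back to the suffix
    of the URL path.
    """
    parts = urlsplit(url)
    track_format = parse_qs(parts.query).get("trackformat")
    if track_format:
        return track_format[0]
    # Fallback: "/subs/track.vtt" -> "vtt"; no suffix at all -> None.
    return posixpath.splitext(parts.path)[1].lstrip(".") or None
```

Applied to the three URLs in the quoted JSON, this returns `srt` for each one. Without the query parameter, the same `api.php` URL falls back to `php`.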

JimKillock commented 2 years ago

It is very much a personal thing. Nevertheless, if it's not hard to fix (it doesn't look like it is), I may be willing to pay for a fix.

JimKillock commented 2 years ago

Ok, have emailed from an openlatin.org email.

JimKillock commented 2 years ago

A note to say that @89z isn’t able to work on building a yt-dl extractor for this, but if anyone else is (including if paid), I would be very interested to hear from you. I don’t know this software from the inside, but given that it already more or less works for this purpose, I would expect this to be a fairly simple job. Please correct me if you know otherwise, or can estimate how much work this would be.

JimKillock commented 2 years ago

@dirkf Is there a guide to writing an extractor that I can point someone to, in order to have a go at this?

dirkf commented 2 years ago

#29310

#29724