yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader
The Unlicense

Broken Site: mytaratata.com #2690

Open essective opened 2 years ago

essective commented 2 years ago

Region

Romania

Description

Please update yt-dlp to download clips from mytaratata.com.

Verbose log

yt-dlp -v https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008
[debug] Command-line config: ['-v', 'https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008']
[debug] User config "/home/c/.config/yt-dlp/config": ['--format', '22/17/18/best', '--output', '/home/myname/%(upload_date)s_%(uploader)s_%(title)s.%(ext)s', '--restrict-filenames']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, err utf-8, pref UTF-8
[debug] yt-dlp version 2022.02.04 [c1653e9ef] (zip)
[debug] Python version 3.10.2 (CPython 64bit) - Linux-5.15.14-1-lts-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4.1 (setts), ffprobe 4.4.1, rtmpdump 2.4
[debug] Optional libraries: sqlite
[debug] Proxy map: {}
[debug] [generic] Extracting URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008
[generic] kevin-michael-yael-naim-lean-on-me-2008: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] kevin-michael-yael-naim-lean-on-me-2008: Downloading webpage
[generic] kevin-michael-yael-naim-lean-on-me-2008: Extracting information
[debug] Looking for video embeds
ERROR: Unsupported URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008
Traceback (most recent call last):
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1381, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/yt-dlp/yt_dlp/YoutubeDL.py", line 1451, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/common.py", line 612, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/yt-dlp/yt_dlp/extractor/generic.py", line 3986, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008

HobbyistDev commented 2 years ago

It still doesn't work in version 2022.06.22.1:

(python_proj2) ~\python_proj2\yt-dlp>python -m yt_dlp -v -s "https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008" --no-check-certificate
[debug] Command-line config: ['-v', '-s', 'https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008', '--no-check-certificate']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.06.22.1 [a86e01e74] (source)
[debug] Lazy loading extractors is disabled
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Git HEAD: e87b94823
[debug] Python version 3.10.4 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 5.0.1-full_build-www.gyan.dev (setts), ffprobe 5.0.1-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.14.1, brotli-1.0.9, certifi-2022.05.18.1, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] [generic] Extracting URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008
[generic] kevin-michael-yael-naim-lean-on-me-2008: Requesting header
WARNING: [generic] Falling back on generic information extractor.
[generic] kevin-michael-yael-naim-lean-on-me-2008: Downloading webpage
[generic] kevin-michael-yael-naim-lean-on-me-2008: Extracting information
[debug] Looking for video embeds
ERROR: Unsupported URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008
Traceback (most recent call last):
  File "~\python_proj2\yt-dlp\yt_dlp\YoutubeDL.py", line 1419, in wrapper
    return func(self, *args, **kwargs)
  File "~\python_proj2\yt-dlp\yt_dlp\YoutubeDL.py", line 1489, in __extract_info
    ie_result = ie.extract(url)
  File "~\python_proj2\yt-dlp\yt_dlp\extractor\common.py", line 639, in extract
    ie_result = self._real_extract(url)
  File "~\python_proj2\yt-dlp\yt_dlp\extractor\generic.py", line 4132, in _real_extract
    raise UnsupportedError(url)
yt_dlp.utils.UnsupportedError: Unsupported URL: https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008

However, the video is in a video tag with a src attribute, so we can get the media URL from there. I think this should be handled by GenericIE (the video still needs a Referer header, though).

Here's the HTML part that I mean:

<video class="jw-video jw-reset" tabindex="-1" disableremoteplayback="" webkit-playsinline="" playsinline="" preload="metadata" src="https://taratata.net/253/239-720x576.mp4"></video>
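
To illustrate the Referer remark, here is a minimal sketch of how a request for the media URL could carry that header, using only the Python standard library. Using the page URL as the Referer value is an assumption; the thread does not say which value the server expects. The helper name `build_media_request` is hypothetical and is not part of yt-dlp.

```python
import urllib.request

# URLs taken from this issue.
PAGE_URL = "https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008"
MEDIA_URL = "https://taratata.net/253/239-720x576.mp4"


def build_media_request(media_url, page_url):
    """Return a request for media_url that names page_url as the referring page.

    Assumption: the server accepts the page URL as the Referer value.
    """
    return urllib.request.Request(media_url, headers={"Referer": page_url})


request = build_media_request(MEDIA_URL, PAGE_URL)
# The request is only constructed here; urllib.request.urlopen(request) would send it.
print(request.get_header("Referer"))
```

The request is built but never sent, so the sketch shows only where the header goes, not whether the server accepts it.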

pukkandan commented 2 years ago

I don't see any video tag in the webpage!

HobbyistDev commented 2 years ago

After checking with curl, I found an alternative URL that should return the same video. Here's the tag:

<div class="jwplayer" id="jwplayer-14944" data-heading="Vous aimerez aussi" data-image="https://mytaratata.com/media/cache/player/content/image/00/00/00/24/2480.jpeg" ... data-related="https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008/related.json" data-source="https://taratata.net/253/239-720x576.mp4">

As we can see, there is a data-source attribute that holds a direct link to the media.
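
A rough sketch of how that attribute could be pulled out of the raw page without writing a regex, using the standard library's `html.parser`. The names `DataSourceFinder` and `find_jwplayer_sources` are hypothetical, and the sample markup is abbreviated from the tag above; this is not how GenericIE does it.

```python
from html.parser import HTMLParser


class DataSourceFinder(HTMLParser):
    """Collect data-source values from elements whose class list includes "jwplayer"."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Attribute values can be None for bare attributes, hence the "or".
        classes = (attr_map.get("class") or "").split()
        source = attr_map.get("data-source")
        if "jwplayer" in classes and source:
            self.sources.append(source)


def find_jwplayer_sources(html):
    """Return every data-source URL found on a jwplayer element in html."""
    finder = DataSourceFinder()
    finder.feed(html)
    return finder.sources


# Abbreviated from the tag quoted in this issue.
sample = (
    '<div class="jwplayer" id="jwplayer-14944" '
    'data-heading="Vous aimerez aussi" '
    'data-source="https://taratata.net/253/239-720x576.mp4"></div>'
)
print(find_jwplayer_sources(sample))
```

A parser is slower than a regex but does not depend on attribute order or quoting style, which may matter if the site changes its markup.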

pukkandan commented 2 years ago

> It's odd. I think the tag appears because of 'some magic' in JS; it appeared in my browser.

Obv, you have to check in view-source:https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008, not in the rendered DOM... If yt-dlp had access to the rendered version, most extractors would be unnecessary!
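
The difference can be shown with a small sketch that counts tags in two HTML strings: one abbreviated from the raw source quoted in this thread, the other from the rendered DOM. The helper `count_tags` is hypothetical, and the strings are shortened samples, not the full page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.count += 1


def count_tags(html, tag_name):
    """Return how many tag_name start tags appear in html."""
    counter = TagCounter(tag_name)
    counter.feed(html)
    return counter.count


# What the server sends (view-source), abbreviated from this thread.
raw_source = (
    '<div class="jwplayer" id="jwplayer-14944" '
    'data-source="https://taratata.net/253/239-720x576.mp4"></div>'
)
# What the browser shows after JavaScript has run, abbreviated from this thread.
rendered_dom = (
    '<video class="jw-video jw-reset" '
    'src="https://taratata.net/253/239-720x576.mp4"></video>'
)

print(count_tags(raw_source, "video"))    # 0
print(count_tags(rendered_dom, "video"))  # 1
```

Only the first string resembles what yt-dlp downloads, so an extractor has to work from the data-source attribute rather than the video tag.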

HobbyistDev commented 2 years ago

sorry, my bad

pukkandan commented 2 years ago

> <div class="jwplayer" id="jwplayer-14944" data-heading="Vous aimerez aussi" data-image="https://mytaratata.com/media/cache/player/content/image/00/00/00/24/2480.jpeg" ... data-related="https://mytaratata.com/taratata/253/kevin-michael-yael-naim-lean-on-me-2008/related.json" data-source="https://taratata.net/253/239-720x576.mp4">
>
> As we can see, there is a data-source attribute that holds a direct link to the media.

Makes sense to add support for this in GenericIE alongside the current js player detection code. Want to make a PR?

HobbyistDev commented 2 years ago

I don't think I can make a PR for this. I'm still working on another PR, and I don't know much about regex.