yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Can't download tiktok video #3551

Closed Neurotoxin001 closed 2 years ago

Neurotoxin001 commented 2 years ago

Region

Russia

Description

Video link is available, but yt-dlp can't download it: https://www.tiktok.com/@denidil6/video/7065799023130643713

Verbose log

D:\PROGRAMS\TikTok Download>!dl https://www.tiktok.com/@denidil6/video/7065799023130643713 -vU
[debug] Command-line config: ['https://www.tiktok.com/@denidil6/video/7065799023130643713', '-vU']
[debug] Encodings: locale cp1251, fs utf-8, out utf-8, err utf-8, pref cp1251
[debug] yt-dlp version 2022.04.08 [7884ade] (win_exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg n5.0.1-3-gb655beb025-20220419 (setts), ffprobe n5.0.1-3-gb655beb025-20220419
[debug] Optional libraries: brotli, certifi, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [TikTok] Extracting URL: https://www.tiktok.com/@denidil6/video/7065799023130643713
[TikTok] 7065799023130643713: Downloading video details
WARNING: [TikTok] 7065799023130643713: Video not available; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U; Retrying with feed workaround
[TikTok] 7065799023130643713: Downloading video feed
WARNING: [TikTok] 7065799023130643713: Unable to find video in feed; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U; Retrying with webpage
[TikTok] 7065799023130643713: Downloading webpage
ERROR: [TikTok] 7065799023130643713: Unable to download webpage: The read operation timed out (caused by timeout('The read operation timed out')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp\extractor\common.py", line 641, in extract
  File "yt_dlp\extractor\tiktok.py", line 520, in _real_extract
  File "yt_dlp\extractor\common.py", line 932, in _download_webpage
  File "yt_dlp\extractor\common.py", line 800, in _download_webpage_handle
  File "yt_dlp\extractor\common.py", line 785, in _request_webpage

  File "yt_dlp\extractor\common.py", line 767, in _request_webpage
  File "yt_dlp\YoutubeDL.py", line 3601, in urlopen
  File "urllib\request.py", line 525, in open
  File "urllib\request.py", line 542, in _open
  File "urllib\request.py", line 502, in _call_chain
  File "yt_dlp\utils.py", line 1543, in https_open
  File "urllib\request.py", line 1358, in do_open
  File "http\client.py", line 1344, in getresponse
  File "http\client.py", line 307, in begin
  File "http\client.py", line 268, in _read_status
  File "socket.py", line 669, in readinto
  File "ssl.py", line 1241, in recv_into
  File "ssl.py", line 1099, in read
socket.timeout: The read operation timed out

dirkf commented 2 years ago

Works for me with yt-dl PR https://github.com/ytdl-org/youtube-dl/pull/30479.

Try --add-header "User-Agent:Mozilla/5.0" (yt-dl: --user-agent "Mozilla/5.0").
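For readers who want to see what the suggested header amounts to at the HTTP level, here is a minimal standard-library sketch. It is not yt-dlp's internal code; `build_request` is a hypothetical helper, and no network request is made.

```python
from urllib.request import Request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries an explicit User-Agent (hypothetical helper)."""
    req = Request(url)
    # urllib stores header names via str.capitalize(), i.e. as 'User-agent'
    req.add_header("User-Agent", user_agent)
    return req


req = build_request("https://www.tiktok.com/@denidil6/video/7065799023130643713")
print(req.get_header("User-agent"))
```

Passing the resulting object to `urllib.request.urlopen` would send the header; whether the server then accepts the request is a separate question.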

Neurotoxin001 commented 2 years ago

Still can't download with --add-header "User-Agent:Mozilla/5.0"

D:\PROGRAMS\TikTok Download>!dl https://www.tiktok.com/@denidil6/video/7065799023130643713 --add-header "User-Agent:Mozilla/5.0" -vU
[debug] Command-line config: ['https://www.tiktok.com/@denidil6/video/7065799023130643713', '--add-header', 'User-Agent:Mozilla/5.0', '-vU']
[debug] Encodings: locale cp1251, fs utf-8, out utf-8, err utf-8, pref cp1251
[debug] yt-dlp version 2022.04.08 [7884ade] (win_exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg n5.0.1-3-gb655beb025-20220419 (setts), ffprobe n5.0.1-3-gb655beb025-20220419
[debug] Optional libraries: brotli, certifi, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [TikTok] Extracting URL: https://www.tiktok.com/@denidil6/video/7065799023130643713
[TikTok] 7065799023130643713: Downloading video details
WARNING: [TikTok] 7065799023130643713: Video not available; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U; Retrying with feed workaround
[TikTok] 7065799023130643713: Downloading video feed
WARNING: [TikTok] 7065799023130643713: Unable to find video in feed; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U; Retrying with webpage
[TikTok] 7065799023130643713: Downloading webpage
[TikTok] 7065799023130643713: Downloading video webpage
ERROR: [TikTok] 7065799023130643713: Unable to extract sigi data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp\extractor\common.py", line 641, in extract
  File "yt_dlp\extractor\tiktok.py", line 528, in _real_extract
  File "yt_dlp\extractor\common.py", line 1229, in _search_regex

dirkf commented 2 years ago

The yt-dlp extractor probably hasn't been updated for the new structure where the hydration JSON is just sent in a <script> element with a certain id instead of being assigned to a var.

The details are in the PR linked above.
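As a rough illustration of the structure described here, the sketch below pulls JSON out of a `<script>` element selected by its id, using only the standard library. The class and function names and the sample page are invented for this example; the element id `SIGI_STATE` is assumed, and this is not the code from the PR.

```python
import json
from html.parser import HTMLParser


class ScriptById(HTMLParser):
    """Collect the text of the first <script> element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.capturing = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not self.done and dict(attrs).get("id") == self.target_id:
            self.capturing = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.capturing:
            self.capturing = False
            self.done = True


def hydration_json(html, element_id="SIGI_STATE"):
    """Return the parsed JSON object from the script element, or {} if absent or invalid."""
    parser = ScriptById(element_id)
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks).strip()
    try:
        state = json.loads(text) if text else {}
    except json.JSONDecodeError:
        return {}
    return state if isinstance(state, dict) else {}


# Invented sample page, not real TikTok markup
sample = ('<html><body><script id="SIGI_STATE" type="application/json">'
          '{"video": {"id": "123"}}</script></body></html>')
print(hydration_json(sample))
```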

afterdelight commented 2 years ago

merge pls

pukkandan commented 2 years ago

@afterdelight Feel free to make a PR

afterdelight commented 2 years ago

sorry i was referencing this PR https://github.com/ytdl-org/youtube-dl/pull/30479

pukkandan commented 2 years ago

You can see the author of the PR is the maintainer of youtube-dl. So it will be merged in ytdl when he believes the code is good enough to be merged, and I will pull it to yt-dlp after that. Or, if you want to bypass ytdl altogether, you should make a PR directly to yt-dlp and address any reviews.

> merge pls

Messages like this and https://github.com/ytdl-org/youtube-dl/pull/30479#issuecomment-1115007307 are only counterproductive. If you have nothing constructive to add to an issue, the best thing you can do is wait patiently.

afterdelight commented 2 years ago

ok, but i cant wait to download my sister's videos on tik tok

dirkf commented 2 years ago

In this case the yt-dl PR had been lingering because of unwanted 403s and timeouts when pulling the video metadata from the page, now fixed by forcing all unspecified UAs to Mozilla/5.0. The yt-dlp extractor uses API URLs first and then falls back to extraction from the page.

In the yt-dlp version, the problem in this issue should be fixed by replacing the code that extracts the 'sigi' hydration JSON with a call to this method:

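    # Needs: from ..utils import get_element_by_id
    def _get_SIGI_STATE(self, video_id, html):
        # Prefer the JSON body of <script id="SIGI_STATE">; otherwise fall back to
        # the 'sigi-persisted-data' format, where the JSON is assigned to a
        # variable inside the script element.
        state = self._parse_json(
            get_element_by_id('SIGI_STATE', html)
            or self._search_regex(
                r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)sigi-persisted-data(?P=q)[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''',
                html, 'sigi data', default='{}', group='json'), video_id)
        # Guard against non-object JSON so callers can always treat the result as a dict
        return state if isinstance(state, dict) else {}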

One possible issue is that the yt-dlp get_element_by_id() uses HTML parsing whereas yt-dl uses regex, so different failure modes are possible. If the page breaks the parser, possible work-arounds include sanitising the page before parsing, or using a regex instead.

When the sigi-persisted-data target above was added, tests showed that TikTok was sending both the previous page format and the 'sigi' format, perhaps depending on CDN or A/B testing. Probably the same is true of the SIGI_STATE target.
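Since more than one page format may be served, a regex-only variant can try each target in turn. The sketch below is an assumption-laden illustration, not the yt-dl or yt-dlp code: the pattern names, `extract_state`, and both sample pages are invented, and the second pattern is modelled on the regex in the method above.

```python
import json
import re

# JSON is the whole body of <script id="SIGI_STATE" ...>
BY_ID = re.compile(
    r'''(?s)<script\s[^>]*?\bid\s*=\s*(["']?)SIGI_STATE\1[^>]*>\s*(?P<json>{.+?})\s*</script''')
# JSON is assigned to a variable inside <script id="sigi-persisted-data">
ASSIGNED = re.compile(
    r'''(?s)<script\s[^>]*?\bid\s*=\s*(["']?)sigi-persisted-data\1[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''')


def extract_state(html):
    """Try each known target; return the first JSON object found, else {}."""
    for pattern in (BY_ID, ASSIGNED):
        match = pattern.search(html)
        if not match:
            continue
        try:
            state = json.loads(match.group("json"))
        except json.JSONDecodeError:
            continue  # a truncated or odd match: try the next target
        if isinstance(state, dict):
            return state
    return {}


# Invented sample pages, one per format
page_a = '<script id="SIGI_STATE" type="application/json">{"a": {"b": 1}}</script>'
page_b = ('<script id="sigi-persisted-data">'
          "window['SIGI_STATE']={\"a\": {\"b\": 2}};window['SIGI_RETRY']={\"c\": 3}</script>")
print(extract_state(page_a), extract_state(page_b))
```

A regex avoids parser failures on malformed pages, at the cost of being more brittle if the markup around the JSON changes.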

afterdelight commented 2 years ago

i thought the problem on youtube-dl side was solved already. this fix looks good. maybe pukkandan can take a look at this fix

sulyi commented 2 years ago

The error occurs before any HTML is received (after the SSL handshake). Setting the UA to Mozilla/5.0 is not enough, and many other UAs are still rejected. It's not clear whether this is intentional behaviour of the server or a limitation of the Python ssl module.

Setting a random cookie (compliant with RFC 6265, Section 4.1.1) made it possible to reliably get the HTML data for further processing.
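A minimal sketch of what "a random cookie" could look like, assuming hex strings for both name and value (hex characters fall inside the `token` and `cookie-octet` sets of RFC 6265, Section 4.1.1). The helper names are hypothetical, no request is sent, and nothing here is claimed to be what the server actually checks.

```python
import re
import secrets
from urllib.request import Request

# Character sets from RFC 6265 Section 4.1.1 (token is defined via RFC 2616)
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
COOKIE_OCTETS = re.compile(r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*")


def random_cookie_pair(nbytes=8):
    """Return a random (name, value) pair made of hex characters only."""
    name = "c" + secrets.token_hex(nbytes)  # leading letter is cosmetic
    value = secrets.token_hex(nbytes)
    return name, value


def request_with_cookie(url, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request with a UA and a random cookie."""
    name, value = random_cookie_pair()
    req = Request(url, headers={"User-Agent": user_agent})
    req.add_header("Cookie", f"{name}={value}")
    return req


req = request_with_cookie("https://www.tiktok.com/@denidil6/video/7065799023130643713")
print(req.get_header("Cookie"))
```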

Neurotoxin001 commented 2 years ago

Another example, but with a new error: https://www.tiktok.com/@artemka_ashotik_kristi/video/7037054266569149697

D:\PROGRAMS\TikTok Download>!dl https://www.tiktok.com/@artemka_ashotik_kristi/video/7037054266569149697 -vU
[debug] Command-line config: ['https://www.tiktok.com/@artemka_ashotik_kristi/video/7037054266569149697', '-vU']
[debug] Encodings: locale cp1251, fs utf-8, out utf-8, err utf-8, pref cp1251
[debug] yt-dlp version 2022.04.08 [7884ade] (win_exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg n5.0.1-3-gb655beb025-20220419 (setts), ffprobe n5.0.1-3-gb655beb025-20220419
[debug] Optional libraries: brotli, certifi, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [TikTok] Extracting URL: https://www.tiktok.com/@artemka_ashotik_kristi/video/7037054266569149697
[TikTok] 7037054266569149697: Downloading video details
[debug] Sort order given by extractor: quality, codec, size, br
[debug] Formats sorted by: hasvid, ie_pref, quality, vcodec, acodec, filesize, fs_approx, tbr, vbr, abr, lang, res, fps, hdr:12(7), asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 7037054266569149697: Downloading 1 format(s): bytevc1_720p_1505067-2
[debug] Invoking downloader on "https://v77.tiktokcdn.com/b78e79b9f378144cf3dd5176f5fada38/6276c32f/video/tos/alisg/tos-alisg-pve-0037c001/0b07f86c4e3f4f858e0415722d763656/?a=1180&br=2938&bt=1469&cd=0%7C0%7C0%7C3&ch=0&cr=3&cs=2&cv=1&dr=0&ds=3&er=&ft=ARfLEB8Uq1bmo0PzE3DfkVQ1PR_u_KJ&l=202205071305570102451442000A351C5F&lr=all&mime_type=video_mp4&net=0&pl=0&qs=14&rc=amt0ODY6ZnBkOTMzODczNEApaWY7aDY6OWU2NzUzNGU4OmdnZ29ecjQwM2hgLS1kMS1zc19fMjBiM2BiMGIxNTFjYi06Yw%3D%3D&vl=&vr="
[download] Unable to open file due to file access error. Retrying (attempt 1 of 3) ...
[download] Unable to open file due to file access error. Retrying (attempt 2 of 3) ...
[download] Unable to open file due to file access error. Retrying (attempt 3 of 3) ...
ERROR: unable to open for writing: [Errno 22] Invalid argument: 'ПАПЕ МОЖНО ВСЕ‼️❓❓А вы как считаете правильно ли это❓🔶Смотри до конца и напиши своё мнение‼️📣🔥Незабудь подписаться чтоб не пропустить новое видео уже завтра📺✍️#папе#можно#все#рек#телефон#уронилтелефон#айфон#реки#хаха#попал#разбилтелефон#смешнодослез#рекомендации#юмор#хочуврекомендации#топчик#😂😂 [7037054266569149697].mp4.part'
Traceback (most recent call last):
  File "yt_dlp\utils.py", line 690, in sanitize_open
yt_dlp.utils.LockingUnsupportedError: File locking is not supported on this platform

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yt_dlp\downloader\http.py", line 263, in download
  File "yt_dlp\downloader\common.py", line 221, in inner
  File "yt_dlp\downloader\common.py", line 238, in sanitize_open
  File "yt_dlp\utils.py", line 693, in sanitize_open
OSError: [Errno 22] Invalid argument: 'ПАПЕ МОЖНО ВСЕ‼️❓❓А вы как считаете правильно ли это❓🔶Смотри до конца и напиши своё мнение‼️📣🔥Незабудь подписаться чтоб не пропустить новое видео уже завтра📺✍️#папе#можно#все#рек#телефон#уронилтелефон#айфон#реки#хаха#попал#разбилтелефон#смешнодослез#рекомендации#юмор#хочуврекомендации#топчик#😂😂 [7037054266569149697].mp4.part'

pukkandan commented 2 years ago

@Neurotoxin001 Use a shorter filename
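One way to picture "a shorter filename" is to cap the title at a byte budget before appending the ID and extension. The sketch below is illustrative only: `shorten_filename` is a hypothetical helper, and the 200-byte default is an assumed conservative limit that leaves room for suffixes such as `.part`; the real limit depends on the filesystem.

```python
def shorten_filename(title, video_id, ext="mp4", max_bytes=200):
    """Trim title so that 'title [id].ext' fits in max_bytes of UTF-8."""
    suffix = f" [{video_id}].{ext}"
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget <= 0:
        raise ValueError("max_bytes is too small for the ID and extension")
    encoded = title.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character that was cut in half
    trimmed = encoded.decode("utf-8", errors="ignore").rstrip()
    return f"{trimmed}{suffix}"


long_title = "ПАПЕ МОЖНО ВСЕ " * 40
name = shorten_filename(long_title, "7037054266569149697")
print(len(name.encode("utf-8")), name)
```

Counting bytes rather than characters matters for titles like the Cyrillic one above, where each letter takes two bytes in UTF-8.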

Neurotoxin001 commented 2 years ago

I got a new one: https://www.tiktok.com/@carpentrez/video/7090248144331509038

D:\PROGRAMS\TikTok Download>!dl https://www.tiktok.com/@carpentrez/video/7090248144331509038 -vU --compat-options filename-sanitization
[debug] Command-line config: ['https://www.tiktok.com/@carpentrez/video/7090248144331509038', '-vU', '--compat-options', 'filename-sanitization']
[debug] Encodings: locale cp1251, fs utf-8, out utf-8, err utf-8, pref cp1251
[debug] yt-dlp version 2022.04.08 [7884ade] (win_exe)
[debug] Compatibility options: filename-sanitization
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg n4.4.2-1-g8e98dfc57f-20220507 (setts), ffprobe n4.4.2-1-g8e98dfc57f-20220507
[debug] Optional libraries: brotli, certifi, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.04.08, Current version: 2022.04.08
yt-dlp is up to date (2022.04.08)
[debug] [TikTok] Extracting URL: https://www.tiktok.com/@carpentrez/video/7090248144331509038
[TikTok] 7090248144331509038: Downloading video details
[debug] Sort order given by extractor: quality, codec, size, br
[debug] Formats sorted by: hasvid, ie_pref, quality, vcodec, acodec, filesize, fs_approx, tbr, vbr, abr, lang, res, fps, hdr:12(7), asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] 7090248144331509038: Downloading 1 format(s): bytevc1_720p_520441-2
[debug] Invoking downloader on "https://v16m.tiktokcdn.com/d107c9d6b5ac5afe92f266b5a8301b6d/627bdff2/video/tos/maliva/tos-maliva-ve-0068c799-us/01426dffad7f43e29117ba4efdb74449/?a=1180&br=1016&bt=508&cd=0%7C0%7C0%7C3&ch=0&cr=3&cs=2&cv=1&dr=0&ds=3&er=&ft=ARJXOB4VqJtmo0PGRlSfkVQjUHCF_KJ&l=202205111010030102450021461536BC37&lr=all&mime_type=video_mp4&net=0&pl=0&qs=14&rc=am5zbjM6ZmlzPDMzZzczNEApOjxoZDRmOmU3Nzs7PGk7aWdqYm5kcjRvcWZgLS1kMS9zc2AvM2IxMmIyNF4uMDIvMF46Yw%3D%3D&vl=&vr="
[download] Unable to open file due to file access error. Retrying (attempt 1 of 3) ...
[download] Unable to open file due to file access error. Retrying (attempt 2 of 3) ...
[download] Unable to open file due to file access error. Retrying (attempt 3 of 3) ...
ERROR: unable to open for writing: [Errno 22] Invalid argument: 'Reply to @toohigh_nolow Reposting because Tiktok removed it. There is absolutely no nudity or sexual content in this, sadly I had to enlarged the censor bar and decreased the size of Krabs pectoral muscles 🙃#fyp #foryoupage #foryourpage #GlowUp #bodybuilding #physique #art #spongebob #artist [7090248144331509038].mp4.part'
Traceback (most recent call last):
  File "yt_dlp\utils.py", line 690, in sanitize_open
yt_dlp.utils.LockingUnsupportedError: File locking is not supported on this platform

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yt_dlp\downloader\http.py", line 263, in download
  File "yt_dlp\downloader\common.py", line 221, in inner
  File "yt_dlp\downloader\common.py", line 238, in sanitize_open
  File "yt_dlp\utils.py", line 693, in sanitize_open
OSError: [Errno 22] Invalid argument: 'Reply to @toohigh_nolow Reposting because Tiktok removed it. There is absolutely no nudity or sexual content in this, sadly I had to enlarged the censor bar and decreased the size of Krabs pectoral muscles 🙃#fyp #foryoupage #foryourpage #GlowUp #bodybuilding #physique #art #spongebob #artist [7090248144331509038].mp4.part'

The error persists even with --compat-options filename-sanitization or --trim-filenames 20.

Neurotoxin001 commented 2 years ago

Seems like my example video was removed from TikTok, so I can't try it with the new version of yt-dlp.