openzim / youtube

Create a ZIM file from a Youtube channel/username/playlist
GNU General Public License v3.0

Not a valid iso language name/code #121

Closed kelson42 closed 3 years ago

kelson42 commented 3 years ago

https://farm.openzim.org/pipeline/5f5f3154cf078db7bac5284e/debug

satyamtg commented 3 years ago

Seems to be the same error as #112. It has already been fixed in version 2.1.8.dev0, but the container that the recipe uses is 2.1.7. I'm running it on dev to check that the fix works for this case too.

kelson42 commented 3 years ago

We should probably release a new version then. This fix is already 2 months old. Any chance of getting that today so we can relaunch the recipe?

satyamtg commented 3 years ago

I launched it on the dev container and it passed: https://farm.openzim.org/pipeline/5f61b5bb65c499a28faedd75 I checked the logs and they look good. I haven't tested the ZIM though, as I think it's still in quarantine.

kelson42 commented 3 years ago

@satyamtg I have to reopen the ticket as the bug seems not (fully) fixed, see https://farm.openzim.org/pipeline/d683223bedd943b94fb67ff5/debug with khan-videos_bn_playlists recipe.

satyamtg commented 3 years ago

I think we need to extend the language mapping to handle this, as it seems to be the same error as earlier. I'm running a test to find which language code breaks it. I will also add a better failure message (one that displays the language code that failed).

rgaudin commented 3 years ago

It might also be worth checking this earlier in the scraper, as the failure currently happens post-download.
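
A rough sketch of what such an early check could look like, under stated assumptions: `find_unsupported_codes`, `lang_map` and `known_codes` are hypothetical names, not part of youtube2zim, and the sample codes are only illustrative. The idea is to resolve every subtitle code through the mapping before any download starts and report the ones that would fail.

```python
def find_unsupported_codes(subtitle_codes, lang_map, known_codes):
    """Return the sorted, de-duplicated codes that cannot be resolved.

    A code resolves if it, or the value it maps to in lang_map,
    is present in known_codes.
    """
    return sorted(
        {
            code
            for code in subtitle_codes
            if lang_map.get(code, code) not in known_codes
        }
    )


# Hypothetical data: a one-entry mapping and a tiny set of accepted codes.
lang_map = {"zh-Hans-CN": "zh"}
known_codes = {"en", "bg", "zh"}

unsupported = find_unsupported_codes(
    ["en", "zh-Hans-CN", "zh-Hant-HK", "en"], lang_map, known_codes
)
if unsupported:
    # Report before downloading anything, so the run can stop early.
    print("Unsupported subtitle codes:", ", ".join(unsupported))
```

Running a check like this first would surface the offending code in seconds rather than after the videos have been fetched.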

kelson42 commented 3 years ago

Same for teded_en_all, see https://farm.openzim.org/pipeline/4548223bedd943b91f808ff5/debug

kelson42 commented 3 years ago

Same problem with :

rgaudin commented 3 years ago

@kelson42, YouTube sometimes uses non-standard codes for subtitles. That's something we have to react to when we encounter it, but it's not a regression.

I've added a mapping for the khan-videos-bn culprit (zh-Hans-CN) and tried to find the list of all possible codes from the YT subtitle uploads UI. I found a couple of codes that would have failed and mapped those as well.

Unfortunately, that UI made no reference to the zh-Hans-CN code, for instance, which means the UI doesn't expose all the codes YouTube uses, so we might encounter a similar issue in the future.

I've added a debug print of the failing code so that, in the event of a similar issue, we'll save a lot of time and be able to fix it immediately.

So, next time, open a new ticket with the lang code.
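
For readers following along, the mapping plus debug print described above could look roughly like the sketch below. This is not the actual youtube2zim code. The name `YOUTUBE_LANG_MAP` is taken from the traceback further down this thread, but the mapping targets here are assumed, and `lookup_iso`, `KNOWN_ISO_CODES` and `get_subtitle_language` are hypothetical stand-ins (the real lookup goes through `zimscraperlib.i18n.get_language_details`).

```python
import logging

logger = logging.getLogger("youtube2zim-sketch")

# Assumed mapping targets; the real YOUTUBE_LANG_MAP may map these differently.
YOUTUBE_LANG_MAP = {
    "zh-Hans-CN": "zh",
    "zh-Hant-TW": "zh",
}

# Hypothetical stand-in for the ISO database; it only knows a few codes.
KNOWN_ISO_CODES = {"en": "English", "zh": "Chinese", "bg": "Bulgarian"}


class NotFound(ValueError):
    """Raised when a code cannot be resolved (mirrors the error in this thread)."""


def lookup_iso(code):
    """Return language details for a known code, or raise NotFound."""
    try:
        return {"code": code, "name": KNOWN_ISO_CODES[code]}
    except KeyError:
        raise NotFound("Not a valid iso language name/code") from None


def get_subtitle_language(lang):
    """Resolve a YouTube subtitle code, logging the code when lookup fails."""
    mapped = YOUTUBE_LANG_MAP.get(lang, lang)
    try:
        return lookup_iso(mapped)
    except NotFound:
        # Log the offending code so the next report can name it directly.
        logger.error("Failed to get language details for %s", mapped)
        raise
```

With this shape, a mapped code such as zh-Hans-CN resolves normally, while an unmapped one still fails but leaves its code in the log.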

rgaudin commented 3 years ago

And you mentioned khan-videos_es_playlists, but that one has nothing to do with this. It's a recipe issue.

rgaudin commented 3 years ago

As for teded, it was zh-Hant-TW.

kelson42 commented 3 years ago

Problem is still there, see https://farm.openzim.org/pipeline/825a5abd771312617d4cc506/debug:

[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.bg.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.en.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.ka.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.ko.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.sw.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.ta.vtt
[info] Writing video subtitles to: /output/tmpm2lbegcu/videos/zm3TXDZrifU/video.vi.vtt
[youtube2zim::2021-03-28 07:43:02,086] INFO:retrieve channel-info for all videos (author details)
[youtube2zim::2021-03-28 07:43:02,086] DEBUG:query youtube-api for Video details of 31 videos
[youtube2zim::2021-03-28 07:43:02,186] INFO:download all author's profile pictures
[youtube2zim::2021-03-28 07:43:02,186] DEBUG:query youtube-api for Channel #UC4a-Gbdw7vOaccHmFo40b9g
[youtube2zim::2021-03-28 07:43:02,382] INFO:update general metadata
[youtube2zim::2021-03-28 07:43:02,504] INFO:creating HTML files
[youtube2zim::2021-03-28 07:43:02,880] ERROR:Failed to get language details for zh-Hant-HK
[youtube2zim::2021-03-28 07:43:02,881] ERROR:FAILED. An error occurred: Not a valid iso language name/code
[youtube2zim::2021-03-28 07:43:02,881] ERROR:Not a valid iso language name/code
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.13-py3.8.egg/youtube2zim/entrypoint.py", line 202, in main
    scraper.run()
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.13-py3.8.egg/youtube2zim/scraper.py", line 313, in run
    self.make_html_files(succeeded)
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.13-py3.8.egg/youtube2zim/scraper.py", line 861, in make_html_files
    subtitles = get_subtitles(video_id)
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.13-py3.8.egg/youtube2zim/scraper.py", line 839, in get_subtitles
    return sorted(map(to_jinja_subtitle, languages), key=lambda x: x["name"])
  File "/usr/local/lib/python3.8/site-packages/youtube2zim-2.1.13-py3.8.egg/youtube2zim/scraper.py", line 827, in to_jinja_subtitle
    subtitle = get_language_details(YOUTUBE_LANG_MAP.get(lang, lang))
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/i18n.py", line 161, in get_language_details
    raise exc
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/i18n.py", line 157, in get_language_details
    lang_data, macro_data = get_iso_lang_data(adjusted_query)
  File "/usr/local/lib/python3.8/site-packages/zimscraperlib/i18n.py", line 77, in get_iso_lang_data
    raise NotFound("Not a valid iso language name/code")
zimscraperlib.i18n.NotFound: Not a valid iso language name/code

kelson42 commented 3 years ago

@rgaudin Why does the error message still not report the ISO code that has a problem?

rgaudin commented 3 years ago

I suppose this recipe is not using the latest version. What version is it?

kelson42 commented 3 years ago

2.1.13, it is using the latest version, see https://farm.openzim.org/recipes/khan-videos_en_playlists

rgaudin commented 3 years ago

The language code is in the log you supplied:

[youtube2zim::2021-03-28 07:43:02,880] ERROR:Failed to get language details for zh-Hant-HK

rgaudin commented 3 years ago

Fixed. Please retry with :dev in a moment

kelson42 commented 3 years ago

Still a problem: ERROR:Failed to get language details for zh-Hans-SG https://farm.openzim.org/pipeline/aa245a517a9c06751eaad606/debug (khan-videos_en_playlists)

rgaudin commented 3 years ago

Fixed. Please open a new ticket next time and use different wording: these are new languages that we need to map, and we don't have a list of them. Alternatively, we could not fail and just display the language code in the subtitles select. Open a ticket for that if you think that's appropriate.
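
The non-failing alternative mentioned above might be sketched as follows. This only illustrates the idea and is not how youtube2zim behaves: `to_subtitle_entry`, `lookup` and the `names` table are hypothetical, and the lookup is assumed to raise a `LookupError` for unknown codes.

```python
def to_subtitle_entry(lang, lang_map, lookup):
    """Build a subtitle entry, falling back to the raw code if lookup fails."""
    mapped = lang_map.get(lang, lang)
    try:
        name = lookup(mapped)
    except LookupError:
        # Unknown language: show the code itself instead of aborting the run.
        name = lang
    return {"code": lang, "name": name}


# Hypothetical name table standing in for the real language database.
names = {"en": "English", "sw": "Swahili"}


def lookup(code):
    return names[code]  # raises KeyError (a LookupError) for unknown codes


entries = sorted(
    (to_subtitle_entry(code, {}, lookup) for code in ["sw", "zh-Hans-SG", "en"]),
    key=lambda entry: entry["name"],
)
```

The trade-off is that an unmapped code no longer stops the run, so a new code would only be noticed by someone spotting the raw tag in the subtitles select.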