openzim / mwoffliner

Mediawiki scraper: all your wiki articles in one highly compressed ZIM file
https://www.npmjs.com/package/mwoffliner
GNU General Public License v3.0

Question: Video Resolution #1442

Closed tim-moody closed 3 years ago

tim-moody commented 3 years ago

Where the source has multiple video resolutions, is it possible to specify which one should be included in the ZIM? Does mwoffliner automatically select one? In this example the ZIM has the lowest resolution.

ORIGINAL:

<video id="mwe_player_1"
    poster="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3d/Gout.webm/300px--Gout.webm.jpg" preload="none"
    width="300" height="169" data-durationhint="371.128" data-startoffset="0" data-mwtitle="Gout.webm"
    data-mwprovider="wikimediacommons" disabled="disabled">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.180p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="Low bandwidth VP9 (180P)" data-shorttitle="VP9 180P"
        data-transcodekey="180p.vp9.webm" data-width="320" data-height="180" data-bandwidth="117096"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.240p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="Small VP9 (240P)" data-shorttitle="VP9 240P"
        data-transcodekey="240p.vp9.webm" data-width="426" data-height="240" data-bandwidth="123248"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.360p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="VP9 (360P)" data-shorttitle="VP9 360P"
        data-transcodekey="360p.vp9.webm" data-width="640" data-height="360" data-bandwidth="136320"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.480p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="SD VP9 (480P)" data-shorttitle="VP9 480P"
        data-transcodekey="480p.vp9.webm" data-width="854" data-height="480" data-bandwidth="152992"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.720p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="HD VP9 (720P)" data-shorttitle="VP9 720P"
        data-transcodekey="720p.vp9.webm" data-width="1280" data-height="720" data-bandwidth="178600"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.240p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="Small WebM (240P)" data-shorttitle="WebM 240P"
        data-transcodekey="240p.webm" data-width="426" data-height="240" data-bandwidth="193496" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.1080p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="Full HD VP9 (1080P)" data-shorttitle="VP9 1080P"
        data-transcodekey="1080p.vp9.webm" data-width="1920" data-height="1080" data-bandwidth="224072"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.360p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="WebM (360P)" data-shorttitle="WebM 360P"
        data-transcodekey="360p.webm" data-width="640" data-height="360" data-bandwidth="313656" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.480p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="SD WebM (480P)" data-shorttitle="WebM 480P"
        data-transcodekey="480p.webm" data-width="854" data-height="480" data-bandwidth="452448" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/3/3d/Gout.webm"
        type="video/webm; codecs=&quot;vp9, vorbis&quot;" data-title="Original WebM file, 1,920 × 1,080 (735 kbps)"
        data-shorttitle="WebM source" data-width="1920" data-height="1080" data-bandwidth="735492" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.720p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="HD WebM (720P)" data-shorttitle="WebM 720P"
        data-transcodekey="720p.webm" data-width="1280" data-height="720" data-bandwidth="789464" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.1080p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="Full HD WebM (1080P)" data-shorttitle="WebM 1080P"
        data-transcodekey="1080p.webm" data-width="1920" data-height="1080" data-bandwidth="1456560"
        data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.120p.vp9.webm"
        type="video/webm; codecs=&quot;vp9, opus&quot;" data-title="Lowest bandwidth VP9 (120P)"
        data-shorttitle="VP9 120P" data-transcodekey="120p.vp9.webm" data-width="214" data-height="120"
        data-bandwidth="111376" data-framerate="30">
    <source src="https://upload.wikimedia.org/wikipedia/commons/transcoded/3/3d/Gout.webm/Gout.webm.160p.webm"
        type="video/webm; codecs=&quot;vp8, vorbis&quot;" data-title="Low bandwidth WebM (160P)"
        data-shorttitle="WebM 160P" data-transcodekey="160p.webm" data-width="284" data-height="160"
        data-bandwidth="147912" data-framerate="30">
    <track
        src="https://commons.wikimedia.org/w/api.php?action=timedtext&amp;title=File%3AGout.webm&amp;lang=ar&amp;trackformat=srt&amp;origin=%2A"
        kind="subtitles" type="text/x-srt" srclang="ar" label="العربية (ar) subtitles" data-dir="rtl">
    <track
        src="https://commons.wikimedia.org/w/api.php?action=timedtext&amp;title=File%3AGout.webm&amp;lang=ar&amp;trackformat=vtt&amp;origin=%2A"
        kind="subtitles" type="text/vtt" srclang="ar" label="العربية (ar) subtitles" data-dir="rtl">
</video>

RESULTANT ZIM:

<video poster="../I/Gout.webm.jpg.webp" controls="40" preload="none" height="169" width="300">
    <source src="../I/Gout.webm.120p.vp9.webm" type="video/webm; codecs=&quot;vp9, opus&quot;" data-width="214"
        data-height="120" data-title="Lowest bandwidth VP9 (120P)" data-shorttitle="VP9 120P">
    <track kind="subtitles" type="text/x-srt" src="../-/File:Gout.webm-ar.vtt" srclang="ar"
        label="العربية (ar) subtitles" data-mwtitle="" data-dir="rtl">
    <track kind="subtitles" type="text/vtt" src="../-/File:Gout.webm-ar.vtt" srclang="ar" label="العربية (ar) subtitles"
        data-mwtitle="" data-dir="rtl">
</video>

kelson42 commented 3 years ago

@tim-moody As with pictures, MWoffliner picks the one which is displayed. @MananJethwani Can you confirm?

MananJethwani commented 3 years ago

@kelson42 We sort the sources and remove all but one: we keep only the source with the smallest width and height and use its resolution. In effect, we keep only the lowest-resolution source, which is the right thing to do.
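
The selection described in this comment could be sketched roughly as follows. This is a minimal illustration and not mwoffliner's actual code: the `VideoSource` type and the `pickLowestResolution` function are hypothetical names, and the sample values are taken from the `data-width` and `data-height` attributes in the ORIGINAL markup above.

```typescript
// Illustrative sketch only; names here are assumptions, not mwoffliner APIs.
interface VideoSource {
  src: string;
  width: number; // value of data-width
  height: number; // value of data-height
}

// Sort a copy of the list by width, then height, and keep the first entry.
function pickLowestResolution(sources: VideoSource[]): VideoSource | undefined {
  const sorted = sources
    .slice()
    .sort((a, b) => a.width - b.width || a.height - b.height);
  return sorted[0];
}

const sources: VideoSource[] = [
  { src: "Gout.webm.180p.vp9.webm", width: 320, height: 180 },
  { src: "Gout.webm.1080p.vp9.webm", width: 1920, height: 1080 },
  { src: "Gout.webm.120p.vp9.webm", width: 214, height: 120 },
  { src: "Gout.webm.160p.webm", width: 284, height: 160 },
];

const chosen = pickLowestResolution(sources);
console.log(chosen ? chosen.src : "no source"); // Gout.webm.120p.vp9.webm
```

With the four sample sources, the 120P rendition is kept, which matches the single `<source>` left in the resultant ZIM.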

WikiDocJames commented 3 years ago

It would be great to be able to select either a maximum or a minimum resolution. The problem we are experiencing is that the resolution of the Osmosis videos is too low to make out the text, so they are barely useful. Other videos have good resolution.
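
One way such a minimum-resolution choice might work is sketched below. Everything here is hypothetical: `minHeight` is not an existing mwoffliner option, and `Rendition` and `pickAtLeast` are names invented for the example. The idea is to keep the smallest rendition that still meets the requested height, and to fall back to the tallest one when nothing is tall enough.

```typescript
// Hypothetical sketch; `minHeight` is an assumed parameter, not a real mwoffliner flag.
interface Rendition {
  src: string;
  width: number; // value of data-width
  height: number; // value of data-height
}

// Return the smallest rendition whose height is at least `minHeight`.
// If none qualifies, return the tallest rendition available.
function pickAtLeast(renditions: Rendition[], minHeight: number): Rendition | undefined {
  const byHeight = renditions
    .slice()
    .sort((a, b) => a.height - b.height || a.width - b.width);
  const tallEnough = byHeight.filter((r) => r.height >= minHeight);
  return tallEnough.length > 0 ? tallEnough[0] : byHeight[byHeight.length - 1];
}

const renditions: Rendition[] = [
  { src: "Gout.webm.720p.vp9.webm", width: 1280, height: 720 },
  { src: "Gout.webm.120p.vp9.webm", width: 214, height: 120 },
  { src: "Gout.webm.360p.vp9.webm", width: 640, height: 360 },
];

const atLeast360 = pickAtLeast(renditions, 360);
console.log(atLeast360 ? atLeast360.src : "no source"); // Gout.webm.360p.vp9.webm
```

Choosing the smallest qualifying rendition rather than the largest keeps the ZIM size growth as small as the threshold allows.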

kelson42 commented 3 years ago

Where exactly is the problem? Which article? A link at http://library.kiwix.org/wikipedia_en_medicine/ would be helpful. At this stage it would be better to open a new ticket describing the problem completely from a user's point of view.

WikiDocJames commented 3 years ago

This is the link for the ZIM http://iiab.me/downloads/wp_en_mdwiki_2021-06.zim

Look at the Gout article:

Here is the first video on that page which is good quality https://commons.wikimedia.org/wiki/File:En.Wikipedia-VideoWiki-Gout.webm

Here is the second video on that page which is insufficient quality https://commons.wikimedia.org/wiki/File:Gout.webm

Not sure how to determine the exact quality of each.

Here you can see the article this page was built from https://mdwiki.org/wiki/Gout

WikiDocJames commented 3 years ago

Okay, I have filed a new ticket per your request: https://github.com/openzim/mwoffliner/issues/1489