peterjc / mediawiki_to_git_md

Convert a MediaWiki export XML file into MarkDown as a series of git commits
MIT License

Image scaling during MediaWiki to Markdown conversion #10

Open peterjc opened 9 years ago

peterjc commented 9 years ago

Reported by @vincentdavis on issue #1

Some images are scaled, e.g. in http://biopython.org/wiki/Phylo or its MediaWiki equivalent https://github.com/peterjc/peterjc.github.io/blob/master/wiki/Phylo.mediawiki

[[File:phylo-draw-apaf1.png|256px|thumb|right|Rooted phylogram, via Phylo.draw]]

which, after conversion to GFM Markdown with pandoc, becomes the following (see https://github.com/peterjc/peterjc.github.io/blob/master/wiki/Phylo.md):

![Rooted phylogram, via Phylo.draw](phylo-draw-apaf1.png "fig:Rooted phylogram, via Phylo.draw")

On the actual wiki the image URL is: http://biopython.org/w/images/thumb/0/04/Phylo-draw-apaf1.png/256px-Phylo-draw-apaf1.png

If you want to see the full-size image: http://biopython.org/w/images/0/04/Phylo-draw-apaf1.png
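The two URLs above look like MediaWiki's hashed upload layout, where the directory pair (here `0/04`) comes from the MD5 digest of the normalised file name. A minimal sketch of how such URLs could be rebuilt, assuming the wiki uses that default layout; the function name `mediawiki_image_urls` and its `base` default are illustrative and not part of this project:

```python
import hashlib


def mediawiki_image_urls(filename, width, base="http://biopython.org/w/images"):
    """Return (full_size_url, thumbnail_url) for a MediaWiki upload.

    Assumes the default hashed upload directory layout; a wiki
    configured differently will use other paths.
    """
    # MediaWiki stores names with underscores and an upper-case first letter.
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    # The directory pair is the first one and first two hex digits of the MD5.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    hashed = "%s/%s" % (digest[0], digest[:2])
    full = "%s/%s/%s" % (base, hashed, name)
    thumb = "%s/thumb/%s/%s/%dpx-%s" % (base, hashed, name, width, name)
    return full, thumb


full_url, thumb_url = mediawiki_image_urls("phylo-draw-apaf1.png", 256)
print(full_url)
print(thumb_url)
```

This only reconstructs the paths; it does not check that the files exist on the server.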

@vincentdavis found a document that suggests code like this can be used for scaling images.

[[ http://url.to/image.png | height = 100px ]]

peterjc commented 8 years ago

As of #1 and #11, we now have the "original" unscaled images within the output.

This bug is now simply how to convert the MediaWiki image markup into (GitHub Flavoured) Markdown while preserving the image scaling information.

This is likely a pandoc configuration question...

peterjc commented 8 years ago

Wow, according to http://stackoverflow.com/questions/24383700/resize-image-in-the-wiki-of-github-using-markdown GFM no longer supports the height or width syntax mentioned above.

peterjc commented 8 years ago

Looks like we'll have to switch to minimal HTML snippets for images, see for example http://stackoverflow.com/questions/22485796/markdown-smaller-images-sizes-not-supported-by-github

<img src="url" alt="alt text" width="whatever" height="whatever">

This might work better for images wrapped in a <div>, as then the whole unit is HTML...
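As a rough illustration of the HTML-snippet idea, here is a minimal Python sketch that rewrites MediaWiki image links into `<img>` tags while keeping the size. The names `convert_images` and `image_link_to_html` are hypothetical and not part of this tool or of pandoc, and the option handling covers only the simple cases seen in this issue (captions containing nested links are not supported):

```python
import html
import re

# Matches links such as [[File:name.png|256px|thumb|right|Caption]]
IMAGE_LINK = re.compile(r"\[\[(?:File|Image):([^|\]]+)((?:\|[^|\]]*)*)\]\]")
# Matches size options: 256px (width), x100px (height), 256x100px (both)
SIZE = re.compile(r"^(\d+)?(?:x(\d+))?px$")
# Layout options that carry no size or caption information
LAYOUT_KEYWORDS = {"thumb", "thumbnail", "frame", "frameless", "border",
                   "left", "right", "center", "none"}


def image_link_to_html(match):
    """Build an <img> tag from one matched MediaWiki image link."""
    filename = match.group(1).strip()
    options = [opt.strip() for opt in match.group(2).split("|")[1:]]
    width = height = None
    caption = ""
    for opt in options:
        size = SIZE.match(opt)
        if size and (size.group(1) or size.group(2)):
            width, height = size.group(1), size.group(2)
        elif opt.lower() in LAYOUT_KEYWORDS:
            continue  # alignment and framing are dropped in this sketch
        elif opt:
            caption = opt  # the last free-text option is taken as the caption
    attrs = ['src="%s"' % html.escape(filename),
             'alt="%s"' % html.escape(caption)]
    if width:
        attrs.append('width="%s"' % width)
    if height:
        attrs.append('height="%s"' % height)
    return "<img %s>" % " ".join(attrs)


def convert_images(text):
    """Replace every MediaWiki image link in text with an <img> tag."""
    return IMAGE_LINK.sub(image_link_to_html, text)


print(convert_images(
    "[[File:phylo-draw-apaf1.png|256px|thumb|right|"
    "Rooted phylogram, via Phylo.draw]]"
))
# <img src="phylo-draw-apaf1.png" alt="Rooted phylogram, via Phylo.draw" width="256">
```

Such a step would have to run on the MediaWiki text before pandoc, or be replaced entirely if pandoc's own image size support turns out to be sufficient.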

peterjc commented 8 years ago

According to a recent commit, https://github.com/jgm/pandoc/commit/244cd5644b44f43722530379138bd7bb9cbace9b, pandoc 1.16 will add basic support for image sizes.

This appears to understand the height/width attributes when parsing mediawiki input: https://github.com/jgm/pandoc/commit/244cd5644b44f43722530379138bd7bb9cbace9b#diff-f003790849ba78911adb2e3836757776L577

It remains to be seen how this would be rendered when we request GFM output... https://github.com/jgm/pandoc/issues/2554

peterjc commented 8 months ago

This ought to work with the pandoc fix (see the issue linked above); we just need to find or make a test case to confirm this!