suntong / html2md

HTML to Markdown converter
MIT License

html2md

html2md - HTML to Markdown converter

html2md uses https://github.com/JohannesKaufmann/html-to-markdown to convert HTML into Markdown. That library relies on an HTML parser and avoids regular expressions as much as possible, which prevents some weird edge cases and makes it usable even when the structure of the input is completely unknown.
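
To illustrate why a parser is preferred over regular expressions, here is a minimal Go sketch of one such weird case. It is not taken from html2md or html-to-markdown; the `tagPattern` and `stripTagsNaive` names are made up for this example, and the pattern is just a typical naive tag matcher.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is a naive "anything between angle brackets" tag matcher
// (hypothetical, for illustration only).
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripTagsNaive removes everything the regular expression takes for a tag.
func stripTagsNaive(html string) string {
	return tagPattern.ReplaceAllString(html, "")
}

func main() {
	// A quoted attribute value may legally contain '>'.
	input := `<a title="1 > 0" href="/x">link</a>`

	// The pattern ends the "tag" at the first '>', which sits inside the
	// title attribute, so the rest of the opening tag leaks into the text.
	fmt.Printf("%q\n", stripTagsNaive(input))
}
```

The naive pattern leaves part of the opening tag in the result instead of just `link`. An HTML parser tokenizes the quoted attribute value as a whole and does not have this problem.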

Usage

$ html2md

HTML to Markdown
Version 1.5.0 built on 2024-02-10
Copyright (C) 2020-2024, Tong Sun

HTML to Markdown converter on command line

Usage:
  html2md [Options...]

Options:

  -h, --help                       display help information 
  -i, --in                        *The html/xml file to read from (or stdin) 
  -d, --domain                     Domain of the web page, needed for links when --in is not url 
  -s, --sel                        CSS/goquery selectors [=body]
  -x, --excl                       Excluding CSS/goquery selectors 
      --xc                         Excluding all children nodes 
  -v, --verbose                    Verbose mode (Multiple -v options increase the verbosity.) 

      --opt-heading-style          Option HeadingStyle 
      --opt-horizontal-rule        Option HorizontalRule 
      --opt-bullet-list-marker     Option BulletListMarker 
      --opt-code-block-style       Option CodeBlockStyle 
      --opt-fence                  Option Fence 
      --opt-em-delimiter           Option EmDelimiter 
      --opt-strong-delimiter       Option StrongDelimiter 
      --opt-link-style             Option LinkStyle 
      --opt-link-reference-style   Option LinkReferenceStyle 
      --opt-escape-mode            Option EscapeMode 

      --plugin-br-to-newline       Plugin BrToNewline 
  -A, --plugin-conf-attachment     Plugin ConfluenceAttachments 
  -C, --plugin-conf-code           Plugin ConfluenceCodeBlock 
  -F, --plugin-frontmatter         Plugin FrontMatter 
  -G, --plugin-gfm                 Plugin GitHubFlavored 
  -S, --plugin-strikethrough       Plugin Strikethrough 
  -T, --plugin-table               Plugin Table 
      --plugin-table-compat        Plugin TableCompat 
  -L, --plugin-task-list           Plugin TaskListItems 
  -V, --plugin-vimeo               Plugin VimeoEmbed 
  -Y, --plugin-youtube             Plugin YoutubeEmbed

Examples

Simplest form

$ html2md -i https://github.com/suntong/html2md | head -3
[Skip to content](#start-of-content)

[Homepage](https://github.com/)

Using goquery

The most useful feature is passing a goquery (CSS) selector with -s/--sel, so that only the content you want is converted.

$ html2md -i https://github.com/JohannesKaufmann/html-to-markdown -s "div.my-3"
[go](http://github.com/topics/go "Topic: go") [html](http://github.com/topics/html "Topic: html") [markdown](http://github.com/topics/markdown "Topic: markdown") [golang](http://github.com/topics/golang "Topic: golang") [converter](http://github.com/topics/converter "Topic: converter") [html-to-markdown](http://github.com/topics/html-to-markdown "Topic: html-to-markdown") [goquery](http://github.com/topics/goquery "Topic: goquery")

The options and plugins

Works as expected:

$ echo '<strong>Bold Text</strong>' | html2md -i
**Bold Text**

$ echo '<strong>Bold Text</strong>' | html2md -i --opt-strong-delimiter="__"
__Bold Text__

$ echo '<ul><li><input type=checkbox checked>Checked!</li><li><input type=checkbox>Check Me!</li></ul>' | html2md -i -G
- [x] Checked!
- [ ] Check Me!

$ echo 'Only <del>blue ones</del> <s> left</s>' | html2md -i --plugin-strikethrough
Only ~~blue ones~~ ~~left~~

$ echo '<p>Lorem Ipsum:</p><p style="text-align: center;"><iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/PifPVQOFyZI" title="YouTube video player" width="560"></iframe></p>' | html2md -i --plugin-youtube
Lorem Ipsum:

[![YouTube video player](https://img.youtube.com/vi/PifPVQOFyZI/0.jpg)](https://www.youtube.com/watch?v=PifPVQOFyZI)

Testing the new table plugins

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T | head -6
| Firstname | Lastname | Age |
| --- | --- | --- |
| Jill | Smith | 50 |
| Eve | Jackson | 94 |
| Empty |  |  |
| End |

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i -T --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.table.golden -
---
@@ -41 +41,2 @@
 | `var` | b | c |
\ No newline at end of file
+

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat | head -6
Firstname · Lastname · Age

Jill · Smith · 50

Eve · Jackson · 94

$ cat $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/input.html | html2md -i --plugin-table-compat --domain example.com | diff -wU 1 $GOPATH/src/github.com/JohannesKaufmann/html-to-markdown/testdata/TestPlugins/table/output.tablecompat.golden -
---
@@ -41 +41,2 @@
 `var` · b · c
\ No newline at end of file
+

Credits

Similar Projects

Install Debian/Ubuntu package

sudo apt install -y html2md

Download/install binaries

Download the binary executable archive for your platform, then unpack and install it. For example, on Linux amd64:

tar -xvf html2md_*_linux_amd64.tar.gz
sudo mv -v html2md_*_linux_amd64/html2md /usr/local/bin/
rmdir -v html2md_*_linux_amd64

Distro package

The repo setup instruction URL has been given above. For example, for Debian:

Debian package

curl -1sLf \
  'https://dl.cloudsmith.io/public/suntong/repo/setup.deb.sh' \
  | sudo -E bash

# That's it. You can then carry out your normal operations, such as:

sudo apt update
apt-cache policy html2md

sudo apt install -y html2md

Install Source

To build and install from source instead:

go install github.com/suntong/html2md@latest

Author

Tong SUN
suntong from cpan.org

Powered by WireFrame
The one-stop wire-framing solution for Go CLI-based projects, from init to deploy.

Contributors ✨

Thanks go to these wonderful people (emoji key):

suntong
💻 🤔 🎨 🔣 ⚠️ 🐛 📖 📝 💡 🔧 📦 👀 💬 🚧 🚇

VPanteleev-S7
💻 🐛 📓

itdoginfo
🐛 📓

somename123
🐛 🤔 📓

vivook
🐛 📓

097115
🐛 🤔 📓

James Reynolds
👀 📢 📓

ImportTaste
💻 🐛 📓

This project follows the all-contributors specification. Contributions of any kind welcome!