kunpeng9 / GTD2020-05-31

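Created 2020-05-31 [Putting GitHub project links and the like into TickTick, Evernote, or similar tools for management proved unworkable in practice: awkward to use and completely shelved]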
The Unlicense

edavis/pandoc-opml: Generate OPML files from Markdown #66

Open kunpeng9 opened 3 years ago

kunpeng9 commented 3 years ago

pandoc-opml generates OPML files from Markdown with the help of pandoc.

Demo

Imagine this Markdown document:

---
title: Demo Document
author: Eric Davis
---

# Hello World!

This is a child of the "Hello World!" header.

After running it through pandoc-opml, you'd have this OPML document:

<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <!-- OPML generated by pandoc-opml v0.1 on Tue, 13 Jan 2015 04:21:33 GMT -->
  <head>
    <title>Demo Document</title>
    <ownerName>Eric Davis</ownerName>
    <dateModified>Tue, 13 Jan 2015 04:21:33 GMT</dateModified>
    <generator>https://github.com/edavis/pandoc-opml</generator>
    <docs>https://github.com/edavis/pandoc-opml#docs</docs>
  </head>
  <body>
    <outline level="1" name="hello-world" text="Hello World!">
      <outline text="This is a child of the &quot;Hello World!&quot; header."/>
    </outline>
  </body>
</opml>

Alright, so I've taken the simplicity of Markdown and turned it into a jumble of XML. What's so great about this?

Well, think of what an XML version of your Markdown now enables.

Say you wanted to grab all level 1 and level 2 headlines from a Markdown document to put together a table of contents.

All the widely used Markdown libraries seem to focus primarily on transforming Markdown into HTML, so no help there. Beyond that, you could try writing a regex to extract the headers but that path is brittle and full of pain.

What if instead you could transform your Markdown into XML and gain with it all the tools and libraries that natively work with XML? Then your "grab all level 1 and level 2 headers" task would be a breeze.

pandoc-opml is the tool to do just that.
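
As a rough illustration of the "grab all level 1 and level 2 headers" task, here is a minimal sketch that reads the demo OPML with Python's standard library. The function name `headlines` and the `max_level` parameter are hypothetical, chosen for this example; only the `level`, `name`, and `text` attributes come from the output shown above.

```python
import xml.etree.ElementTree as ET

# The demo OPML from above (generator comment and extra <head> fields omitted).
OPML = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>Demo Document</title>
  </head>
  <body>
    <outline level="1" name="hello-world" text="Hello World!">
      <outline text="This is a child of the &quot;Hello World!&quot; header."/>
    </outline>
  </body>
</opml>
"""


def headlines(opml_text, max_level=2):
    """Return (level, name, text) for every header outline up to max_level."""
    root = ET.fromstring(opml_text.encode("utf-8"))
    found = []
    for node in root.iter("outline"):
        level = node.get("level")  # only header outlines carry a level
        if level is not None and int(level) <= max_level:
            found.append((int(level), node.get("name"), node.get("text")))
    return found


# Print an indented table of contents.
for level, name, text in headlines(OPML):
    print("  " * (level - 1) + f"- {text} (#{name})")
```

Paragraph outlines have no `level` attribute, so they are skipped without any regex work on the Markdown itself.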

Installation

I'll eventually toss this up on PyPI, but for now:

$ pip install git+https://github.com/edavis/pandoc-opml

The only external requirement is pandoc.

Running

$ pandoc-opml [-o <output.opml>] <input.txt>

If -o/--output is not provided, the output is written to stdout.
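
If you want to call the tool from a script, something like the following sketch could work. It assumes `pandoc-opml` is installed and on your PATH; the helper names `build_command` and `convert` are made up for this example, and only the `-o` option documented above is used.

```python
import subprocess


def build_command(source, output=None):
    """Build the argument list for the pandoc-opml invocation shown above."""
    cmd = ["pandoc-opml"]
    if output is not None:
        cmd += ["-o", output]
    cmd.append(source)
    return cmd


def convert(source, output=None):
    """Run pandoc-opml; with no output file, return what it writes to stdout."""
    result = subprocess.run(
        build_command(source, output),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout


print(build_command("input.txt", "output.opml"))
```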

Docs

pandoc-opml makes every effort to follow the OPML v2.0 specification as closely as possible.

However, Markdown is a rich format, so some additional information about the source elements is stored as attributes.

A good OPML parser should ignore anything it doesn't understand, so none of this should be a problem. Please file a bug report if any problems do arise.

Headers

The OPML of Markdown headline elements includes two attributes: level and name.

The level attribute is the HTML heading level of the given header element: 1 for h1, 2 for h2, and so on.

The name attribute is the unique identifier that pandoc assigns to the header according to its automatic header identifier rules.

To override the name attribute, explicitly set the unique identifier:

# Hello World {#custom-id}

<outline level="1" name="custom-id" text="Hello World"/>

Attributes

If you specify header attributes, pandoc-opml will include them in the resulting OPML:

# Hello World {#custom-id .draft category=demo}

<outline level="1" name="custom-id" text="Hello World" draft="true" category="demo"/>

Class header attributes have the value of "true" while key/value header attributes are included as-is.

Later attributes overwrite earlier ones. For example:

# Hello World {#unique-id .name name=example}

First, name=unique-id. Then, the class attribute sets name=true. Then, the key/value attribute sets name=example. In the resulting OPML, name will equal example.
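
The override order can be sketched in a few lines of Python. This is not pandoc-opml's actual code, just an illustration of the rule described above; the function name `outline_attributes` is hypothetical, and its last three parameters mirror pandoc's header attribute triple (identifier, classes, key/value pairs).

```python
def outline_attributes(level, text, identifier, classes, keyvals):
    """Merge header attributes so that later sources overwrite earlier ones."""
    attrs = {"level": str(level), "text": text}
    attrs["name"] = identifier      # 1. the unique identifier becomes name
    for cls in classes:             # 2. each class becomes attr="true"
        attrs[cls] = "true"
    for key, value in keyvals:      # 3. key/value pairs are copied as-is
        attrs[key] = value
    return attrs


# Corresponds to: # Hello World {#unique-id .name name=example}
attrs = outline_attributes(1, "Hello World", "unique-id", ["name"], [("name", "example")])
print(attrs["name"])  # example
```

Because a plain dict assignment replaces any earlier value, applying the three sources in this order is enough to reproduce the "later wins" behaviour.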

Lists

Unordered list items have a list attribute set to unordered.

Ordered list items have a list attribute set to ordered and an ordinal attribute set to the ordinal number of the list item.

Example:

- Hello World
- This is a test

1) Hello World
2) This is a test

<outline list="unordered" text="Hello World"/>
<outline list="unordered" text="This is a test"/>

<outline list="ordered" ordinal="1" text="Hello World"/>
<outline list="ordered" ordinal="2" text="This is a test"/>
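
To show how a consumer might use the list and ordinal attributes, here is a small sketch that turns the outlines above back into Markdown list lines. The `<body>` wrapper and the function name `list_lines` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The list outlines from the example above, wrapped in a <body> element.
FRAGMENT = """<body>
  <outline list="unordered" text="Hello World"/>
  <outline list="unordered" text="This is a test"/>
  <outline list="ordered" ordinal="1" text="Hello World"/>
  <outline list="ordered" ordinal="2" text="This is a test"/>
</body>"""


def list_lines(body):
    """Render list outlines as Markdown list items, in document order."""
    lines = []
    for node in body.iter("outline"):
        kind = node.get("list")
        if kind == "unordered":
            lines.append(f"- {node.get('text')}")
        elif kind == "ordered":
            lines.append(f"{node.get('ordinal')}) {node.get('text')}")
    return lines


print("\n".join(list_lines(ET.fromstring(FRAGMENT))))
```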

Metadata

If description is included in the metadata, it is included as a <description> element in the OPML's <head>.

If date is included, it is included as the <dateCreated> element in the OPML's <head>.

The <dateModified> element is the timestamp of when pandoc-opml created the OPML.

All the other metadata (e.g., title, author, email, etc.) maps to the standard OPML <head> elements.

If more than one author is provided, a single <ownerName> element is created with the names comma delimited.
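
The metadata mapping can be pictured roughly as follows. This is a sketch under stated assumptions, not the tool's implementation: `build_head` is a hypothetical helper, the mapping of email to `<ownerEmail>` follows the OPML 2.0 element name rather than anything shown above, the date is passed through unchanged, and ", " is assumed as the author delimiter.

```python
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_head(metadata, now=None):
    """Build an OPML <head> element from a dict of document metadata."""
    head = ET.Element("head")

    def add(tag, value):
        if value:  # skip metadata fields that are missing or empty
            ET.SubElement(head, tag).text = value

    authors = metadata.get("author") or []
    if isinstance(authors, str):
        authors = [authors]

    add("title", metadata.get("title"))
    add("description", metadata.get("description"))
    add("ownerName", ", ".join(authors))      # one element, names comma delimited
    add("ownerEmail", metadata.get("email"))  # assumed mapping
    add("dateCreated", metadata.get("date"))
    # dateModified records when the OPML was generated, as an RFC 822 date.
    add("dateModified", formatdate(now, usegmt=True))
    return head


head = build_head({"title": "Demo Document", "author": ["Eric Davis", "Jane Doe"]})
print(ET.tostring(head, encoding="unicode"))
```

The `now` parameter takes seconds since the epoch and exists only so the timestamp can be pinned in tests; by default the current time is used.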

HTML

If the source Markdown contains formatting, the respective OPML text attribute will contain encoded HTML markup:

This paragraph contains *emphasis* and **strong** formatting along
with `code` and H~2~O (subscripts) and 2^10^ (superscripts) and last,
but not least, ~~deleted text~~.

<outline text="This paragraph contains &lt;em&gt;emphasis&lt;/em&gt; and &lt;strong&gt;strong&lt;/strong&gt; formatting along with &lt;code&gt;code&lt;/code&gt; and H&lt;sub&gt;2&lt;/sub&gt;O (subscripts) and 2&lt;sup&gt;10&lt;/sup&gt; (superscripts) and last, but not least, &lt;del&gt;deleted text&lt;/del&gt;."/>
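
When reading such an outline back, an XML parser decodes the entities for you, so the text attribute arrives as ordinary HTML markup. The sketch below shows that, plus one possible way to strip the tags when only plain text is wanted; the `TextOnly` class and `plain_text` function are hypothetical names, and the outline is a shortened version of the one above.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


node = ET.fromstring(
    '<outline text="H&lt;sub&gt;2&lt;/sub&gt;O and &lt;em&gt;emphasis&lt;/em&gt;"/>'
)
markup = node.get("text")
print(markup)              # H<sub>2</sub>O and <em>emphasis</em>
print(plain_text(markup))  # H2O and emphasis
```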

Background

I've long been interested in OPML as a file format, but I was always more comfortable using a text editor than any of the available OPML editors.

So I started toying with the idea of using a regular text editor and exporting plain text files to OPML instead of editing OPML directly.

I knew the hardest part was going to be parsing the plain text input files. Looking for alternatives to writing that code myself, I found pandoc and was thrilled to see it provided access to the abstract syntax tree (AST) that represented the input file's headers, paragraphs, list items, etc. Plus, by using pandoc, I could write the input files in any of the many file formats it understands.

https://github.com/edavis/pandoc-opml