jarontai / html2md

A library for converting HTML to Markdown in Dart. It supports CommonMark, simple tables, and custom conversion rules.
https://pub.dev/packages/html2md
BSD 2-Clause "Simplified" License
58 stars 25 forks

<pre> element not converted to Markdown code #43

Closed jnerlich closed 8 months ago

jnerlich commented 9 months ago

Hi, I'm using your library to convert Wiki pages to Markdown, for example: https://wiki.eclipse.org/Version_Numbering

If a <pre> block is found, it is not converted into a Markdown code block. Example:

<pre>
First development stream
 - 1.0.0

Second development stream
 - 1.0.100 (indicates a bug fix)
 - 1.1.0 (a new API has been introduced)
 The plug-in ships as 1.1.0
</pre> 

Do I need to use a code rule for this, or is this a missing feature / configuration?

jarontai commented 9 months ago

https://daringfireball.net/projects/markdown/syntax#precode In regular Markdown, a code block is rendered as a pre element wrapping a code element, so a bare pre element is not treated as a code block.

You can add a custom rule for this. Here is the built-in code block rule for reference: https://github.com/jarontai/html2md/blob/master/lib/src/rules.dart#L199

jnerlich commented 9 months ago

Thanks. Sorry for the silly question, but getStyleOption is reported as an error.

I tried to define the rule as follows:

final Rule indentedCodeBlock = Rule('indentedCodeBlock', filterFn: (node) {
  return getStyleOption('codeBlockStyle') == 'indented' &&
      node.parentElName == 'pre';
}, replacement: (content, node) {
  var children = node.childNodes().toList();
  if (children.length == 1) {
    return '\n\n    ' +
        children.first.textContent.replaceAll(RegExp(r'\n'), '\n    ') +
        '\n\n';
  } else {
    var result = '\n\n    ';
    for (var child in children) {
      var text = child.textContent;
      if (child != children.last) {
        text = text.replaceAll(RegExp(r'\n'), '\n    ');
      }
      result += text;
    }
    return result + '\n\n';
  }
});

Copying your import statement (the last line below) does not compile either:

import 'package:html2md/html2md.dart';
import 'package:http/http.dart' as http;
import 'package:html/parser.dart' as htmlParser;
import 'package:html/dom.dart' as htmlDom;
import 'package:html2md/html2md.dart' as html2md;
import 'dart:io';

import 'options.dart' show getStyleOption;

jarontai commented 8 months ago

The getStyleOption('codeBlockStyle') == 'indented' check is only needed by the built-in rule; you can remove it, along with the 'options.dart' import.
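
For readers landing on this thread, here is a minimal sketch of what such a custom rule might look like once the getStyleOption check is dropped. It is not the library's built-in rule. The rule name preAsCodeBlock and the helper indentAsCodeBlock are hypothetical, and the sketch assumes that filters: ['pre'] matches <pre> elements by tag name and that rules passed to convert are consulted before the built-in ones.

```dart
import 'package:html2md/html2md.dart' as html2md;

/// Hypothetical helper: indents every line of [text] by four spaces and
/// surrounds the result with blank lines, i.e. an indented Markdown code block.
String indentAsCodeBlock(String text) {
  // Drop leading and trailing newlines so the block does not begin or end
  // with an empty indented line.
  final trimmed = text.replaceAll(RegExp(r'^\n+|\n+$'), '');
  return '\n\n    ${trimmed.replaceAll('\n', '\n    ')}\n\n';
}

// Hypothetical rule name. Assumption: `filters: ['pre']` selects <pre>
// elements, including ones that do not wrap a <code> child.
final html2md.Rule preAsCodeBlock = html2md.Rule(
  'preAsCodeBlock',
  filters: ['pre'],
  replacement: (content, node) => indentAsCodeBlock(node.textContent),
);

void main() {
  const html = '<pre>\nFirst development stream\n - 1.0.0\n</pre>';
  // Assumption: custom rules given here take precedence over built-in rules.
  print(html2md.convert(html, rules: [preAsCodeBlock]));
}
```

Because the rule reads node.textContent instead of the already converted content, characters inside the block are emitted verbatim. If the rule does take precedence, it would also apply to <pre><code> blocks, so narrow the filter with filterFn if that is not wanted.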

jnerlich commented 8 months ago

Thanks, that works for me. Thanks for your tooling and your answers.

jnerlich commented 8 months ago

Closing