joelnitta closed this issue 2 years ago.
I think a regex would be the kind of pattern used, but I need to think about it because I'm not convinced about parsing the content that way. It seems a bit overkill for mdpo.
The current workaround is to use events.
Given the file `foo.md`:

```md
:::::::::::::: foo

Foo

::::::::::::::::
```
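...and a file `ignorer.py`:

```python
def on_text(md2po, block, text):
    # md2po calls this event for the text it processes;
    # skip the fence lines of pandoc fenced divs.
    if text.startswith(':::'):
        md2po.disable_next_block = True
```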
If you execute `md2po foo.md -e text:ignorer.py::on_text`, you'll get this output:

```
#
msgid ""
msgstr ""

#: foo.md:block 2 (paragraph)
msgid "Foo"
msgstr ""
```
See the reference for the exposed API of the `md2po` instance passed as the first parameter of the event.
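As a sketch of the regex idea within the same event workaround, the handler can check a list of patterns instead of a single `startswith`. The `EXCLUDE_PATTERNS` name is hypothetical, and the `md2po`, `block` and `text` parameters are assumed to behave exactly as in `ignorer.py` above; the first pattern matches the same texts as `text.startswith(':::')`.

```python
import re

# Hypothetical list of patterns; append more compiled regexes for other
# text that should stay out of the PO file.
EXCLUDE_PATTERNS = [
    re.compile(r"^:{3,}"),  # pandoc fenced-div fences (same as startswith(':::'))
]


def on_text(md2po, block, text):
    # Same flag as in ignorer.py; in the example above, setting it for the
    # fence text keeps the fence lines out of the PO file.
    if any(p.search(text) for p in EXCLUDE_PATTERNS):
        md2po.disable_next_block = True
```

It would be run the same way, e.g. `md2po foo.md -e text:ignorer.py::on_text`.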
Thanks for the quick reply! At the moment, my workaround for both this and #228 is to pre-process the MD file and then generate the PO file from the result. The pre-processing step excludes the YAML header and pandoc fenced divs by automatically adding `<!-- mdpo-disable-next-line -->` etc. before the relevant lines.
Eventually, it would be preferable for this to become part of md2po so that I don't need an extra file (either `ignorer.py` or the pre-processed MD).
(edit: sorry, I realized this didn't apply to the YAML header; in that case I post-process the resulting MD file to fix the YAML header)
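A minimal sketch of that kind of pre-processing step, limited to the fenced-div part (not the YAML header). The function name `add_disable_comments` is hypothetical, and it assumes each fence line is its own paragraph (blank lines around it), because mdpo disables the whole next block rather than a single line.

```python
import re

DISABLE = "<!-- mdpo-disable-next-line -->\n"
# A pandoc fenced-div fence: three or more colons at the start of a line.
FENCE_RE = re.compile(r"^:{3,}")


def add_disable_comments(markdown: str) -> str:
    """Insert an mdpo disable comment before every fenced-div fence line.

    Simplification: `:::` lines inside fenced code blocks are not skipped.
    """
    out = []
    for line in markdown.splitlines(keepends=True):
        if FENCE_RE.match(line):
            out.append(DISABLE)
        out.append(line)
    return "".join(out)
```

The returned text would be written to a temporary file, and `md2po` run on that file instead of the original.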
Could you clarify what the exact behaviour would be? Would the pattern match an entire Markdown block or just part of a block? I'm not really sure what you're asking for. For example:

```md
:::::::::::::: foo
Foo
::::::::::::::::
```

Should this hypothetical new option ignore the `:::` parts of the paragraph, or only when they are defined in separate paragraphs?
```md
:::::::::::::: foo

Foo

::::::::::::::::
```
Would a user try to include Markdown syntax in the matcher, for example the `-` of list items? That is impossible to accomplish with the current parser, MD4C:

```md
- foo
- ignore this
- bar
```

Should I define the value to match as `ignore this` or as `- ignore this`?
This request seems very tailored to your use case, and the implementation is not clearly defined. If you can resolve the problems stated above, I can consider it. Of course, you're always free to open a PR.
Sorry if it wasn't sufficiently clear...
I would not try to implement it at the paragraph level, but rather at the line level: if a line contains the matched text, it would be excluded from the PO file.
So in that case both

```md
:::::::::::::: foo
Foo
::::::::::::::::
```

and

```md
:::::::::::::: foo

Foo

::::::::::::::::
```

would only present `Foo` to the PO file.
For your second example, let me explain with pseudo-code. I imagine something like this:

```sh
md2po input.md --exclude_lines -
```

And it would exclude all of the lines in

```md
- foo
- ignore this
- bar
```

because each contains `-`.
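To make the proposed semantics concrete, here is a sketch of the line filter that a hypothetical `--exclude_lines PATTERN` option would apply, using literal matching on raw lines. `exclude_lines` is an illustrative name, not an mdpo function.

```python
from __future__ import annotations


def exclude_lines(markdown: str, pattern: str) -> list[str]:
    """Keep only the raw lines that do not contain `pattern` literally.

    Models a hypothetical line-level ``--exclude_lines PATTERN`` option;
    blank lines are dropped too, since they never produce a msgid.
    """
    return [
        line
        for line in markdown.splitlines()
        if line.strip() and pattern not in line
    ]


fenced = ":::::::::::::: foo\n\nFoo\n\n::::::::::::::::"
print(exclude_lines(fenced, ":::"))                       # ['Foo']
print(exclude_lines("- foo\n- ignore this\n- bar", "-"))  # []
```

Because it works on raw lines before any Markdown parsing, the pattern sees list markers such as `-`, which is why every line of the list example is dropped.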
MD4C (the parser that mdpo uses, based on the CommonMark spec) does not parse line by line but block by block, so I can't implement this, and I have no motivation to create a low-level line-by-line Markdown parser that maintains the necessary speed.
There is an open PR in MD4C to implement the syntax part of the parsing, but its author lacks the motivation to finish and maintain it. As always, PRs are welcome in MD4C, PyMD4C and mdpo.
I see, thanks for explaining.
I was under the impression that exclusion/inclusion could be controlled line by line, because both `<!-- mdpo-disable-next-block -->` and `<!-- mdpo-disable-next-line -->` exist, as do `<!-- mdpo-enable-next-block -->` and `<!-- mdpo-enable-next-line -->`.
Are the two comments in each pair aliases of each other? In other words, do they all exclude/enable only by block, not by line?
Yes, `<!-- mdpo-disable-next-line -->` is just syntactic sugar for `<!-- mdpo-disable-next-block -->`. See #211.
> MD4C (the parser that mdpo uses, based on the CommonMark spec) does not parse line by line but block by block [...]
Does that mean if I had a similar request, for block exclusion, it would be a possibility?
I have some files where I want to include only paragraphs and ignore all headers.
Is there a better way to do this than adding `<!-- mdpo-disable-next-line -->` before every header?
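Until something like block-type exclusion exists, those comments can at least be added automatically. This is a minimal sketch of a pre-processor, assuming only ATX headings (`#` to `######`) need to be skipped; setext headings are not handled, and the function name `disable_headings` is hypothetical. It leaves lines inside fenced code blocks alone so that `#` comments in code are not mistaken for headings.

```python
import re

DISABLE = "<!-- mdpo-disable-next-line -->\n"
# ATX heading: up to 3 spaces of indentation, 1-6 '#', then a space, tab or end of line.
ATX_RE = re.compile(r"^ {0,3}#{1,6}(?:[ \t]|$)")
# Code fence: up to 3 spaces of indentation, then 3+ backticks or tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def disable_headings(markdown: str) -> str:
    """Insert an mdpo disable comment before every ATX heading."""
    out = []
    fence = None  # fence string that opened the current code block, if any
    for line in markdown.splitlines(keepends=True):
        m = FENCE_RE.match(line)
        if fence is None:
            if m:
                fence = m.group(1)  # entering a fenced code block
            elif ATX_RE.match(line):
                out.append(DISABLE)
        elif (
            m
            and m.group(1)[0] == fence[0]
            and len(m.group(1)) >= len(fence)
            and not line[m.end():].strip()
        ):
            fence = None  # closing fence: same character, at least as long
        out.append(line)
    return "".join(out)
```

As with the other pre-processing approaches in this thread, `md2po` would then be run on the rewritten file.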
> Does that mean if I had a similar request, for block exclusion, it would be a possibility?
Sure, PRs welcome.
Instead of disabling extraction by using comments in the md file, I would like to exclude lines by providing a pattern to `md2po` (probably some flavor of grep, but at least literal matching). Example use case: creating a PO file for an md file with pandoc fenced divs, such as this. I would like to be able to exclude all lines starting with three colons (`:::`).