retext-project / pymarkups

Wrapper around various text markups, used in ReText
https://pymarkups.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Added support for asciidoc #17

Closed dkuhlman closed 2 years ago

dkuhlman commented 2 years ago

Here are changes that add support for converting Asciidoc files.

Could you please take a look? If you are interested and think it's good enough, please merge it.

I'm hoping that someday, Asciidoc support will be added to ReText.

If you are interested in adding this, here are additional comments and questions:

Reporting errors and warnings

Asciidoc exceptions (errors and warnings) are written to stderr. Does that work for other use cases? Is there something else that I should do with them?
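
One possible way for a caller to collect such messages, rather than letting them reach the terminal, is to redirect stderr around the conversion call. This is only a minimal sketch, assuming the converter writes through Python's `sys.stderr`; `noisy_convert` is a hypothetical stand-in and not part of pymarkups or asciidoc3.

```python
import contextlib
import io
import sys


def noisy_convert(text):
    # Hypothetical stand-in for a converter that reports problems on stderr.
    print("WARNING: example warning", file=sys.stderr)
    return "<p>" + text + "</p>"


def convert_capturing_stderr(text):
    # Temporarily route sys.stderr into an in-memory buffer while converting.
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        html = noisy_convert(text)
    # Return the converted output together with whatever was reported.
    return html, buffer.getvalue()


html, messages = convert_capturing_stderr("hello")
```

A caller such as an editor could then show `messages` in its own UI. Output written directly to the process-level file descriptor (for example by a subprocess) would not be captured this way.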

Extra required modules

The added code in support of Asciidoc uses the packages: (1) asciidoc3, (2) bs4 (BeautifulSoup4), and (3) lxml.

Do I need to add bs4 to settings.conf? I'm not sure how to specify dependencies.

Coding conventions

I've tried to follow the conventions in the other .py files, except that I've used 4 spaces for indentation rather than a tab character.

PEP 8 says this:

Tabs or Spaces?

Spaces are the preferred indentation method.
Tabs should be used solely to remain consistent with code that is
already indented with tabs.
Python disallows mixing tabs and spaces for indentation.

So the question is whether we want consistency within each file or across all files.

Do you have an opinion on this? I can use the command line tools expand/unexpand to convert the new Asciidoc support code to use tabs for indentation, if that's your preference. My preference is to use the standard 4 spaces indent.

Unit tests

I added unit tests for Asciidoc.

When I run them, I see several DeprecationWarning messages. I can hide them by running the following:

python -Wignore::DeprecationWarning -m unittest test_asciidoc
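
As an alternative to the command-line flag, the warning could be silenced inside an individual test with the standard `warnings` module. The sketch below is illustrative only; `legacy_call` is a hypothetical stand-in for library code that emits a DeprecationWarning, not something from asciidoc3 or pymarkups.

```python
import unittest
import warnings


def legacy_call():
    # Hypothetical stand-in for library code that emits a DeprecationWarning.
    warnings.warn("old API", DeprecationWarning)
    return 42


class QuietTest(unittest.TestCase):
    def test_legacy_call(self):
        # Ignore DeprecationWarning only inside this block; the previous
        # warning filters are restored when the block exits.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            self.assertEqual(legacy_call(), 42)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(QuietTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Scoping the filter to one test keeps deprecation warnings visible everywhere else in the test suite.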

dkuhlman commented 2 years ago

Dmitry,

Thanks for your comments.

I've made most of the changes that you suggested. See the more detailed notes below. I've pushed these changes to GitHub.

What I've done:

Dave

dkuhlman commented 2 years ago

Dmitry,

I believe I have addressed the items in your recent suggestions.

I have not yet pushed the changes to GitHub.

First, a question:

In the unit tests, I reworked the stylesheet test in an attempt to make it less brittle. I'm thinking we should test our own code in markups to ensure that it works, not the code in asciidoc3 for changes in its output. As long as we have captured the title text, body text, and stylesheet text, our code should pass the unit test. So, I've replaced the tests with these:

self.assertTrue(len(body) > 100)
self.assertEqual(title_expected, title)
self.assertTrue(len(stylesheet) > 100)

What do you think?
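
For illustration, the three assertions above could sit in a self-contained test case like the following sketch. The values of `title`, `body`, and `stylesheet` are hypothetical hand-built stand-ins (in the real test they would come from the converter), and `assertGreater` is swapped in for `assertTrue(len(...) > 100)` only because it reports the actual length on failure.

```python
import unittest

# Hypothetical stand-in values; a real test would obtain these from the
# markup converter instead of building them by hand.
title_expected = "Hello"
title = "Hello"
body = "<p>" + "x" * 200 + "</p>"
stylesheet = "body { margin: 0; }" * 10


class LooseOutputTest(unittest.TestCase):
    def test_output_captured(self):
        # Only the title is compared exactly. Body and stylesheet are
        # checked for being non-trivially long, so upstream formatting
        # changes do not break the test.
        self.assertGreater(len(body), 100)
        self.assertEqual(title_expected, title)
        self.assertGreater(len(stylesheet), 100)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LooseOutputTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```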

Dave

dkuhlman commented 2 years ago

Dmitry,

I believe that I have addressed all of your second round of comments/suggestions.

I have pushed these changes to my fork of pymarkups. So, they should show up in the pull request.

Some details --- What I've done:

Dave

dkuhlman commented 2 years ago

Dmitry,

Thanks for these suggestions. I've pushed my changes to my fork.

What I have done:

About the copyright notice: it's fine with me to leave it as is. You have done so much more work on this project than I have, and they are really your ideas. I suggest that you leave it as is, but I'll leave it up to you. If you do add me, I'd be pleased. My name would be "Dave Kuhlman".

Dave

codecov[bot] commented 2 years ago

Codecov Report

Merging #17 (1333bee) into master (c13ae66) will increase coverage by 0.29%. The diff coverage is 97.59%.

@@            Coverage Diff             @@
##           master      #17      +/-   ##
==========================================
+ Coverage   95.06%   95.36%   +0.29%     
==========================================
  Files          10       12       +2     
  Lines         608      690      +82     
==========================================
+ Hits          578      658      +80     
- Misses         30       32       +2     
Impacted Files            Coverage Δ
markups/asciidoc.py       96.15% <96.15%> (ø)
markups/__init__.py       87.50% <100.00%> (+0.32%)
tests/test_asciidoc.py    100.00% <100.00%> (ø)
tests/test_public_api.py  94.11% <100.00%> (+0.56%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update c13ae66...1333bee.

mitya57 commented 2 years ago

I pushed some commits, hope you don't mind.

Now the code looks good, but I have some remaining questions:

  1. Why did you choose asciidoc3 and not https://asciidoc-py.github.io/ (aka https://asciidoc.org/)? asciidoc-py is the first Google result for “AsciiDoc”, and the asciidoc3 website itself suggests that AsciiDoc3 is an unofficial implementation:

    The AsciiDoc3 implementation you can find on this website asciidoc3.org is a personal project.

    And what I like about that implementation is:

    • Changelog for version 10.0.0 says:

      Importing asciidoc should no longer require the asciidocapi.py script, and can be done through regular python import, e.g. import asciidoc; asciidoc.execute(...).

      This means we no longer need to import two different modules, add __path__ hack, etc.

    • It is possible to disable the "Last modified" footer with the footer-style=none attribute. Asciidoc3 does not have such an attribute.

  2. Would you mind if I rename AsciidocMarkup to AsciiDocMarkup? That capitalization looks more correct.

  3. What do you think about using Safe Mode? Having the sys: syntax, which allows executing arbitrary commands, seems like a security hole (imagine someone sending a malicious file to a ReText user).
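
As a rough sketch of how points 1 and 3 might combine, the helper below builds the option list for `asciidoc.execute(...)` with safe mode and the footer attribute switched on. The option names are the ones that appear in the patch later in this thread; `build_asciidoc_options` is a hypothetical helper, and the actual `execute` call is left as a comment because it assumes the asciidoc package is installed.

```python
from io import StringIO


def build_asciidoc_options(outfile, safe=True, hide_footer=True):
    """Build (flag, value) pairs in the command-line style used by asciidoc."""
    opts = [
        ('--out-file', outfile),
        ('--backend', 'xhtml11'),
    ]
    if hide_footer:
        # Attribute mentioned above for disabling the "Last modified" footer.
        opts.append(('-a', 'footer-style=none'))
    if safe:
        # Safe mode, intended to block unsafe features such as sys: macros.
        opts.append(('--safe', None))
    return opts


outfile = StringIO()
opts = build_asciidoc_options(outfile)

# With asciidoc (asciidoc-py 10 or later) installed, the call would
# presumably look like this:
#   import asciidoc
#   asciidoc.execute('asciidoc', opts, [StringIO(text)])
```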

I will be happy to push the remaining changes myself once you answer these questions.

dkuhlman commented 2 years ago

Dmitry,

Renaming AsciidocMarkup sounds fine to me.

I don't understand the safe-mode support, but it sounds like a good idea.

asciidoc-py also sounds like a reasonable option. I've made changes in my local repository so that we use it. I was also able to suppress the footer as you suggested. We can switch to asciidoc-py later if you like.

I used asciidoc3 originally because:

Both asciidoc-py and asciidoc3 seem to be actively supported.

Since I can't give any reason for using one over the other, we can switch to asciidoc-py if you feel better with it. Tell me if and when you want my patches for asciidoc-py.

Dave

dkuhlman commented 2 years ago

Here is a diff against my fork containing my changes in support of asciidoc-py.

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 5190acc..cf1427c 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -18,7 +18,7 @@ jobs:
         run: |
           python -m pip install Markdown python-markdown-math pymdown-extensions
           python -m pip install docutils textile pygments codecov PyYAML
-          python -m pip install lxml asciidoc3
+          python -m pip install lxml asciidoc
       - name: Install the project
         run: python setup.py install
       - name: Run tests
diff --git a/markups/asciidoc.py b/markups/asciidoc.py
index bfd5539..eeaefc1 100644
--- a/markups/asciidoc.py
+++ b/markups/asciidoc.py
@@ -27,27 +27,26 @@ class AsciidocMarkup(AbstractMarkup):
     @staticmethod
     def available():
         try:
-            importlib.import_module('asciidoc3')
+            importlib.import_module('asciidoc')
             importlib.import_module('lxml')
         except ImportError:
             return False
         return True

     def convert(self, text):
-        import asciidoc3
-        from asciidoc3 import asciidoc3api
+        import asciidoc
         from lxml import etree
         outfile = StringIO()
         infile = StringIO(text)
-        converter = asciidoc3api.AsciiDoc3API(
-            list(asciidoc3.__path__)[0] + '/asciidoc3.py')
-        converter.attributes['newline'] = r'\n'
-        try:
-            converter.execute(infile, outfile)
-        except asciidoc3api.AsciiDoc3Error:
-            # Note: asciidoc3 reports/prints these errors for us.
-            # We catch the exception to avoid multiple reports.
-            pass
+        opts = [
+            ('--out-file', outfile),
+            ('--backend', 'xhtml11'),
+            # uncomment next line to suppress footer.
+            # ('-a', 'footer-style=none'),
+            ('--safe', None),
+        ]
+        cmd = 'asciidoc'
+        asciidoc.execute(cmd, opts, [infile])
         result = outfile.getvalue()
         result = result.encode()
         parser = etree.HTMLParser()
diff --git a/setup.cfg b/setup.cfg
index 0229089..c855252 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -37,7 +37,7 @@ markdown = Markdown>=2.6; PyYAML
 restructuredtext = docutils
 textile = textile
 highlighting = Pygments
-asciidoc = asciidoc3; lxml
+asciidoc = asciidoc; lxml

 [options.entry_points]
 pymarkups =
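
The `available()` check in the patch above follows a common optional-dependency pattern: try to import each required module and report failure if any import raises. A generic sketch of that pattern is shown here; `modules_available` is a hypothetical helper, not part of pymarkups.

```python
import importlib


def modules_available(*names):
    """Return True only if every named module can be imported."""
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # A single missing dependency makes the feature unavailable.
            return False
    return True


# Standard-library modules should always be importable.
stdlib_ok = modules_available('json', 'io')
# A made-up module name should not be.
missing_ok = modules_available('no_such_module_for_this_sketch')
```

In the patch, the equivalent call would check 'asciidoc' and 'lxml'.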

mitya57 commented 2 years ago

Sorry, I accidentally pushed the tip of this repo's master to your branch and GitHub auto-closed this PR.

Anyway, these changes are merged as 363181ba2f81127b37d5582e2e33b44cc73d9f4e. Thank you for your patience!