Closed dkuhlman closed 2 years ago
Dmitry,
Thanks for your comments.
I've made most of the changes that you suggested. See more detailed notes, below. I've pushed these changes to Github.
What I've done:
About using a different parser -- I've switched from bs4
to
lxml
, which can be installed with pip
(pip install lxml
).
I modified method available
so that it also checks for importing
lxml
.
I hope this is acceptable. I cannot find a parser in the Python
standard distribution that supports HTML, except the one you
suggested (html.parser
), which is a SAX style event based parser
and, so, is not usable for my needs (specifically, extracting the
body, title, stylesheet, etc.
Moved that import of asciidoc3
from module level to inside the
functions that use it.
Removed references to settings_overrides
and self.overrides
Copied the contents of tests/test_asciidoc_data.py
into
tests/test_ascidoc.py
and removed file
tests/test_asciidoc_data.py
.
Removed functions/tests test_whole_html
and test_toc_backrefs
from file tests/test_asciidoc.py
.
Checked code with flake8. It uses PEP-8, I believe.
Reverted markup2html.py
to the revision with your last change.
I did not intend to change that file. I apologize.
Dave
Dmitry,
I believe I have addressed the items in your recent suggestions.
I have not yet pushed the changes to Github.
First, a question:
In unit tests, re-worked the test for stylesheet in a attempt to
make it less brittle. I'm thinking we should be testing our code in markups
to ensure that it works. We should not be testing the code in
asciidoc3
for changes in that code. As long as we have captured
the title text, body text, and stylesheet text, our code should
pass the unit test. So, I've replaced the tests with these:
self.assertTrue(len(body) > 100)
self.assertEqual(title_expected, title)
self.assertTrue(len(stylesheet) > 100)
What do you think?
Dave
Dmitry,
I believe that I have addressed all of you second round of comments/suggestions.
I have pushed these changes to my fork of pymarkups
. So, they
should show up in the pull request.
Some details --- What I've done:
Removed use of self.converter_class
.
Simplified use of StringIO
.
Eliminated storing head in ConvertedAsciidoc constructor.
Captured multiple Javascript elements, although I believe there is only one. Better to be safe.
In unit tests, re-worked the test for stylesheet in a attempt to
make it less brittle. We should be testing the code in markups
to ensure that it works. We should not be testing the code in
asciidoc3
for changes in that code. Our unit tests should not
fail if changes/improvements are made to the stylesheet and HTML
output produced by asciidoc3
.
Do you agree?
As long as we have captured the title text, body text, and stylesheet text, our code should pass the unit test. So, I've replaced the tests with these:
self.assertTrue(len(body) > 100) self.assertEqual(title_expected, title) self.assertTrue(len(stylesheet) > 100)
What do you think?
Changed globals in tests/test_asciidoc.py
to all upper case. In
particular: Changed global Basic_text
to BASIC_TEXT
.
Removed commented out debug code.
Added install of dependencies to .github/workflows/main.yml
:
python -m pip install lxml asciidoc3
Dave
Dmitry,
Thanks for these suggestions. I've pushed my changes to my fork.
What I have done:
Changed unit test code to use `self.assertGreater(...) -- 2 instances.
Removed constructor method (__init__
) from class
AsciidocMarkup
.
Removed class ConvertedAsciidoc
. Returned instance of
ConvertedMarkup
instead.
removed import re
in test_asciidoc.py
-- Not used.
About the copyright notice -- It's fine with me to leave it as is. You have done so much more work on this project than I have and they are really you ideas. I suggest that you leave it as is. But, I'll leave it up to you. If you do add me, I'd be pleased. My name would be "Dave Kuhlman".
Dave
Merging #17 (1333bee) into master (c13ae66) will increase coverage by
0.29%
. The diff coverage is97.59%
.
@@ Coverage Diff @@
## master #17 +/- ##
==========================================
+ Coverage 95.06% 95.36% +0.29%
==========================================
Files 10 12 +2
Lines 608 690 +82
==========================================
+ Hits 578 658 +80
- Misses 30 32 +2
Impacted Files | Coverage Δ | |
---|---|---|
markups/asciidoc.py | 96.15% <96.15%> (ø) |
|
markups/__init__.py | 87.50% <100.00%> (+0.32%) |
:arrow_up: |
tests/test_asciidoc.py | 100.00% <100.00%> (ø) |
|
tests/test_public_api.py | 94.11% <100.00%> (+0.56%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c13ae66...1333bee. Read the comment docs.
I pushed some commits, hope you don't mind.
Now the code looks good, but I have some remaining questions:
Why did you choose asciidoc3 and not https://asciidoc-py.github.io/ aka https://asciidoc.org/? It is the first Google result for “AsciiDoc”, and its website suggests that AsciiDoc3 is unofficial implementation:
The AsciiDoc3 implementation you can find on this website asciidoc3.org is a personal project.
And what I like about that implementation is:
Changelog for version 10.0.0 says:
Importing asciidoc should no longer require the
asciidocapi.py
script, and can be done through regular python import, e.g.import asciidoc; asciidoc.execute(...)
.
This means we no longer need to import two different modules, add __path__
hack, etc.
It is possible to disable the "Last modified" footer with footer-style=none
attribute. Asciidoc3 does not have such an attribute.
Will you mind if I rename AsciidocMarkup
to AsciiDocMarkup
? That capitalization looks more correct.
What do you think about using Safe Mode? Having sys:
syntax which allows executing arbitrary commands seems like a security hole (imagine if someone sends a malicious file to a ReText user).
I will be happy to push the remaining changes myself once you answer these questions.
Dimtry,
Renaming AsciidocMarkup
sounds fine to me.
I don't understand the save-mode support, but it sounds like a good idea.
asciidoc-py
also sounds like a reasonable option.
I've made changes in my local repository so that we use it.
I was also able to suppress the footer as you suggested.
We can switch to asciidoc-py
later if you like.
I used asciidoc3
originally because:
asciidoc-py
has one too, of course)Both asciidoc-py
and asciidoc3
seem to be actively supported.
If you feel better with asciidoc-py
and since I can't give any
reasons for using one over the other, we can switch to it.
Tell me if/when you want my patches for asciidoc-py
.
Dave
Here is a diff against my fork containing my changes in support of asciidoc-py
.
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 5190acc..cf1427c 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -18,7 +18,7 @@ jobs:
run: |
python -m pip install Markdown python-markdown-math pymdown-extensions
python -m pip install docutils textile pygments codecov PyYAML
- python -m pip install lxml asciidoc3
+ python -m pip install lxml asciidoc
- name: Install the project
run: python setup.py install
- name: Run tests
diff --git a/markups/asciidoc.py b/markups/asciidoc.py
index bfd5539..eeaefc1 100644
--- a/markups/asciidoc.py
+++ b/markups/asciidoc.py
@@ -27,27 +27,26 @@ class AsciidocMarkup(AbstractMarkup):
@staticmethod
def available():
try:
- importlib.import_module('asciidoc3')
+ importlib.import_module('asciidoc')
importlib.import_module('lxml')
except ImportError:
return False
return True
def convert(self, text):
- import asciidoc3
- from asciidoc3 import asciidoc3api
+ import asciidoc
from lxml import etree
outfile = StringIO()
infile = StringIO(text)
- converter = asciidoc3api.AsciiDoc3API(
- list(asciidoc3.__path__)[0] + '/asciidoc3.py')
- converter.attributes['newline'] = r'\n'
- try:
- converter.execute(infile, outfile)
- except asciidoc3api.AsciiDoc3Error:
- # Note: asciidoc3 reports/prints these errors for us.
- # We catch the exception to avoid multiple reports.
- pass
+ opts = [
+ ('--out-file', outfile),
+ ('--backend', 'xhtml11'),
+ # uncomment next line to suppress footer.
+ # ('-a', 'footer-style=none'),
+ ('--safe', None),
+ ]
+ cmd = 'asciidoc'
+ asciidoc.execute(cmd, opts, [infile])
result = outfile.getvalue()
result = result.encode()
parser = etree.HTMLParser()
diff --git a/setup.cfg b/setup.cfg
index 0229089..c855252 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -37,7 +37,7 @@ markdown = Markdown>=2.6; PyYAML
restructuredtext = docutils
textile = textile
highlighting = Pygments
-asciidoc = asciidoc3; lxml
+asciidoc = asciidoc; lxml
[options.entry_points]
pymarkups =
Sorry, I accidentally pushed tip of this repo's master to your branch and GitHub auto-closed this PR.
Anyway, these changes are merged as 363181ba2f81127b37d5582e2e33b44cc73d9f4e. Thank you for your patience!
Here are changes that add support for converting Asciidoc files.
Could you please take a look. If you are interested and think it's good enough, please merge it.
I'm hoping that someday, Asciidoc support will be added to ReText.
If you are interested in adding this, here are additional comments and questions:
Reporting errors and warnings
Asciidoc exceptions (errors and warnings) are written to stderr. Does that work for other use cases? Is there something else that I should do with them.
Extra required modules
The added code in support of Asciidoc uses the packages: (1) asciidoc3, (2) bs4 (BeautifulSoup4), and (3) lxml.
Need to add bs4 to settings.conf? I'm not sure how to specify dependencies.
Coding conventions
I've tried to follow the conventions in the other
.py
files. Except that I've used 4 spaces for indentation rather than a tab character.PEP 8 says this:
So the question is whether we want consistency within each file or across all files.
Do you have an opinion on this? I can use the command line tools expand/unexpand to convert the new Asciidoc support code to use tabs for indentation, if that's your preference. My preference is to use the standard 4 spaces indent.
Unit tests
I added unit test for Asciidoc.
When I run them, I see several warnings: DeprecationWarning. I can hide them by running the following::