Closed hassanakbar4 closed 3 years ago
@{"email"=>"ietf@augustcellars.com", "name"=>nil, "username"=>nil} commented
I have made a fix local to my system for this in parser.py about line 440
OLD self.text = six.binary_type(open(self.source, "rU").read(), 'utf8')
NEW self.text = six.binary_type(open(self.source, "rU", encoding='utf8').read(), 'utf8')
This will make the file be read as utf-8 even if there is not a BOM character present in the file. It may be that this needs to be pushed back into the Python2 code as well, but it seems to work just fine based on testing several files.
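A minimal sketch of why the explicit `encoding` argument matters (not taken from xml2rfc itself): in Python 3, text-mode `open()` without an `encoding` argument falls back to the locale's preferred encoding, so the same BOM-less UTF-8 file can read cleanly on one machine and fail on another. The file name and sample content are hypothetical, and `encoding="ascii"` is used here only as a stand-in for a non-UTF-8 locale default.

```python
import os
import tempfile

# Hypothetical sample: UTF-8 XML without a BOM, containing curly quotes.
sample = '<?xml version="1.0" encoding="utf-8"?>\n<t>\u201cquoted\u201d</t>\n'

path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "wb") as f:
    f.write(sample.encode("utf-8"))

# Stand-in for a locale whose default encoding is not UTF-8.
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    ascii_read_failed = False
except UnicodeDecodeError:
    ascii_read_failed = True

# Naming the encoding makes the read independent of the locale.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(ascii_read_failed)
print(text == sample)
```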
@{"email"=>"julian.reschke@gmx.de", "name"=>nil, "username"=>nil} commented
What's relevant should be the XML declaration, and UTF-8 with or without BOM should work even without. See XML spec... A proper XML parser ought to deal with all these cases...
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Yes, I suspect this is an incorrect bug report. There's a test to read a unicode xml file in the xml2rfc test suite, and the test suite is run for python 2.7, 3.3, 3.4, and 3.5 using tox before releases (see xml2rfc/trunk/cli/tox.ini).
I think the proposed fix will break xml2rfc under python3 if it is given files with an xml declaration that specifies for instance encoding="latin-1"; that is, any encoding different from ascii and utf-8.
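To illustrate the concern, here is a small sketch that uses the standard library's `xml.etree.ElementTree` rather than the parser xml2rfc actually uses (an assumption made to keep the example self-contained). The sample document is hypothetical; it declares ISO-8859-1 and contains a byte sequence that is not valid UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical document declared and encoded as ISO-8859-1 (Latin-1).
doc = '<?xml version="1.0" encoding="iso-8859-1"?>\n<name>Andr\u00e9</name>'
raw = doc.encode("iso-8859-1")

# Forcing UTF-8 on these bytes fails: 0xE9 followed by '<' is not valid UTF-8.
try:
    raw.decode("utf-8")
    forced_utf8_ok = True
except UnicodeDecodeError:
    forced_utf8_ok = False

# Handing the raw bytes to the parser lets it honor the XML declaration.
root = ET.fromstring(raw)

print(forced_utf8_ok)
print(root.text)
```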
@{"email"=>"ietf@augustcellars.com", "name"=>nil, "username"=>nil} uploaded file draft-ietf-quic-http.md (62.6 KiB)
@{"email"=>"ietf@augustcellars.com", "name"=>nil, "username"=>nil} uploaded file draft-ietf-quic-http.xml (109.8 KiB)
@{"email"=>"ietf@augustcellars.com", "name"=>nil, "username"=>nil} commented
I have added the .md file which sourced the failing .xml file. It has double quotes that are angled rather than straight. The fix I gave makes this file run.
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Ok, but when running with these combinations:
the provided xml file, with unicode double-quotes, works as-is without any fix.
With which xml2rfc / python / OS versions do you see this failing?
@{"email"=>"ietf@augustcellars.com", "name"=>nil, "username"=>nil} commented
I am running python 3.6.3 on windows. I do not know what version is running on the Circle CI work that Martin is doing. https://circleci.com/gh/quicwg/base-drafts/3856?utm_campaign=build-failed&utm_medium=email&utm_source=notification
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Ok. xml2rfc version and windows version, please?
It would also be good to have the exact failure output.
@{"email"=>"martin.thomson@gmail.com", "name"=>nil, "username"=>nil} commented
The docker image I used is here: https://hub.docker.com/r/martinthomson/i-d-template/builds/bfea63z4fkjwv6bkacjbfyk/
As you can see, this is using python 3.5.2 on ubuntu 16.04 with xml2rfc 2.8.2.
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Replying to hassanakbar4/tractive-test#338 (comment:8):
> The docker image I used is here: https://hub.docker.com/r/martinthomson/i-d-template/builds/bfea63z4fkjwv6bkacjbfyk/
> As you can see, this is using python 3.5.2 on ubuntu 16.04 with xml2rfc 2.8.2.
Ahh. Splendid. Together with additional info from Jim, this makes me believe the problem could be related to environmental settings, rather than OS or python version. Could you try this patch, please:
Index: xml2rfc/parser.py
===================================================================
--- xml2rfc/parser.py (revision 2395)
+++ xml2rfc/parser.py (working copy)
@@ -437,7 +437,7 @@
         if six.PY2:
             self.text = open(self.source, "rU").read()
         else:
-            self.text = six.binary_type(open(self.source, "rU").read(), 'utf8')
+            self.text = open(self.source, "rUb").read()
         # Get an iterating parser object
         file = six.BytesIO(self.text)
@{"email"=>"martin.thomson@gmail.com", "name"=>nil, "username"=>nil} commented
That change worked for me.
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} changed resolution from `` to `fixed`
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Excellent.
Fixed in [2396]:
Changed the python 3 code that reads in an xml file to read as binary, in order to not run into issues with unicode conversion before we have had time to look at the encoding attribute of the
I've released 2.8.3 with this fix.
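The approach described in this fix can be sketched roughly as follows: read the file as bytes and let the XML parser determine the encoding from the document itself. This is an illustration using the standard library's `xml.etree.ElementTree`, not the xml2rfc code; the helper names `parse_xml_file` and `write_sample` and the sample files are hypothetical.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_xml_file(path):
    """Read the file as bytes and let the XML parser pick the encoding."""
    with open(path, "rb") as f:
        data = f.read()  # bytes; no locale-dependent decoding happens here
    return ET.parse(io.BytesIO(data)).getroot()


def write_sample(name, text, encoding):
    """Write a hypothetical sample file in the given encoding, without a BOM."""
    path = os.path.join(tempfile.mkdtemp(), name)
    with open(path, "wb") as f:
        f.write(text.encode(encoding))
    return path


# UTF-8 without a BOM and without an encoding attribute (UTF-8 is the default).
utf8_path = write_sample(
    "utf8.xml", '<?xml version="1.0"?>\n<t>\u201cquoted\u201d</t>', "utf-8")

# ISO-8859-1, announced in the XML declaration.
latin1_path = write_sample(
    "latin1.xml",
    '<?xml version="1.0" encoding="iso-8859-1"?>\n<t>caf\u00e9</t>',
    "iso-8859-1")

utf8_text = parse_xml_file(utf8_path).text
latin1_text = parse_xml_file(latin1_path).text
print(utf8_text == "\u201cquoted\u201d", latin1_text == "caf\u00e9")
```

Because no text decoding happens before parsing, the result does not depend on the locale of the machine running the code, and documents that declare another encoding are still handled by the parser.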
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} changed status from new to closed
component_Version 2 cli
resolution_fixed
type_defect
by ietf@augustcellars.com
In Python 3, if there is no BOM at the start of the file but the file is UTF-8 rather than plain ASCII, an error occurs when the first non-ASCII character is reached.
Issue migrated from trac:338 at 2021-10-20 18:25:17 +0500