Closed tomschr closed 9 years ago
Short note to myself:
I. Creational Design Patterns: concerned with how objects are created.
II. Structural Design Patterns: concerned with how objects are composed to form new, larger objects.
III. Behavioral Design Patterns: concerned with how things get done, i.e. algorithms and object interactions.
It turns out that lxml's .sourceline returns the line on which the start tag ends, not the line on which it begins:
<?xml version="1.0"?>
<!DOCTYPE article
[ ...
]>
<article version="5.0" xml:lang="en"
xmlns:dm="urn:x-suse:ns:docmanager"
xmlns="http://docbook.org/ns/docbook"
xmlns:xlink="http://www.w3.org/1999/xlink">
In the example above, we need line 5, where the article start tag begins, but .sourceline returns line 8, where the tag ends. See the thread at https://mailman-mail5.webfaction.com/pipermail/lxml/2015-May/007518.html
An alternative implementation of the root_sourceline() function can be found at https://gist.github.com/tomschr/6ecaaf69231dfbc9517a
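To illustrate the idea (this is not the code from the gist), here is a minimal stdlib-only sketch that works on the raw file text instead of the parsed tree. The name root_start_line and the prolog regex are assumptions made for this example. The regex is a simplification: it skips the XML declaration, processing instructions, comments and a DOCTYPE with an optional internal subset, but it does not cover every corner case of the XML grammar.

```python
import re

# One "prolog item": whitespace, a PI or XML declaration, a comment,
# or a DOCTYPE with an optional internal subset in [...].
# Simplified on purpose; not a full implementation of the XML prolog grammar.
_PROLOG_ITEM = re.compile(
    r"""\s+
      | <\?.*?\?>
      | <!--.*?-->
      | <!DOCTYPE[^\[>]*(?:\[.*?\]\s*)?>
    """,
    re.DOTALL | re.VERBOSE,
)


def root_start_line(text):
    """Return the 1-based line on which the root start tag begins."""
    pos = 0
    while True:
        match = _PROLOG_ITEM.match(text, pos)
        if match is None:
            break
        pos = match.end()  # skip this prolog item and look at the next one
    if not text.startswith("<", pos):
        raise ValueError("no root element found")
    # Lines are 1-based: count the newlines before the root start tag.
    return text.count("\n", 0, pos) + 1
```

For the example above this returns 5, because the article start tag begins on the fifth line of the file.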
This can be considered fixed; see commit a16a512894b8c1806c6b0cf49e19bd39fc0c5f59
@tomschr Can we delete branch issue23?
The new findprolog function is committed in f30efb00017ca and 2ef48eeaa and should fix this problem.
When a DocBook 5 file with an internal subset is parsed, all references to external entities are resolved. As a result, the saved file differs from the original in far more places than intended.
The preferred solution would be to leave the DOCTYPE header unmodified, so that the only changes are the info and docmanager elements.
Possible solutions:
1. Read the complete XML file as text.
2. Remove the DOCTYPE header and save it somewhere else.
3. Pass the result from step 2 into the XML parser.
4. Do whatever we need to do.
5. Concatenate the header from step 2 and the result from step 4 and save it.

This could be done with a proxy pattern.
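The steps above can be sketched roughly as follows. This is only an outline under some assumptions: edit_preserving_doctype and the edit_body callback are hypothetical names, the DOCTYPE regex is simplified, and the parsing and modification (steps 3 and 4) are left to the callback. Note that once the DOCTYPE is removed, entities declared in the internal subset are unknown to the parser, so entity references in the body would need separate handling.

```python
import re

# Simplified DOCTYPE matcher with an optional internal subset in [...].
# Assumption: no "]>" appears inside quoted values in the subset.
_DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\]\s*)?>", re.DOTALL)


def edit_preserving_doctype(text, edit_body):
    """Apply edit_body() to the text after the DOCTYPE; keep the header as is.

    text      -- the complete XML file as a string (step 1)
    edit_body -- hypothetical callback taking and returning a string;
                 it stands in for parsing and modifying (steps 3 and 4)
    """
    match = _DOCTYPE.search(text)
    if match is None:
        # No DOCTYPE found, so there is no header to protect.
        return edit_body(text)
    # Step 2: split off everything up to and including the DOCTYPE.
    header, body = text[:match.end()], text[match.end():]
    # Step 5: put the untouched header back in front of the edited body.
    return header + edit_body(body)
```

With this split, whatever the callback does to the body, the XML declaration and the DOCTYPE with its internal subset are written back byte for byte.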