openSUSE / docmanager

Manage DocBook 5 Meta Information
http://opensuse.github.io/docmanager/index.html
GNU General Public License v3.0

Fix entity problem #23

Closed tomschr closed 9 years ago

tomschr commented 9 years ago

When a DocBook 5 file with an internal subset is parsed, all references to external entities are resolved. Saving the file then produces unnecessarily large diffs.

The preferred solution would be to leave the DOCTYPE header unmodified, so that the only changes are in the info and docmanager elements.

Possible solution:

  1. Read the complete XML file as text.
  2. Remove the DOCTYPE header and save it somewhere else.
  3. Pass the remaining text from step 2 to the XML parser.
  4. Do whatever we need to do.
  5. Concatenate the header from step 2 and the result from step 4, then save it.

This could be done by a proxy pattern.
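
The five steps above can be sketched roughly as follows. This is only an illustration using the standard library, not the code that ended up in docmanager: `split_prolog` and `edit_preserving_prolog` are hypothetical names, the regular expression is deliberately simplified, and the sketch assumes the body contains no references to entities declared in the internal subset (the parser would reject those once the DOCTYPE is gone).

```python
import re
import xml.etree.ElementTree as ET

# Everything from the start of the file up to and including the DOCTYPE
# declaration, with an optional internal subset in [...].
# Simplified assumption: no "]" inside the internal subset itself.
PROLOG_RE = re.compile(r"\A.*?<!DOCTYPE\s[^\[>]*(?:\[.*?\]\s*)?>", re.DOTALL)


def split_prolog(text):
    """Step 2: return (header, body); header is '' if there is no DOCTYPE."""
    match = PROLOG_RE.match(text)
    if match is None:
        return "", text
    return text[:match.end()], text[match.end():]


def edit_preserving_prolog(text, edit):
    """Steps 2 to 5: parse only the body, apply edit(root), re-attach the header."""
    header, body = split_prolog(text)
    # Keep the whitespace between the header and the root element as it was.
    lead = body[:len(body) - len(body.lstrip())]
    root = ET.fromstring(body)      # step 3: the parser never sees the DOCTYPE
    edit(root)                      # step 4: do whatever we need to do
    return header + lead + ET.tostring(root, encoding="unicode")  # step 5


if __name__ == "__main__":
    with_doctype = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE article [\n'
        '<!ENTITY % ents SYSTEM "entities.ent">\n'
        '%ents;\n'
        ']>\n'
        '<article><title>Test</title></article>\n'
    )
    print(edit_preserving_prolog(with_doctype, lambda r: r.set("status", "edited")))
```

Because the header is handled purely as text, it is written back byte for byte; only the serialized element tree can differ from the input.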

tomschr commented 9 years ago

Short notice for myself:

I. Creational Design Patterns: concerned with how objects are created.

  1. Abstract factory pattern :-1:
  2. Builder pattern :-1:
  3. Factory method pattern
  4. Prototype pattern
  5. Singleton pattern

II. Structural Design Patterns: concerned with how objects are composed together to form new and bigger objects.

  1. Adapter pattern
  2. Bridge pattern
  3. Composite pattern
  4. Decorator pattern
  5. Facade pattern
  6. Flyweight pattern
  7. Proxy pattern

III. Behavioral Design Patterns: concerned with how things get done (algorithms and object interactions).

  1. Chain of responsibility pattern
  2. Command pattern
  3. Interpreter pattern
  4. Iterator pattern
  5. Mediator pattern
  6. Memento pattern
  7. Observer pattern
  8. State pattern
  9. Strategy pattern
  10. Template method pattern
  11. Visitor pattern

See also: http://www.netobjectivestest.com/PatternRepository//index.php?title=AdapterVersusProxyVersusFacadePatternComparison

tomschr commented 9 years ago

It turns out that lxml's .sourceline returns only the line on which the start tag ends:

<?xml version="1.0"?>
<!DOCTYPE article
[ ...
]>
<article version="5.0" xml:lang="en"
          xmlns:dm="urn:x-suse:ns:docmanager"
          xmlns="http://docbook.org/ns/docbook"
          xmlns:xlink="http://www.w3.org/1999/xlink">

In the above example, we need line 5 (where the start tag begins), but .sourceline returns line 8, which is wrong for our purpose. See the thread at https://mailman-mail5.webfaction.com/pipermail/lxml/2015-May/007518.html
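
One way to get the line on which the root start tag begins, sketched here with the standard library's expat bindings rather than lxml. This is an assumption-laden illustration, not the implementation from the gist or the later commits; `root_start_line` is a hypothetical name. It relies on expat reporting, inside a callback, the position of the first character of the markup that triggered the event.

```python
import xml.parsers.expat


def root_start_line(xml_text):
    """Return the 1-based line on which the root element's start tag begins."""
    lines = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # Only the first start tag (the root element) is of interest.
        if not lines:
            lines.append(parser.CurrentLineNumber)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return lines[0]


if __name__ == "__main__":
    # Same layout as the example above, with a placeholder internal subset.
    doc = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE article\n'
        '[ <!ENTITY product "DocManager">\n'
        ']>\n'
        '<article version="5.0" xml:lang="en"\n'
        '          xmlns:dm="urn:x-suse:ns:docmanager"\n'
        '          xmlns="http://docbook.org/ns/docbook"\n'
        '          xmlns:xlink="http://www.w3.org/1999/xlink">\n'
        '</article>\n'
    )
    print(root_start_line(doc))
```

For a document laid out like the example, this should report the line of `<article`, not the line of the closing `>` of the start tag.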

tomschr commented 9 years ago

An alternative implementation of the root_sourceline() function can be found at https://gist.github.com/tomschr/6ecaaf69231dfbc9517a

tomschr commented 9 years ago

This can be considered fixed; see commit a16a512894b8c1806c6b0cf49e19bd39fc0c5f59

mschnitzer commented 9 years ago

@tomschr Can we delete branch issue23?

tomschr commented 9 years ago

The new findprolog function was committed in f30efb00017ca and 2ef48eeaa and should fix this problem.