friedelwolff closed this 5 years ago.
Could you please add a test case for your changes?
I extended dump_file.py to be able to dump the file in raw order. I also added a test document that will clearly show the difference between the two approaches.
I realise now that I changed the signature of the Document class. Should we rather update it so that existing call sites don't need to change?
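One common way to avoid breaking call sites is to make the new behaviour an optional keyword argument whose default preserves the old behaviour. The sketch below assumes this approach; the `Document` class here is a hypothetical stand-in, and the `raw_order` parameter name is illustrative, not the actual signature from this PR.

```python
class Document:
    """Hypothetical stand-in for the Document class discussed above."""

    def __init__(self, path, raw_order=False):
        # The new behaviour is opt-in (hypothetical `raw_order` flag); the default
        # keeps the old behaviour, so existing calls like Document(path) still work.
        self.path = path
        self.raw_order = raw_order


old_style = Document("example.pdf")                  # existing call site, unchanged
new_style = Document("example.pdf", raw_order=True)  # new call site opts in
```

Because the flag defaults to the old behaviour, only code that wants the raw order has to change.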
Hi,
So if I understand correctly, what you need is really a 'raw stream of characters' as they appear on the page. Then I think a more appropriate name for the class would be in order.
I.
On 20/01/17 14:00, friedelwolff wrote:
y the current "word" information from the stream, so it isn't really a WordList (as the name suggests). Ma
This provides a WordList in the raw order (content stream order) of the document.
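To illustrate the difference between the two approaches, here is a minimal sketch assuming each word carries a bounding-box position. The `Word` record and the `reading_order` / `raw_order` helpers are hypothetical and not part of the project's API. Reading order is approximated by sorting top-to-bottom, then left-to-right, while raw order simply keeps the sequence in which the content stream emitted the words.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: text plus the position of its bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge, growing downwards


def reading_order(words: List[Word]) -> List[Word]:
    # Layout-based order: top-to-bottom, then left-to-right.
    return sorted(words, key=lambda w: (w.y, w.x))


def raw_order(words: List[Word]) -> List[Word]:
    # Content-stream order: keep the words exactly as the stream emitted them.
    return list(words)


# A content stream is free to draw the second line before the first one.
stream = [Word("world", 10, 20), Word("Hello", 10, 10), Word("again", 50, 10)]

print([w.text for w in raw_order(stream)])      # ['world', 'Hello', 'again']
print([w.text for w in reading_order(stream)])  # ['Hello', 'again', 'world']
```

Real layout analysis is more involved (columns, line grouping, tolerances on `y`), but the simple sort is enough to show why the two orders can diverge.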