Confirm correctness of character offsets

Looking at the output example in docs/design/1-toplevel.html, which was created from the following input:

<?xml version="1.0" ?>
<text>She sleeps on Friday.</text>

it appears that the start and end of the doc_element are wrong, since we have <text id="1" begin="1" end="22" /> and <doc_element type="TarsqiDocParagraph" begin="0" end="23">. Given that the text tag spans all text, you would expect the doc_element to lie inside the text tag, not to extend one character beyond it on each side.

But note that even though the text tag in the input seems to span all text, it still starts at position 1 in the file once the tags are removed, because of the newline character after the XML declaration, and it ends one character before the end, because of the newline after the closing text tag. So the offsets are in fact correct.
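To double-check the arithmetic, here is a minimal sketch that strips the tags from the input and records where each element begins and ends in the tag-free text. It is not how ttk computes offsets; strip_tags_with_offsets is a hypothetical helper, the regular expression is a simplification that would not handle comments or CDATA, and the input is assumed to contain the 21-character sentence "She sleeps on Friday." followed by a trailing newline.

```python
import re

# Simplified tag pattern: anything between "<" and the next ">".
TAG = re.compile(r"<[^>]*>")

def strip_tags_with_offsets(source):
    """Return (tag_free_text, spans) where spans is a list of
    (tag_name, begin, end) offsets into the tag-free text."""
    parts = []       # text chunks found between tags
    open_tags = []   # stack of (name, begin) for elements not yet closed
    spans = []
    offset = 0       # current position in the tag-free text
    last = 0         # position in source just after the previous tag
    for match in TAG.finditer(source):
        chunk = source[last:match.start()]
        parts.append(chunk)
        offset += len(chunk)
        last = match.end()
        tag = match.group()
        if tag.startswith("<?") or tag.startswith("<!"):
            continue  # XML declaration or doctype: contributes no span
        if tag.startswith("</"):
            name, begin = open_tags.pop()
            spans.append((name, begin, offset))
        elif not tag.endswith("/>"):
            name = tag[1:-1].split()[0]
            open_tags.append((name, offset))
    parts.append(source[last:])
    return "".join(parts), spans

# Assumed input file, including the newline after the declaration
# and the newline after the closing text tag.
source = '<?xml version="1.0" ?>\n<text>She sleeps on Friday.</text>\n'
text, spans = strip_tags_with_offsets(source)
print(repr(text), len(text), spans)
```

Under those assumptions the tag-free text is 23 characters long and the text element covers offsets 1 to 22, which agrees with the begin and end values in the output.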

Here is a fragment from the output:

<text>
She sleeps on Friday.
</text>

The problem is that the text tags in the input and output are different things. The text tag in the output wraps the whole tag-free source text, newlines included, while the text tag in the input corresponds to the <text id="1" begin="1" end="22" /> element in the output.

This is a bit confusing. I wonder if I should add something to this effect in the documentation, or perhaps use a less common name than text for the tag that spans all source text, perhaps something like <primary_data>.
