sliekens / Txt

A text parsing framework for .NET.
MIT License

Binary data #8

Open sliekens opened 9 years ago

sliekens commented 9 years ago

At this time, the ITextScanner interface does not provide APIs for binary data.

The workarounds are:

  1. Use OctetLexer to read any number of bytes as instances of Element
  2. Read directly from the underlying stream

The first workaround is nasty for a number of reasons:

The second workaround isn't much better:

Suggested fix: copy Read and ReadByte, plus the async variant ReadAsync, from System.IO.Stream to the ITextScanner interface. Implement these methods so that every read also updates the scanner's internal state.
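To make the suggested fix concrete, here is a minimal sketch of stream-style read methods that keep a scanner's position in sync. Everything here is an assumption for illustration: IBinaryScannerSketch is a pared-down stand-in, not the real ITextScanner, and the Offset property is a hypothetical placeholder for whatever internal state the scanner tracks.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for ITextScanner; the real interface has other members.
public interface IBinaryScannerSketch
{
    long Offset { get; }
    int Read(byte[] buffer, int offset, int count);
    int ReadByte();
}

public sealed class BinaryScannerSketch : IBinaryScannerSketch
{
    private readonly Stream input;

    public BinaryScannerSketch(Stream input)
    {
        this.input = input ?? throw new ArgumentNullException(nameof(input));
    }

    // Number of bytes consumed so far (assumed to be the state worth tracking).
    public long Offset { get; private set; }

    // Same contract as Stream.Read: returns the number of bytes read, 0 at end of stream.
    public int Read(byte[] buffer, int offset, int count)
    {
        int read = input.Read(buffer, offset, count);
        Offset += read;
        return read;
    }

    // Same contract as Stream.ReadByte: returns -1 at end of stream.
    public int ReadByte()
    {
        int value = input.ReadByte();
        if (value != -1)
        {
            Offset++;
        }
        return value;
    }
}
```

Using a long for the offset here is a deliberate choice in the sketch so that large binary payloads do not overflow the counter.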

sliekens commented 9 years ago

The suggested fix has its own issues...

Suggested alternative fix: add a property ITextScanner.BaseStream. The implementation of that property should return a wrapper around the underlying stream. The wrapper should implement the following behavior:

sliekens commented 9 years ago

More ideas...

Instead of the BaseStream property, add a method ITextScanner.ReadRaw(Action<Stream>).

The implementation of this method is responsible for managing the lifetime of the objects involved:

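void ReadRaw(Action<Stream> callback)
{
    // The wrapper exists only for the duration of the callback.
    using (Stream wrapper = new WrapStream(inputstream, this))
    {
        callback(wrapper);
    }
    // The wrapper is disposed here, so the caller cannot keep using it.
}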

This way, the scanner object has full control over all objects: the wrapper stream becomes unusable after the callback method returns.

TODO: figure out which pattern WrapStream should use to notify the scanner object when it is read from.
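One possible answer to the TODO, sketched under assumptions: the wrapper takes a plain callback and invokes it with the byte count after each read. NotifyingReadStream and its onRead parameter are hypothetical names, not part of Txt, and a delegate is only one of several patterns that could work here (an event or a direct reference to the scanner would be alternatives).

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper: reports each read and never disposes the inner stream.
public sealed class NotifyingReadStream : Stream
{
    private readonly Stream inner;
    private readonly Action<int> onRead;
    private bool disposed;

    public NotifyingReadStream(Stream inner, Action<int> onRead)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
        this.onRead = onRead ?? throw new ArgumentNullException(nameof(onRead));
    }

    public override bool CanRead => !disposed && inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (disposed)
        {
            throw new ObjectDisposedException(nameof(NotifyingReadStream));
        }
        int read = inner.Read(buffer, offset, count);
        if (read > 0)
        {
            onRead(read); // tell the owner how many bytes were consumed
        }
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        disposed = true; // the wrapper becomes unusable; the inner stream stays open
        base.Dispose(disposing);
    }
}
```

Because the wrapper refuses reads once disposed, a reference that leaks out of the callback cannot be used to read behind the scanner's back.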

sliekens commented 8 years ago

Since ReadRaw(callback) lets the caller read data out of context, which may be binary data or character data in a different encoding, is there any meaningful way to maintain the scanner's internal state?

I think that a ReadRaw action should always force the scanner back to its pre-initialized state.

Discarding internal state is the only way to prevent integer overflow when the number of raw bytes read exceeds int.MaxValue (roughly 2 GB).
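The overflow concern can be illustrated with a small sketch. It assumes the scanner tracks its position in a 32-bit int, which the comment above implies but does not state outright; OffsetOverflowSketch and Advance are hypothetical names used only for this illustration.

```csharp
using System;

public static class OffsetOverflowSketch
{
    // Advances a 32-bit offset by a raw byte count.
    // The checked conversion throws OverflowException instead of silently wrapping around.
    public static int Advance(int offset, long bytesRead)
    {
        return checked((int)(offset + bytesRead));
    }
}
```

Calling Advance(0, 3_000_000_000L) throws, because 3,000,000,000 is larger than int.MaxValue (2,147,483,647). Without the checked context the conversion would wrap and leave the scanner with a meaningless offset.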

sliekens commented 8 years ago

I added two members to ITextScanner:

  1. ITextScanner.Reset()
  2. ITextScanner.BaseStream

The Reset() method sets the internal state to the pre-initialized state and releases any internal buffers that it may hold. This method should always be safe to call in between reads.

The BaseStream property returns a direct reference to the underlying stream. Callers should take care not to dispose this stream, and to call Reset() before attempting to read from this stream.
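A rough sketch of how a caller might follow those two rules. IRawAccessSketch is a hypothetical stand-in that only mirrors the two members described above, and InMemoryScannerStub exists purely so the helper can be tried out; neither is part of Txt.

```csharp
using System;
using System.IO;

// Hypothetical stand-in exposing only the two members described above.
public interface IRawAccessSketch
{
    Stream BaseStream { get; }
    void Reset();
}

public static class RawReadSketch
{
    // Reads exactly count raw bytes, calling Reset() first and leaving the stream open.
    public static byte[] ReadRawBytes(IRawAccessSketch scanner, int count)
    {
        scanner.Reset();                    // discard scanner state before touching the stream
        Stream stream = scanner.BaseStream; // borrowed reference: do not dispose
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            total += read;
        }
        return buffer;
    }
}

// Stub used only to exercise the helper; it counts Reset() calls.
public sealed class InMemoryScannerStub : IRawAccessSketch
{
    public InMemoryScannerStub(byte[] data)
    {
        BaseStream = new MemoryStream(data);
    }

    public Stream BaseStream { get; }
    public int ResetCount { get; private set; }

    public void Reset()
    {
        ResetCount++;
    }
}
```

The loop is there because Stream.Read may return fewer bytes than requested, so a single call is not enough to guarantee a full buffer.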