Closed by nickbabcock 10 years ago.
You could save the position of the stream, reset it to 0, parse the file, then restore the position. You would not be able to dispose or close your StreamReader, though.
Good point. I did a quick run of this earlier and it didn't work, but you inspired me to take another look and I think I may have found the trick.
```csharp
using (var fs = new FileStream( /*...*/ ))
using (var sr = new StreamReader(fs, /* ... */))
{
    string line = sr.ReadLine();
    // Because paradox files are encoded with windows code page (1252)
    // the number of bytes read != number of characters read.
    // Can't use line.Length as that is the number of characters and not bytes.
    int count = Encoding.GetEncoding(1252).GetByteCount(line);
    // Note: this lands on the line terminator, not just past it.
    fs.Seek(count, SeekOrigin.Begin);
    ParadoxParser.Parse(fs, /* ... */);
}
```
Problems:

`TextReader` … with the windows code page? They could just use the length of the line they read as the number of bytes, but they run the risk of misreading text. Definitely something to think about.
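To make the bytes-versus-characters question concrete, here is a small sketch (the class `EncodingByteCounts` is hypothetical). It uses ISO-8859-1 as a stand-in for windows-1252: both are single-byte encodings, and on .NET Core code page 1252 is only available after registering `CodePagesEncodingProvider`. For text a single-byte encoding can represent, `line.Length` equals the byte count; the offset goes wrong when the reader decodes with a different encoding than the file was written in, for example UTF-8, which `StreamReader` uses when no encoding is given.

```csharp
using System;
using System.Text;

public static class EncodingByteCounts
{
    // ISO-8859-1 ships with every .NET runtime; it stands in for windows-1252,
    // which needs CodePagesEncodingProvider registered on .NET Core.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Characters in the line vs bytes it occupies in each encoding.
    public static (int chars, int latin1Bytes, int utf8Bytes) Measure(string line) =>
        (line.Length, Latin1.GetByteCount(line), Encoding.UTF8.GetByteCount(line));

    // Simulates a mismatch: the file holds single-byte text, but the reader
    // decodes it as UTF-8, and the client then computes an offset from the result.
    public static int Utf8RecountOfLatin1Bytes(string line)
    {
        byte[] onDisk = Latin1.GetBytes(line);            // bytes actually in the file
        string decoded = Encoding.UTF8.GetString(onDisk); // what a UTF-8 reader returns
        return Encoding.UTF8.GetByteCount(decoded);       // offset the client would seek to
    }
}
```

`Measure("Fran\u00e7ais")` gives 8 characters, 8 Latin-1 bytes and 9 UTF-8 bytes, while `Utf8RecountOfLatin1Bytes` reports more than the 8 bytes actually on disk, because the lone `0xE7` byte is invalid UTF-8 and decodes to U+FFFD.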
EDIT: just realized you were talking about something slightly different and making two passes at the file.
You can get the position directly from the stream object, like so:
```csharp
using (var fs = new FileStream( /*...*/ ))
using (var sr = new StreamReader(fs, /*...*/))
{
    string line = sr.ReadLine();
    // Store current position.
    long pos = fs.Position;
    // Move to beginning of stream.
    fs.Position = 0;
    ParadoxParser.Parse(fs, /*...*/);
    // Reset position.
    fs.Position = pos;
}
```
You need to make sure the stream supports this by checking its `CanSeek` property.
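The save/parse/restore pattern can be wrapped so the `CanSeek` check and the restore live in one place. This is a sketch; `StreamRewind.FromStart` is a hypothetical helper, not part of .NET or Pdoxcl2Sharp. The `try`/`finally` puts the position back even if the callback throws. The callback must not dispose the stream (a `StreamReader` created without `leaveOpen: true` would close it).

```csharp
using System;
using System.IO;

public static class StreamRewind
{
    // Hypothetical helper: runs `action` with the stream positioned at offset 0,
    // then restores the caller's original position, even if `action` throws.
    public static void FromStart(Stream stream, Action<Stream> action)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (!stream.CanSeek)
            throw new NotSupportedException("Stream must support seeking (CanSeek is false).");

        long saved = stream.Position;
        stream.Position = 0;
        try
        {
            action(stream);
        }
        finally
        {
            // Put the stream back where the caller (or its StreamReader) left it.
            stream.Position = saved;
        }
    }
}
```

With the code above, the call would look like `StreamRewind.FromStart(fs, s => ParadoxParser.Parse(s, /*...*/));`.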
I feel like you might be really close to something. Let me clarify with an example:
EU4 savegames now have a first line of `EU4txt`, followed by the traditional savegame structure. The problem is that `EU4txt` doesn't correspond to any structure defined in the parser. It is, essentially, a special first line. Thus I want to read the first line (maybe do some checking on it) and then start the parser on the next line. However, `fs.Position` will return the end of the block that the `StreamReader` buffered, not the end of the first line.
For instance, in the previous example the following code:

```csharp
Console.WriteLine(fs.Position);
Console.WriteLine(sr.ReadLine());
Console.WriteLine(fs.Position);
```

will print:

```
0
EU4txt
4096
```

Obviously, we don't want to start the parser on byte 4096 (the default buffer size for a `StreamReader`), but rather on byte 7 or 8, depending on whether the line ends in `\n` or `\r\n`.
Any ideas?
Well, this code is rather hacky, but it does work:
```csharp
using (var fs = new FileStream( /*...*/ ))
using (var sr = new StreamReader(fs, /*...*/))
{
    string line = sr.ReadLine();
    // Store current position. Note: CurrentEncoding only works after reading.
    long pos = sr.CurrentEncoding.GetByteCount(line);
    // Move to end of read line.
    fs.Position = pos;
    // Read bytes until it's not a new line character.
    int nextChar;
    do
    {
        nextChar = fs.ReadByte();
    } while (nextChar == '\r' || nextChar == '\n');
    // Move back 1 character.
    fs.Position--;
    ParadoxParser.Parse(fs, /*...*/);
}
```
If the parser can handle starting with an empty line, then you don't need to do the do-while loop or subtract from fs.Position.
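Another way around the buffering problem is to not hand the header to a `StreamReader` at all: read the first line from the raw stream byte by byte, so the stream is left exactly after the line terminator and can be passed straight to the parser. A minimal sketch, assuming the header uses an ASCII-compatible encoding (such as windows-1252 or UTF-8) in which byte `0x0A` only ever means a newline; it would not work for UTF-16. `HeaderReader.ReadFirstLine` and its `maxBytes` guard are hypothetical. Because `FileStream` buffers internally, reading one byte at a time is cheaper than it sounds, and the approach also works on streams that cannot seek.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderReader
{
    // Hypothetical helper: reads the first line one byte at a time so nothing
    // past the "\n" terminator is consumed. Handles "\n" and "\r\n" endings.
    public static string ReadFirstLine(Stream stream, Encoding encoding, int maxBytes = 256)
    {
        var lineBytes = new MemoryStream();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == '\n')
                break; // stream now sits on the first byte of line two
            if (lineBytes.Length >= maxBytes)
                throw new InvalidDataException("First line is longer than expected.");
            lineBytes.WriteByte((byte)b);
        }

        byte[] bytes = lineBytes.ToArray();
        int length = bytes.Length;
        // Drop the '\r' of a "\r\n" terminator.
        if (length > 0 && bytes[length - 1] == '\r')
            length--;
        return encoding.GetString(bytes, 0, length);
    }
}
```

For the `EU4txt` case this leaves `fs` at byte 7 or 8, so `ParadoxParser.Parse(fs, /*...*/)` would start on the second line without any seeking.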
In general the parser is robust and will handle empty lines. In fact, the parser already detects that the first line is `EU4txt`, but the applications of what we are discussing are more far-reaching.
The code example you showed boils down to what I showed in https://github.com/nickbabcock/Pdoxcl2Sharp/issues/17#issuecomment-32061151, the only difference being `fs.Position = pos` vs `fs.Seek(pos, SeekOrigin.Begin)`.
But it looks like the solution so far is to push this issue out to whoever is using the parser, and not to support reading from a `TextReader` in the parser. Am I reading your thoughts correctly?
Given the encoding issue, you are correct: I would say that supporting only streams would be better.
Currently the parser will only 'work' if given a brand-new stream, around which it constructs its own `StreamReader`. Here is an example of what can go wrong in a client program: since `StreamReader` is buffered, it consumes much more of the underlying stream than the line it returns. Therefore, if `fs` is passed to `Parse`, the parser will start reading well past the end of the first line. This is not desired. The client should have the option of passing in a buffered text reader. The one drawback is that we would have no control over the text reader's encoding, which may be a problem.