nickbabcock / Pdoxcl2Sharp

A Paradox Interactive general file parser
MIT License

A New Kind of Parser #30

Open nickbabcock opened 10 years ago

nickbabcock commented 10 years ago

I've been thinking: while I like the current parser, it assumes a lot. I think a step back is needed to allow for a more powerful and flexible parser in situations where the current one is ill-suited. A SAX-like parser would provide these advantages, and I think Json.NET's reader illustrates the approach well. So far there is no pressing need for this parser, and if implemented it would only complement the current one, not replace it. Here is an example to better illustrate the functionality.


string txt = @"EU4txt
player=VEN
date=""1492.1.1""
rivals = { ""YOU"" }#hehe, you're my rival
trade={
    money=10.312
}";

// Wrap the text in a stream (requires System.IO and System.Text)
var stream = new MemoryStream(Encoding.UTF8.GetBytes(txt));
var reader = new ParadoxSaxReader(stream);

while (reader.Read())
{
    if (reader.Value != null)
        Console.WriteLine(
            "Token: {0}, Value: {1}, Line: {2}, Column: {3}, Quoted: {4}",
             reader.Token, reader.Value, reader.Line, reader.Column,
             reader.Quoted);
    else
        Console.WriteLine(
            "Token: {0}, Line: {1}, Column: {2}",
             reader.Token, reader.Line, reader.Column);
}

// Token: String, Value: EU4txt, Line: 1, Column: 1, Quoted: false
// Token: PropertyName, Value: player, Line: 2, Column: 1, Quoted: false
// Token: String, Value: VEN, Line: 2, Column: 8, Quoted: false
// Token: PropertyName, Value: date, Line: 3, Column: 1, Quoted: false
// Token: String, Value: 1492.1.1, Line: 3, Column: 6, Quoted: true
// Token: PropertyName, Value: rivals, Line: 4, Column: 1, Quoted: false
// Token: StartObject, Line: 4, Column: 10
// Token: String, Value: YOU, Line: 4, Column: 12, Quoted: true
// Token: EndObject, Line: 4, Column: 18
// Token: Comment, Value: hehe, you're my rival, Line: 4, Column: 19, Quoted: false
// Token: PropertyName, Value: trade, Line: 5, Column: 1, Quoted: false
// Token: StartObject, Line: 5, Column: 7
// Token: PropertyName, Value: money, Line: 6, Column: 5, Quoted: false
// Token: String, Value: 10.312, Line: 6, Column: 11, Quoted: false
// Token: EndObject, Line: 7, Column: 1

The actual parsing logic in the reader would be very small, supplemented by a large number of extension methods. Core parsing methods would return null (or an empty nullable, for value types) instead of throwing, since exceptions are expensive and take the decision away from the caller. Let the client decide what to do on failure. Parsing an object would go as follows:


// Prior content [...]
while (reader.Read())
{
    if (reader.Token == PropertyName)
    {
        switch (reader.Value)
        {
            case "player": Player = reader.ExpectString(); break;
            case "date": Date = reader.ExpectQuotedDate(); break;
            case "rivals": Rivals = reader.ExpectQuotedStringList(); break;
            // [...]
        }
    }
}

// The client can write some pretty cool extension methods
// (Or these can just be included)
public static string ExpectString(this ParadoxSaxReader reader)
{
    var value = reader.Value;
    var result = reader.ReadString();
    if (result == null)
        // A simple error message... can be more complex with line number, etc
        throw new ApplicationException("Expected string after " + value);
    return result;
}

public static DateTime ExpectQuotedDate(this ParadoxSaxReader reader)
{
    var value = reader.Value;
    DateTime? result = reader.ReadDate();
    if (result == null || !reader.Quoted)
        throw new ApplicationException("Expected quoted date after " + value);
    return result.Value;
}

public static IList<string> ExpectQuotedStringList(this ParadoxSaxReader reader)
{
    var value = reader.Value;
    if (!reader.Read() || reader.Token != StartObject)
        throw new ApplicationException(/* */);

    IList<string> result = new List<string>();
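    while (reader.Read() && reader.Token != EndObject)
    {
        // Read() has already advanced onto the element, so check the current
        // token here; calling ExpectQuotedString() would advance a second time
        // (assuming ReadString() moves to the next token).
        if (reader.Value == null || !reader.Quoted)
            throw new ApplicationException(/* */);
        result.Add(reader.Value);
    }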

    if (reader.Token != EndObject)
        throw new ApplicationException(/* */);
    return result;
}

public static string ExpectQuotedString(this ParadoxSaxReader reader)
{
    var value = reader.Value;
    var result = reader.ExpectString();
    if (!reader.Quoted)
        throw new ApplicationException(/* */);
    return result;
}
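
To make the "return null instead of throwing" idea concrete, here is a rough sketch of what the core behind something like ReadDate() might do with the raw text. Nothing here exists in Pdoxcl2Sharp: `ParadoxDate` and `TryParse` are hypothetical names, and the sketch assumes dates are written as `year.month.day` and fit in a `DateTime` (years 1 through 9999).

```csharp
using System;
using System.Globalization;

public static class ParadoxDate
{
    // Hypothetical helper: parses "year.month.day" and returns null,
    // rather than throwing, when the text is not a valid date.
    public static DateTime? TryParse(string text)
    {
        if (text == null)
            return null;

        string[] parts = text.Split('.');
        if (parts.Length != 3)
            return null;

        // NumberStyles.None accepts digits only: no sign, no whitespace.
        int year, month, day;
        if (!int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out year) ||
            !int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out month) ||
            !int.TryParse(parts[2], NumberStyles.None, CultureInfo.InvariantCulture, out day))
            return null;

        // Validate ranges up front so the DateTime constructor cannot throw.
        if (year < 1 || year > 9999 || month < 1 || month > 12 ||
            day < 1 || day > DateTime.DaysInMonth(year, month))
            return null;

        return new DateTime(year, month, day);
    }
}
```

With this, `ParadoxDate.TryParse("1492.1.1")` yields a value while `ParadoxDate.TryParse("1492.13.1")` and `ParadoxDate.TryParse("VEN")` yield null, and an extension method like ExpectQuotedDate above is the one place that decides whether null should become an exception.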

Just some thoughts I was having. I don't think it matters whether one parser is faster than the other; each has distinct benefits.