iigorr / pgn.net

Portable Game Notation (PGN) implementation in .NET

Notification while reading files #16

Open mwo-dk opened 9 years ago


Hi,

Thx again. Great library. As you know, I like to read and do "stuff" with large files. I would like to read them and optionally get progress reports. What would be really nice is if the file reader could notify the caller, e.g. via an event, each time it finishes parsing an individual game.
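Roughly, the kind of API this asks for could look like the sketch below. Every name in it (`NotifyingPgnFileReader`, `GameRead`, `GameReadEventArgs`) is hypothetical and not part of pgn.net. It hands out the raw text of each game rather than a parsed `Game`, it relies on the same `[Event` assumption as my code further down, and the progress value is only approximate because `StreamReader` reads ahead in buffered chunks.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical event payload: raw game text plus approximate progress (0..1).
public sealed class GameReadEventArgs : EventArgs
{
    public GameReadEventArgs(string gameText, double progress)
    {
        GameText = gameText;
        Progress = progress;
    }
    public string GameText { get; }
    public double Progress { get; }
}

// Hypothetical reader that raises an event as soon as one game is complete.
public sealed class NotifyingPgnFileReader
{
    public event EventHandler<GameReadEventArgs> GameRead;

    public void Read(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 4096, leaveOpen: true))
        {
            var current = new StringBuilder();
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Skip blank lines before the first line of a game.
                if (current.Length == 0 && line.Trim().Length == 0)
                    continue;
                // Assumption: every game starts with an [Event ...] tag.
                if (current.Length > 0 &&
                    line.TrimStart().StartsWith("[Event", StringComparison.Ordinal))
                    Raise(current, stream);
                current.AppendLine(line);
            }
            if (current.Length > 0)
                Raise(current, stream);
        }
    }

    private void Raise(StringBuilder current, Stream stream)
    {
        // Position runs ahead of the text actually consumed (buffering).
        double progress = stream.CanSeek && stream.Length > 0
            ? (double)stream.Position / stream.Length
            : 0.0;
        GameRead?.Invoke(this, new GameReadEventArgs(current.ToString(), progress));
        current.Clear();
    }
}
```

A caller would subscribe with something like `reader.GameRead += (s, e) => Console.WriteLine("{0:P0}", e.Progress);` and then call `reader.Read(File.OpenRead(path))`.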

I tried this with the million-game database (about 2.2M games) that I mentioned in the large-file request, and did something along these lines:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using ilf.pgn;        // PgnReader
    using ilf.pgn.Data;   // Game, GameResult

    [Serializable]
    public struct PGNGame
    {
        public int White { get; set; }
        public int Black { get; set; }
        public GameResult Result { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int gamesRead = 0;
            int maxPlayerId = 0;
            int maxGameId = 0;
            // Keyed by player name so that "look up or assign an ID" is a single GetOrAdd.
            var playerBase = new ConcurrentDictionary<string, int>();
            var gameBase = new ConcurrentDictionary<int, PGNGame>();

            var reader = new PgnReader();
            var start = DateTime.Now;

            var parsedGames = new BlockingCollection<Game>();
            var queue = new BlockingCollection<List<string>>();

            // Stage 2: parse the raw lines of each game.
            var parser = Task.Run(() =>
            {
                foreach (var gameData in queue.GetConsumingEnumerable())
                {
                    // Keep the line breaks; joining without a separator fuses adjacent tokens.
                    var data = string.Join("\n", gameData);
                    var games = reader.ReadFromString(data);
                    foreach (var game in games.Games)
                        parsedGames.Add(game);
                    if (parsedGames.Count > 100000)
                        Thread.Sleep(500);   // crude throttle
                }
                parsedGames.CompleteAdding();
            });

            // Stage 3: assign player IDs and store the result of each game.
            // One consumer handles games in order instead of one fire-and-forget task per game.
            var indexer = Task.Run(() =>
            {
                foreach (var parsedGame in parsedGames.GetConsumingEnumerable())
                {
                    // "?" is the PGN placeholder for an unknown player; it also avoids a null key.
                    var whiteId = playerBase.GetOrAdd(parsedGame.WhitePlayer ?? "?",
                        _ => Interlocked.Increment(ref maxPlayerId));
                    var blackId = playerBase.GetOrAdd(parsedGame.BlackPlayer ?? "?",
                        _ => Interlocked.Increment(ref maxPlayerId));

                    var gameId = Interlocked.Increment(ref maxGameId);
                    gameBase[gameId] = new PGNGame
                    {
                        White = whiteId,
                        Black = blackId,
                        Result = parsedGame.Result
                    };
                    if (gameId % 100 == 0)
                    {
                        var secs = (DateTime.Now - start).TotalSeconds;
                        var speed = gameId / secs;
                        var estimated = 2200000 / speed;
                        Console.WriteLine("Games read: {0}", Volatile.Read(ref gamesRead));
                        Console.WriteLine("Queue length: {0}/{1}", queue.Count, parsedGames.Count);
                        Console.WriteLine("#Players: {0}. #Games: {1}. Speed: {2}. Estimated duration: {3}.",
                            playerBase.Count, gameId, speed, estimated);
                    }
                }
            });

            // Stage 1: split the file into games, one list of lines per game.
            using (var streamReader = new StreamReader("millionbase-2.22.pgn"))
            {
                var pgn = new List<string>();
                while (!streamReader.EndOfStream)
                {
                    var line = streamReader.ReadLine().Trim();
                    var isNewGame = line.StartsWith("[Event");
                    if (isNewGame && pgn.Any())
                    {
                        queue.Add(pgn);
                        Interlocked.Increment(ref gamesRead);
                        pgn = new List<string>();
                        if (queue.Count > 10000)
                            Thread.Sleep(500);   // crude throttle
                    }
                    pgn.Add(line);
                }
                // The last game has no following "[Event" line, so flush it explicitly.
                if (pgn.Any())
                {
                    queue.Add(pgn);
                    Interlocked.Increment(ref gamesRead);
                }
            }

            // Signal the end of input and wait for both stages to drain before exiting.
            queue.CompleteAdding();
            Task.WaitAll(parser, indexer);
        }
    }
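The `Thread.Sleep` calls in the program above are only there to stop the queues from growing without bound. A possibly simpler alternative is to give each `BlockingCollection<T>` a bounded capacity, so that `Add` itself blocks while the queue is full. Below is a minimal, self-contained sketch of that idea; `BoundedPipeline` and `Run` are made-up names, and plain integers stand in for games.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedPipeline
{
    // Pushes itemCount items through a queue that never holds more than
    // `capacity` items, and returns how many the consumer received.
    public static int Run(int itemCount, int capacity)
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: capacity))
        {
            int consumed = 0;
            var consumer = Task.Run(() =>
            {
                // Ends once CompleteAdding has been called and the queue is empty.
                foreach (var item in queue.GetConsumingEnumerable())
                    consumed++;
            });

            for (int i = 0; i < itemCount; i++)
                queue.Add(i);          // blocks while the queue is at capacity

            queue.CompleteAdding();    // tell the consumer no more items will come
            consumer.Wait();
            return consumed;
        }
    }
}
```

With this, the producer is slowed down exactly as much as the consumer requires, and no polling interval has to be tuned.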

With this I can sustain 300-450 games per second. As you can see, I use a producer-consumer scheme and "throttle" to keep the blocking queues manageable. But I cheat... The line:

var isNewGame = line.StartsWith("[Event");

works only because I know that every game in that PGN file is decently formatted. As we all know, the rather loose PGN input format does not guarantee that.

Would it be feasible to add such notifications to the reader?
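On the loose-format point: one way to depend less on `[Event` would be to start a new game at the first tag line, whatever the tag, that follows movetext. The sketch below is a heuristic under that assumption and not a full PGN tokenizer. `PgnSplitter` and `SplitGames` are hypothetical names. It does not track `{...}` comments that span several lines, and a game consisting of tags only would be merged with the next one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class PgnSplitter
{
    // Lazily yields the raw text of one game at a time.
    public static IEnumerable<string> SplitGames(TextReader input)
    {
        var current = new StringBuilder();
        bool seenMoveText = false;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            var trimmed = line.Trim();
            bool isTag = trimmed.StartsWith("[", StringComparison.Ordinal)
                      && trimmed.EndsWith("]", StringComparison.Ordinal);

            if (isTag && seenMoveText)
            {
                // A tag after movetext: the previous game is complete.
                yield return current.ToString();
                current.Clear();
                seenMoveText = false;
            }
            else if (!isTag && trimmed.Length > 0)
            {
                // Any non-empty, non-tag line counts as movetext (moves or result).
                seenMoveText = true;
            }
            current.AppendLine(line);
        }

        // Flush the last game, unless only whitespace is left.
        var rest = current.ToString();
        if (rest.Trim().Length > 0)
            yield return rest;
    }
}
```

Because it is an iterator, games come out one by one while the file is still being read, which fits the producer side of the pipeline.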

Very much appreciated anyways,

Thanks for the good work, Michael