pokerregion / poker

Poker framework for Python
https://poker.readthedocs.org
MIT License

Parse multiple hand histories from one file #18

Open kissgyorgy opened 9 years ago

kissgyorgy commented 9 years ago

Right now it can only parse one hand history per file. Make a function that can parse all the hand histories in one file.

martinitus commented 8 years ago

As with #21, I would recommend splitting the HandHistory apart from the different parsers. I parsed PS hand histories with C++ using a line-by-line approach; this worked very well and was easy to extend to different game types or to hand histories without a hero. Maybe I'll fork and do some of this, since I'm really interested in a parser written in Python.

kissgyorgy commented 8 years ago

Sure, go ahead! I'm interested in your idea.

martinitus commented 8 years ago

Hey, I've had a second, closer look at your code/docs. I managed to hack the parser (in a dirty way) so it can also read cash games. However, my previous suggestion won't work (without some ugly workarounds) if one wants to keep the option of only partially parsing a hand history. Why: if the HandHistory class does not know how to parse the text, then one cannot create a parser and let it hand back partially parsed hand histories, since the returned hand histories have to be fully parsed by the parser before they are returned. The workaround would be for the parser to plug its parsing function into the hand history, but then the whole idea is obsolete again.

So here comes a question and another idea: the HandHistory is just data and doesn't know about parsing. The parser has the functionality to "peek" into a hand history and parse only the header, and (if the user decides they are interested) keep reading the remaining part. If you think of the parser as a sort of generator of hand history objects, that could look as follows:

   parser = PokerStarsParser("/path/to/hh/file.txt")
   for header in parser.headers():
      if header.tablesize == 9:
         hh = parser.parse() # read the full handhistory
         do_other_stuff(hh)
      else:
         parser.skip() # seek to the next handhistory in the file

That would allow quick filtering of histories during reading, while keeping the parsing logic separated from the game/data logic. If filtering is not required it would boil down to something like:

   for hh in parser.all():
      do_other_stuff(hh)

martinitus commented 8 years ago

And by the way: this approach would make the HandHistory interface redundant, while introducing a parser interface (which would be much smaller).
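
To make the generator idea above concrete, here is a minimal sketch. Every name in it (`SketchParser`, `Header`, the two regexes) is hypothetical and not part of the poker library, and it assumes that hands are separated by blank lines and that the first two lines of a hand contain `Hand #<id>` and `<n>-max`.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Header:
    """Cheap-to-extract summary of one hand (hypothetical)."""
    ident: str
    tablesize: Optional[int]


@dataclass
class HandHistory:
    """Plain data only; it knows nothing about parsing (hypothetical)."""
    header: Header
    lines: List[str]


class SketchParser:
    """Iterates over the hands in one text blob (assumption: blank-line separated)."""

    _ident_re = re.compile(r"Hand #(\d+)")   # assumed header format
    _size_re = re.compile(r"(\d+)-max")      # assumed table-size format

    def __init__(self, text: str):
        self._chunks = [c for c in re.split(r"\n\s*\n", text.strip()) if c.strip()]
        self._current: Optional[str] = None

    def _peek(self, chunk: str) -> Header:
        # Only look at the first two lines of the hand.
        top = " ".join(chunk.splitlines()[:2])
        ident = self._ident_re.search(top)
        size = self._size_re.search(top)
        return Header(ident.group(1) if ident else "",
                      int(size.group(1)) if size else None)

    def headers(self) -> Iterator[Header]:
        for chunk in self._chunks:
            self._current = chunk
            yield self._peek(chunk)

    def parse(self) -> HandHistory:
        # "Full parse" is reduced to splitting lines in this sketch.
        if self._current is None:
            raise ValueError("call parse() while iterating over headers()")
        return HandHistory(self._peek(self._current), self._current.splitlines())

    def skip(self) -> None:
        # Nothing to do: the next header is produced on the next iteration.
        pass

    def all(self) -> Iterator[HandHistory]:
        for _ in self.headers():
            yield self.parse()
```

With this shape, the filtering loop above works unchanged, and `skip()` turns out to be a no-op because the generator already advances to the next hand.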

kissgyorgy commented 8 years ago

So here comes a question and another idea: the HandHistory is just data and doesn't know about parsing. The parser has the functionality to "peek" into a hand history and parse only the header, and (if the user decides they are interested) keep reading the remaining part.

I meant something like that in #10: I want to completely separate the cases where you only need to recognize a hand history or extract minimal information from it, and are not interested in very structured/detailed data. Your idea makes this even better: peek into it as little as possible and parse later if needed, or do the parsing in a different class (e.g. RawHandHistory vs. HandHistory). This unified class would have the widest feature set (I mean, if an attribute exists anywhere, it would be here, but all attributes would default to None). I have already started working on this.

I also very much like your idea of a unified HandHistory interface, but at first glance I would not throw away having a separate class, like a Parser, for every room; those would generate HandHistory objects. You could choose the parsing detail in the parser constructor, something like this:

parser = PokerStarsParser("/path/to/hh/file.txt", detail=FULL)
hand_histories = parser.parse_all()

The detail parameter could be e.g. FULL, HEADER_ONLY, or None, which would mean don't parse at all, just give back the raw history. You could then make a list and parse the other details later, like this:

for hh in hand_histories:
    hh.parse()
    # or hh.parse_ident() or hh.parse_header()

This way you could store or pick hand histories very fast and still be able to fully parse them later.
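
A rough sketch of how such a detail switch might look. `Detail`, `RawHand` and `parse_all` are made-up names for illustration, not the library's API, and the regexes assume a `#<id>` in the header and `Seat N: name` lines.

```python
import enum
import re


class Detail(enum.Enum):
    """Hypothetical detail levels; None means 'keep the raw text only'."""
    HEADER_ONLY = 1
    FULL = 2


class RawHand:
    """Hypothetical hand object that can be parsed now or later."""

    def __init__(self, raw):
        self.raw = raw
        self.ident = None      # filled in by parse_header()
        self.players = None    # filled in by parse()

    def parse_header(self):
        # Assumption: the hand ID is the first '#<digits>' in the text.
        match = re.search(r"#(\d+)", self.raw)
        self.ident = match.group(1) if match else None

    def parse(self):
        self.parse_header()
        # Assumption: players appear on lines like 'Seat 3: name ...'.
        self.players = re.findall(r"^Seat \d+: (\S+)", self.raw, flags=re.MULTILINE)


def parse_all(text, detail=None):
    """Split text on blank lines and parse each hand up to the given detail."""
    hands = [RawHand(chunk) for chunk in re.split(r"\n\s*\n", text.strip()) if chunk]
    for hand in hands:
        if detail is Detail.HEADER_ONLY:
            hand.parse_header()
        elif detail is Detail.FULL:
            hand.parse()
    return hands
```

Calling `parse_all(text)` with no detail returns unparsed objects that still carry their raw text, so `hand.parse()` can be called on any of them afterwards.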

Also, I would like to have a Parser which can separate hand histories by poker rooms in the same file.

martinitus commented 8 years ago

for hh in hand_histories:
    hh.parse()
    # or hh.parse_ident() or hh.parse_header()

If one follows this approach, the HandHistory class has to know how its internal text is to be parsed, so data and parsing logic stay intermixed. I've done some basic work in my fork. It doesn't run yet, but maybe you can get the idea from it.

kissgyorgy commented 8 years ago

@aolsux I thought about this and I really like your idea; it would simplify things a lot. I also looked at your code and like it. I also really want the "parse later" feature, for when someone only wants to peek into the hands (e.g. save the ID and the raw hands) and parse later, so here is my idea for this use case:
attach the parser (self._parser) and the raw hand history (self.raw), both optional of course, and define a method on the hand history class, something like this:

class HandHistory(object):
    raw = None
    id = None
    date = None
    players = []
    _parser = None
    # ...

    def parse(self):
        if not self._parser:
            raise AttributeError('Parser is not attached, you have to call parse() '
                                 'with the attach_parser=True parameter.')
        # Let the attached parser build a fully parsed hand from the raw text.
        hh = self._parser.get_first(self.raw)
        # Copy the parsed attributes onto this instance (keeps self._parser).
        self.__dict__.update(hh.__dict__)
This way, the parser is not coupled to the handhistory, but it's still possible to parse "in place". I'm not sure about the copy __dict__ idea yet :smile:

kissgyorgy commented 8 years ago

Please make a pull request and we can discuss the implementation.

martinitus commented 8 years ago

Hi, I'll see what I can do in the next few days. Until then, merry Christmas :-P


jfiedler5 commented 4 years ago

Hi, has this progressed? Given an hh file containing multiple hands, I would be very grateful for the ability to write the data of hands that fulfil some criteria (i.e. matching a hand ID from a list of hand IDs, which would be a second input) into a .csv file. I will maybe start a new "issue" describing this. :)
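
One possible shape for that request, as a standalone sketch using only the standard library. The function name, the blank-line split and the `#<id>` regex are assumptions for illustration; a real version would use the library's parsed attributes rather than the first line of each hand.

```python
import csv
import re


def filter_hands_to_csv(text, wanted_ids, out):
    """Write one CSV row per hand whose ID is in wanted_ids; return the row count.

    Assumptions: hands are separated by blank lines and the hand ID is the
    first '#<digits>' in each hand. `out` is any writable text file object.
    """
    wanted = set(wanted_ids)
    writer = csv.writer(out)
    writer.writerow(["ident", "first_line"])
    written = 0
    for chunk in re.split(r"\n\s*\n", text.strip()):
        match = re.search(r"#(\d+)", chunk)
        if match and match.group(1) in wanted:
            writer.writerow([match.group(1), chunk.splitlines()[0]])
            written += 1
    return written
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `open("hands.csv", "w", newline="")`.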

r-nikhil commented 4 years ago

Also waiting on this. Also, the current implementation doesn't seem to work with this kind of file: HH20200507_-ADA185625842-_100-200USD-_Pot_Limit_Omaha (1).txt

jfiedler5 commented 4 years ago

You might find my lightweight solution helpful; it changes the parse() method. When parse() is called, it now goes through each hand and writes the variables you want to a DataFrame.

    def parse(self):
        import pandas as pd  # or import once at module level

        cols = ['ident', 'button player', 'show down']
        data = []
        # Hands are assumed to be separated by two blank lines.
        delimiter = "\n\n\n"

        for paragraph in self.raw.split(delimiter):
            if not paragraph.strip():
                continue  # skip empty chunks, e.g. after a trailing delimiter

            # Use a separate object per hand instead of rebinding self.
            hand = PokerStarsHandHistory(paragraph)

            hand.parse_header()
            hand._parse_table()
            hand._parse_players()
            hand._parse_button()
            hand._parse_hero()
            hand._parse_preflop()
            hand._parse_flop()
            hand._parse_street("turn")
            hand._parse_street("river")
            hand._parse_showdown()
            hand._parse_pot()
            hand._parse_board()
            hand._parse_winners()

            hand._del_split_vars()
            hand.parsed = True

            data.append([hand.ident, hand.button.name, hand.show_down])

        # Build the DataFrame once, after all hands have been parsed.
        df = pd.DataFrame(data, columns=cols)
        print(df)