mrakgr / The-Spiral-Language

Functional language with intensional polymorphism and first-class staging.
Mozilla Public License 2.0

Thoughts on this? #16

Closed thedavematthews closed 5 years ago

thedavematthews commented 5 years ago

No limit: AI poker bot is first to beat professionals at multiplayer game https://news.ycombinator.com/item?id=20414905

mrakgr commented 5 years ago

Sorry, I missed this issue 4 days ago. I only saw it just now. Usually, I reply promptly.

I saw this a few days ago on the ML sub and skimmed the article. My first impression, based on a high-level description of the algorithm, is that it is a variant of CFR (counterfactual regret minimization). This is hardly strange, as this is the specialty of Sandholm and his group. A few months ago I looked into CFR, so I am a bit familiar with it. At that time, after throwing in the towel on deep RL, I had a choice between chasing after CFR and going down the formal verification route, and I went down the latter.

There are a bunch of things I want to say, but for me to seriously attempt an attack on the online gambling dens, 3 requirements have to be met.

From what I know, CFR algorithms by their very nature are tabular. Though you can plug a deep net into them, on their own they say nothing interesting about state abstraction. I am really interested in that. Without dealing with state abstraction, it is impossible to scale.
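To make the point about tabular algorithms concrete, here is a minimal sketch of a regret table whose rows are keyed by whatever an abstraction function returns. Everything in it is an illustrative assumption rather than anything from the paper or from Spiral: the three-action set, the `(card_rank, history)` state shape, and the crude `bucket_abstraction` that merges neighbouring ranks are all hypothetical.

```python
from collections import defaultdict

NUM_ACTIONS = 3  # hypothetical action set: fold, call, raise


def identity_abstraction(card_rank, history):
    """No abstraction: every distinct (card, history) pair gets its own row."""
    return (card_rank, history)


def bucket_abstraction(card_rank, history, bucket_size=4):
    """Hand-picked abstraction (an assumption): merge neighbouring card ranks."""
    return (card_rank // bucket_size, history)


class RegretTable:
    """Cumulative regrets stored per abstract information set."""

    def __init__(self, abstraction):
        self.abstraction = abstraction
        self.rows = defaultdict(lambda: [0.0] * NUM_ACTIONS)

    def add_regret(self, card_rank, history, action, amount):
        self.rows[self.abstraction(card_rank, history)][action] += amount

    def regret(self, card_rank, history):
        key = self.abstraction(card_rank, history)
        # Return a copy so callers cannot mutate the stored row by accident.
        return list(self.rows.get(key, [0.0] * NUM_ACTIONS))


exact = RegretTable(identity_abstraction)
bucketed = RegretTable(bucket_abstraction)
for table in (exact, bucketed):
    table.add_regret(12, "r", 1, 2.0)  # learn something while holding rank 12

print(exact.regret(13, "r"))     # rank 13 has its own, still empty, row
print(bucketed.regret(13, "r"))  # rank 13 shares a bucket with rank 12
```

With the identity mapping nothing learned for one hand transfers to another; any sharing comes only from the hand-chosen bucketing, and the table itself gives no guidance on how to choose it.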

Even before this 6-max breakthrough in NL Holdem, they put out a paper where they managed to simplify Libratus and make it efficient enough to be trained on a laptop. I found that impressive. But there is still a difference between turtling up around the Nash equilibrium while the pros break their fangs on you and actively seeking out prey. Real people will not just stand around while you beat on them, and there is the rake to consider. That 5bb/100 6-max win rate, while absolutely crushing at higher stakes, would get eaten by the rake at lower ones. I need something that can beat the lower stakes before I can move up, and to do that, hunting is necessary.

An algorithm that hunts is the kind of algorithm that has the capacity to anticipate. I could query it for an estimate of EV and the hand ranges of its opponent at any time in play. None of the CFR algorithms that I've seen have this capacity. I could also, for example, download a big database of hands played by others and feed it to the algorithm in order to fine-tune it. CFR does not work like that. Like the Starcraft and Dota efforts by DeepMind and OpenAI, this advance is just another niche thing hyped well beyond its capacity to generalize.

My last experience with ML before I started studying Software Foundations was playing around with FSI-CFR and realizing that I had absolutely no ability to reason through what effect changing some of its dubious pieces would have. I could implement it and test it out that way, but to me implementation and testing at runtime is not a primary activity. When I write a program, I am already 95% sure of what I am doing, and running the program is just to make sure I did not miss any details. This is vastly different from current-day ML, where I have no insight at all most of the time and would be forced to just throw crap at the wall and see what sticks. Since things are like this, I have a choice. I could spend decades just farting around, or I could get some real skills, overcome my math deficiencies and then do this whole thing properly.

Right now I am still going through the basics, but at some point I will be through with that and CFR algorithms will be the focus of my formalization efforts amongst other things. They definitely have interesting properties compared to standard RL algorithms, and elucidating those differences seems like a necessary challenge.

You have to understand that 2018, when I finally made the switch from working on Spiral to using Spiral to work on ML, was not at all like I thought it would be. Usually when I sit down and program, the insights just come to me, but in ML they do not. So they will have to be forced out. I need to go through some extreme training to get the ability to do that, as my math talent is so mediocre. Given how bad I am at it, the only way I will be able to get anywhere is if I make math into programming and have the computer make up for my deficiencies in that area.

It is really a pity, but as I have nothing worth implementing, Spiral is useless to me right now, and since March its repo for v0.1 has just served as a convenient work diary. It will remain that way for the foreseeable future; Spiral development has been on hiatus for a few months now.

Maybe I should have just started with the ML experiments by doing them in Python rather than working on a language first, but then I would have missed out on mastering the staged functional programming style. I still think, or hope, that it will be useful to me someday. If I myself invented something like Hinton's capsules, I'd have no trouble efficiently implementing them.

mrakgr commented 5 years ago

Let me add just a bit more, since now I've actually gone through the paper and the supplement. The Monte Carlo CFR on page 17 of the supplement is quite similar to regular CFR, which I've implemented before, so it would not be too hard to recreate. But the paper does a lot of fairly hacky engineering to fit what is essentially a tabular algorithm onto a game as huge as 6-max NL Holdem. I do not think I would be able to do those parts nearly as easily, and I suspect that a lot of those choices are arbitrary. My instinct is to find a way to abstract those parts.
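For readers who have not seen regular CFR, the following is a minimal sketch of the vanilla (non-sampled) algorithm on Kuhn poker. It is not the Monte Carlo variant from the supplement and not the implementation mentioned above; the choice of Kuhn poker as the toy game, the `Node` class, and the names `cfr` and `train` are all assumptions made for illustration. Each iteration walks all six deals with the strategies held fixed, then refreshes every strategy by regret matching.

```python
import itertools

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


class Node:
    """One table row: the statistics kept for a single information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.current = [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history):
    """Payoff for the player to act, or None if the hand is still live."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history == "pp":
        return 1.0 if higher else -1.0  # checked down: showdown for the antes
    if history[-2:] == "bb":
        return 2.0 if higher else -2.0  # bet and call: showdown for ante + bet
    if history[-1] == "p":
        return 1.0  # opponent folded to a bet
    return None  # "pb": the first player still has to respond


def cfr(nodes, cards, history, reach):
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = nodes.setdefault(f"{cards[player]}{history}", Node())
    strategy = node.current
    utils = [0.0, 0.0]
    for a, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= strategy[a]
        # The child value is from the opponent's point of view, hence the sign flip.
        utils[a] = -cfr(nodes, cards, history + action, next_reach)
    node_util = sum(s * u for s, u in zip(strategy, utils))
    for a in range(2):
        # Regret is weighted by the opponent's reach, the average strategy by our own.
        node.regret_sum[a] += reach[1 - player] * (utils[a] - node_util)
        node.strategy_sum[a] += reach[player] * strategy[a]
    return node_util


def train(iterations):
    nodes = {}
    deals = list(itertools.permutations([1, 2, 3], 2))  # Jack, Queen, King
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(nodes, cards, "", [1.0, 1.0])
        for node in nodes.values():
            node.current = regret_matching(node.regret_sum)
    return nodes, total / (iterations * len(deals))
```

Calling `nodes, value = train(2000)` returns the table and the first player's average value over the run, which should drift toward Kuhn poker's known game value of -1/18. The whole state of the algorithm is the `nodes` dictionary with one entry per information set (12 here), which is what makes scaling it to a game the size of 6-max NL Holdem an engineering problem.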

So this is about the standard for ML in the year 2019: high engineering and complexity, high hype, and low theory and understanding. There will be an AMA with the authors tomorrow and I am almost tempted to ask whether there are any formal proofs demonstrating the superiority of CFR to standard RL. I am not being entirely sarcastic here; I'll have to do a literature review to figure that out at some point.

Having said that, tabular algorithms are not a bad pick compared to NNs right now.

On a toy poker game I once trained an RNN with KFAC. When I increased the sequence length fed to the agent from hand-to-hand to game-to-game, the performance of the RNN dropped off a cliff, but tabular RL actually got better. I found this quite impressive. I tried various tricks to make the RNN work better, but I quickly reached my limits. This impressed upon me just how unreliable deep RL really is.

So all in all, given the present state of affairs, regular jobs still have appeal compared to banditry. It is just really unfortunate that science needs to be developed before it can be misused. But Spiral is done and the algorithms are not here, so what choice do I have but to do it?

I'll close this thread here, but feel free to reopen it if you have any more questions.