posita / dyce

Simple Python tools for exploring dice outcomes and other finite discrete probabilities
https://posita.github.io/dyce/

Add lea package to the comparison table? #10

Open bszonye opened 1 year ago

bszonye commented 1 year ago

I'm currently working on a project[^1] for analyzing games at a higher level (e.g., expected outcomes for Warhammer attack sequences), for which I have been using Pierre Denis's lea package[^2] for the low-level probability analysis. I ran across your work[^3] and am now evaluating whether it's better suited to my use case. In any event, I thought you might also be interested in evaluating lea, as it seems to be an excellent low-level library for discrete probability analysis, although it lacks some amenities like type hints.

This isn't so much an "issue" as "hello, I'm working in a related space, here's what I've found so far." Cheers!

[^3]: ... in the big mypy numeric tower discussion. Really appreciate your input there! I've run across many of the same stumbling blocks in figuring out how to represent things like numeric values, random values, dice modifiers, probability mass functions, and so on.

bszonye commented 1 year ago

Oh hey, I just saw the link to your discussions area in the dyce docs. I suppose that would have been a better place to say hello. Although I suppose this gives you a place to track investigation of the lea library if it does look useful to you. It's pretty nifty and can do some sophisticated analysis like Bayesian inferences.

posita commented 1 year ago

Hey @bszonye, and thanks for getting in touch! (And my sincerest apologies for taking so long to respond. I haven't been very good about that these past few weeks.) Thanks so much for putting lea on my radar. (I hadn't come across that one yet.) It certainly covers more ground than dyce does. I'm looking forward to digging in further. Thanks for the pointer!

If it's at all comparably performant, I might swap dyce's implementation details for that implementation. One thing I've struggled with is how to avoid accumulating floating point errors (which should always be possible with discrete probabilities) while remaining fast (i.e., avoiding Fractions for internal representations). lea may have some tricks there.

If lea's interfaces are relatively intuitive, who knows? It might lead to retiring dyce altogether and focusing on visualization tooling. I don't know yet, though. lea's interfaces seem to reflect the jargon of mathematical probability, whereas I've tried (although not always succeeded) to make at least dyce's basic interfaces more consumable by (e.g.) RPG GMs (who often think more in terms of dice rolls, accumulation, and branches).

I'm happy to collaborate or try to provide guidance (or reveal painful lessons), so long as it's helpful. I feel your pain regarding typing issues, which we can further explore in posita/numerary#18, if that works?

Question(s) for you: Who's your intended audience with bones? Do you have a vision for an end product (even if vague)? I'd love to know more!

bszonye commented 1 year ago

Hi, thanks for replying! No need to apologize, I know how these things go.

To answer your last question first: My intended audience is "gamers who can write markup and run Python," which is small but nonzero. 😸

My personal killer app is something that can analyze weapon damage for an entire Warhammer army. Currently I use web tools like AoS Statshammer that let you input a few weapon profiles to see tables of their expected damage. It's handy for answering questions like "should I arm this unit with swords or spears?" but tools like this don't work well for large-scale or deep analysis. Some of them impose explicit limits while others are just too hard to use at scale, and almost all of them rely on Monte Carlo simulation instead of exact math.

So I've really been wanting something that will let me easily input stats for an entire faction and show how everything performs in common situations. For my last army, I did this in a spreadsheet, which gave me some of the flexibility that I needed, but it wasn't very extensible or general.

I'm really impressed with lea in some respects. It's got excellent problem-solving abilities, and it's reasonably pythonic and well-documented. Still, I've struggled with some usability stuff. The docs are good, but they're very tutorial-oriented, and that's not how I learn best, plus I'm rusty on the academic jargon that you pointed out. I also tried a few things early on that just didn't work like I expected. For example, I tried to modify a PMF through `map` and ended up with an unusable object that seemed to have the right structure and data but didn't actually work. I think for gamers, it might work better as an internal engine than as the main probability interface.

The other negative with lea is that it doesn't have any typing stubs, and some of its data types are complex enough that calling them `Any` is not enough to fully satisfy mypy or pyright. I started writing my own stubs for the stuff I was using and began to feel like I was doing enough work to justify just rolling my own implementation from scratch. I wouldn't worry about the typing stuff except that (1) I'm learning it for professional reasons, and (2) I want to be able to provide bones as a library, not just a command-line tool, and so I want to provide type hints for folks downstream of me.

There are some things that I really liked about lea, though. It has no dependencies of its own, but it'll use tools like matplotlib if they're available. You can configure it to do exact math or float math. The author seems to have dealt with performance issues by developing better algorithms. Fractions are fast enough for small problems, so the key is usually figuring out how to keep your NdK dice problem from becoming O(K^N).

I have some experience with that myself. One of my first serious bits of probability software was figuring out how to solve Nd6-keep-highest on a Commodore 64 without running out of memory or time. Luckily there's an O(N) solution that works by accumulating results one die at a time. I think most dice problems are amenable to that kind of analysis (although Nd6-keep-middle is surprisingly hard).
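In modern Python, that accumulation looks something like this (a minimal sketch, nothing like the original Commodore 64 code): build the distribution of the running maximum one die at a time, with integer weights throughout.

    from collections import Counter

    def highest_of(n_dice: int, sides: int = 6) -> Counter:
        """Distribution of the highest die in a pool, built one die at a time."""
        dist = Counter({0: 1})  # weight of each running maximum seen so far
        for _ in range(n_dice):
            nxt = Counter()
            for high, weight in dist.items():
                for face in range(1, sides + 1):
                    nxt[max(high, face)] += weight
            dist = nxt
        return dist  # weights out of sides**n_dice

    print(highest_of(3))  # 3d6: Counter({6: 91, 5: 61, 4: 37, 3: 19, 2: 7, 1: 1})

Each pass only touches one state per possible running maximum, so the work grows linearly with the number of dice.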

I really like what you've done with dyce as well. Some of your choices were inspiring, like how your core histogram type represents probabilities as integers and only applies the denominator as necessary. I think that's a good idea, because for dice analysis it's helpful to model things with integers as much as possible. They're exact, they're fast even if you need bignums, and Python actually has good bignum support.
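For anyone following along, the integer-weight trick boils down to this (a concept sketch, not dyce's actual H API):

    from fractions import Fraction

    # 2d6 as outcome -> integer count: nothing gets divided until a
    # probability is actually requested, so no error can accumulate.
    weights = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6,
               8: 5, 9: 4, 10: 3, 11: 2, 12: 1}
    denominator = sum(weights.values())          # 36
    p_seven = Fraction(weights[7], denominator)  # exactly 1/6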

There were some blockers that kept me from adopting dyce as a probability engine, though. Warhammer damage is complex enough that I need to model it with a 3- or 4-dimensional value type (currently a dataclass), and I'm not sure whether that would succeed with H using RealLike as the underlying discrete value type. The indirect dependency on beartype was also a mild negative, as it's another unfamiliar typing tool that I'd need to evaluate before including it in the project. Those are both surmountable problems, I'm just currently at a point in my project where I want to put my effort towards feature implementation rather than tooling.

On the numeric tower thing, I've decided to punt for now. I'll share my experience with that in the numerary discussion!

posita commented 1 year ago

> I really like what you've done with dyce as well. Some of your choices were inspiring, like how your core histogram type represents probabilities as integers and only applies the denominator as necessary. I think that's a good idea, because for dice analysis it's helpful to model things with integers as much as possible. They're exact, they're fast even if you need bignums, and Python actually has good bignum support.

> There were some blockers that kept me from adopting dyce as a probability engine, though. Warhammer damage is complex enough that I need to model it with a 3- or 4-dimensional value type (currently a dataclass), and I'm not sure whether that would succeed with H using RealLike as the underlying discrete value type. The indirect dependency on beartype was also a mild negative, as it's another unfamiliar typing tool that I'd need to evaluate before including it in the project. Those are both surmountable problems, I'm just currently at a point in my project where I want to put my effort towards feature implementation rather than tooling.

Thanks for raising these! You've got my wheels a-turnin'. :thinking:

I will concede immediately that RealLike is an icky compromise that even gets in my way. For example, in my implementation of this response to a StackExchange post, I used tuples as outcomes (keys) in my H object. That, of course, necessitated # type: ignore comments during H construction and consumption (see, e.g., here, and here). It would have also blown up if I didn't use H objects in very narrow ways (i.e., basically just as Counters). It's precisely these types of applications that took me down the typing rabbit hole. There's no easy way to define a generic container that supports the intersection of the operations its contained types support. In this regard, I think numbers are a red herring. It shouldn't matter what the contained type is. Numbers serve as a very useful test case because they tend to push the limits on being able to infer and type check things like __add__(MyContainerType[Union[int, Fraction]], Decimal) without having to explicitly define that relationship.

Your comments did inspire me to wonder whether there's a useful division between the accumulating container implementation and the algorithms for things you can do with it. For example (ignoring performance issues with Counter for the moment), is there a world where one could use something like dependency injection to establish "this Counter now supports __pow__(self, Union[int, Fraction]) -> Container[Union[int, Fraction]]"? I think that converges on similar problems (and possibly solutions) we're chatting about in posita/numerary#18. I'll have to stew on that some more.
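Something like this, maybe (a hypothetical sketch of the shape, not a worked-out design; combine and its operator argument are made up for illustration):

    import operator
    from collections import Counter
    from typing import Callable

    def combine(op: Callable, left: Counter, right: Counter) -> Counter:
        """Convolve two outcome -> weight Counters under an injected operator."""
        out: Counter = Counter()
        for lo, lw in left.items():
            for ro, rw in right.items():
                out[op(lo, ro)] += lw * rw
        return out

    d6 = Counter({face: 1 for face in range(1, 7)})
    two_d6 = combine(operator.add, d6, d6)  # a plain Counter, externally supplied "+"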

The good(ish) news regarding beartype is that it's only a strict dependency for numerary, which extends beartype's caching protocol implementation. Both dyce and numerary use other beartype features internally, but those are off by default (unless the NUMERARY_BEARTYPE environment variable is set to a "truthy" value). So, yeah, it's a runtime dependency, but it shouldn't affect library users unless they explicitly opt into it. At least that's the intention. If you feel otherwise, please let me know! Also, if the relationship between dyce, numerary, and beartype is unclear, or if it's confusing how that relationship impacts library customers, please let me know. (I'm sure I could make it clearer.)

> I have some experience with that myself. One of my first serious bits of probability software was figuring out how to solve Nd6-keep-highest on a Commodore 64 without running out of memory or time. Luckily there's an O(N) solution that works by accumulating results one die at a time. I think most dice problems are amenable to that kind of analysis (although Nd6-keep-middle is surprisingly hard).

As an aside, it might be fun to compare notes with Ilmari Karonen (see his dice_roll.py implementation; he's also pretty active on StackExchange) or @HighDiceRoller (and his icepool library which leverages Karonen's approach).

I'm sympathetic, though. There are all kinds of optimizations that one can use in certain contexts that don't really apply to general cases. It can be both maddening and rewarding to try to detect those contexts and apply relevant optimizations.

> My intended audience is "gamers who can write markup and run Python," which is small but nonzero. :smile_cat:

> My personal killer app is something that can analyze weapon damage for an entire Warhammer army. Currently I use web tools like AoS Statshammer that let you input a few weapon profiles to see tables of their expected damage. It's handy for answering questions like "should I arm this unit with swords or spears?" but tools like this don't work well for large-scale or deep analysis. Some of them impose explicit limits while others are just too hard to use at scale, and almost all of them rely on Monte Carlo simulation instead of exact math.

My guess is that your audience doesn't benefit from typing. That's not a dismissal of the importance of having typing that works, even if you're the only consumer of it as you build what you want.

As an aside, I join you in being frustrated that Python's types can't express the things I want. It's been remarkably hard to get core Python developers to acknowledge that typing is a useful part of an API that library authors can deliver to their customers. The subtlety there is that library customers either think of a library as providing typing information or not. They don't really think in terms like, "Oh, I see. The library provides useful typing information up to this point, but not past it, so I'll have to keep track of where I cross that line in my own code. And maybe I'll be forced to pay attention to where that happens because those limitations will express as errors that reflect limitations of the type system, not my code or the library code. And to know the difference, I'll probably need to disrupt my own problem solving to validate edge cases by performing experiments or writing tests (or just adding # type: ignore comments)." That's an unacceptably leaky abstraction from a library author's perspective.

But (stepping back down off my soapbox) the point of my prior observation (about your customers not needing typing directly) is that you may be in a position to absorb that pain yourself and hide it from your audience. Right now, I've kind of backed myself into a corner with dyce where I can't easily do that, because I make the claim that my intended audience is other developers. (As another aside, I'm still wrestling with whether that claim is or should be true, and whether dyce really is useful to tool builders, or whether I should build a tool that more directly competes with AnyDice and not worry too much about dyce being usable as a library.)

Have you thought about what a tool looks like for your intended audience? Are you shooting for something like an augmented AoS Statshammer, or do you have a vision for an entirely different concept? Do you anticipate your audiences writing rudimentary Python or would you try to insulate them with a different (simpler?) markup?

bszonye commented 1 year ago

My most important audience is myself, of course!

For the warhammer module, I've designed things around a core Profile class that can read TOML from files or strings, and then uses the field type information from @dataclass to validate the input and hand things off to the right constructors. That should let me transcribe the unit stat blocks pretty directly into TOML, which is a lot more readable and writable than YAML or JSON, without the effort that a DSL would require.
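Roughly this shape (illustrative fields and values only; the real Profile does a lot more):

    import tomllib  # Python 3.11+
    from dataclasses import dataclass, fields

    @dataclass
    class Profile:
        name: str
        attacks: int
        to_hit: int

        @classmethod
        def from_toml(cls, text: str) -> "Profile":
            data = tomllib.loads(text)
            kwargs = {}
            for f in fields(cls):  # validate input against the declared field types
                value = data[f.name]
                if not isinstance(value, f.type):
                    raise TypeError(f"{f.name}: expected {f.type.__name__}")
                kwargs[f.name] = value
            return cls(**kwargs)

    unit = Profile.from_toml('name = "Liberators"\nattacks = 2\nto_hit = 3')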

Many details are still up in the air, but I have two major use cases: the first helps with army building, and the second is a useful strategic reference in play. Previously I've done this with a handcrafted spreadsheet, but that's super labor-intensive and error-prone. I'm aiming to keep all of the unit data in TOML, with files for the general faction data, specific army loadouts, and typical attack targets. That should make it easy to design commands that, e.g., get a list of attackers from one file, a list of targets from another file, and then generate the relevant damage matrix from the Cartesian product.

If it becomes popular, maybe I design some stuff aimed at people other than myself. 😹

HighDiceRoller commented 1 year ago

Thanks for the ping @posita! A loose collection of comments:

Lea

I too learned about Lea recently. Unfortunately, my first trial with it ended when I couldn't find an easy way to create a deck with more than one copy of a card---it seems as soon as a card is drawn, all copies are effectively removed from the deck??? Perhaps I will give it another look later, but my impression is that the tabletop setting really does have certain patterns that come up disproportionately often compared to the broader mathematical field.

> One of my first serious bits of probability software was figuring out how to solve Nd6-keep-highest on a Commodore 64 without running out of memory or time. Luckily there's an O(N) solution that works by accumulating results one die at a time. I think most dice problems are amenable to that kind of analysis (although Nd6-keep-middle is surprisingly hard).

Instead of iterating over dice, Ilmari Karonen had the idea of iterating over outcomes, e.g. how many dice rolled a 1, how many rolled a 2, and so forth. I developed this into a reasonably general-purpose algorithm that can efficiently keep/drop from arbitrary positions, handle mixed standard dice, find matching sets and straights, compute RISK, and more. If you would like to know more, you can read my paper on the subject.
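As a deliberately tiny illustration of the idea (a toy sketch, far simpler than the general algorithm in the paper): count how many of n dice rolled 5+ by assigning dice to faces one face at a time, weighting each step by a binomial coefficient.

    import math
    from collections import Counter

    def successes_5_plus(n: int, sides: int = 6) -> Counter:
        """Toy outcome-iteration sketch: distribution of 5+ results among n dice."""
        states = Counter({(n, 0): 1})  # (dice not yet assigned a face, successes so far)
        for face in range(1, sides + 1):
            nxt = Counter()
            for (remaining, hits), weight in states.items():
                for k in range(remaining + 1):  # k of the remaining dice show this face
                    bump = k if face >= 5 else 0
                    nxt[(remaining - k, hits + bump)] += weight * math.comb(remaining, k)
            states = nxt
        # keep only states in which every die was assigned a face
        return Counter({hits: w for (rem, hits), w in states.items() if rem == 0})

    print(successes_5_plus(3))  # out of 216: Counter({1: 96, 0: 64, 2: 48, 3: 8})

The payoff is that the state space depends on the outcomes and on the summary you track, not on the raw K^N roll space.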

> Fractions are fast enough for small problems, so the key is usually figuring out how to keep your NdK dice problem from becoming O(K^N).

The intermediate approach is to only consider all sorted sets, weighted appropriately; this is how Troll and AnyDice work (well, we don't know for sure without the source code---though it is strongly implied by the API design and developer statements). However, this is still exponential if both the die size and the number of dice increase linearly; if one is held fixed, the running time is technically polynomial but can be of quite high order. For example, AnyDice can only rank ~10 ability scores before conking out, whereas Icepool can easily do twice that.

> Warhammer damage is complex enough that I need to model it with a 3- or 4-dimensional value type

It's been a long time since I've played Warhammer; what sort of mechanics are you trying to cover? If each attack is rolled independently with some ultimate probability distribution of wounds caused, then the collective wounds dealt is just the sum distribution, which is easy to compute via repeated convolution. Both posita and I implement the `+` operator that does exactly this, and `@` for compactly adding a number of identical dice together (similar to AnyDice's `d` operator, though there are a few subtle differences in semantics between the three). In fact, if each attack either causes 0 or 1 wound, then the result of multiple identical attacks is just a binomial distribution.
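For instance (stdlib only, with made-up numbers): ten independent attacks that each cause a wound with probability 1/2 give a plain binomial over total wounds.

    from fractions import Fraction
    from math import comb

    # 10 attacks, each causing exactly 0 or 1 wound with p = 1/2 (illustrative):
    n, p = 10, Fraction(1, 2)
    wounds = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    assert sum(wounds.values()) == 1  # exact, thanks to Fraction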

Or are you trying to simulate entire multi-turn combats?

bszonye commented 1 year ago

> > Fractions are fast enough for small problems, so the key is usually figuring out how to keep your NdK dice problem from becoming O(K^N).

> The intermediate approach is to only consider all sorted sets, weighted appropriately; this is how Troll and AnyDice work (well, we don't know for sure without the source code---though it is strongly implied by the API design and developer statements). However, this is still exponential if both the die size and the number of dice increase linearly; if one is held fixed, the running time is technically polynomial but can be of quite high order. For example, AnyDice can only rank ~10 ability scores before conking out, whereas Icepool can easily do twice that.

Impressive! I'll definitely check out your paper after I'm done replying here.

I just implemented this the other day in bones: https://github.com/bszonye/bones/blob/bradd/warhammer/bones/pmf.py

Here's my general technique: there are math.comb(n + k - 1, n) possible ways to roll NdK, and you can enumerate them quickly with itertools.combinations_with_replacement(range(1, k+1), n). For each combination, you can determine its probability mass using the multiset permutation function. Chaining the two operations:

    # pmf_weight and enumerate_NdK are helpers defined elsewhere in bones.pmf.
    @classmethod
    @functools.cache
    def roll_NdK(cls, dice: int = 1, *, sides: DieRange = 6) -> Self:
        """Create the PMF for rolling a pool of N dice with K sides."""
        return cls((p, pmf_weight(p)) for p in enumerate_NdK(dice=dice, sides=sides))
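In essence, the two helpers boil down to something like this (simplified from what's in bones/pmf.py):

    import itertools
    import math
    from collections import Counter

    def enumerate_NdK(dice: int = 1, *, sides: int = 6):
        """Yield every sorted NdK roll (each distinct multiset exactly once)."""
        return itertools.combinations_with_replacement(range(1, sides + 1), dice)

    def pmf_weight(combo: tuple) -> int:
        """Multiset permutation count: how many orderings produce this roll."""
        weight = math.factorial(len(combo))
        for repeats in Counter(combo).values():
            weight //= math.factorial(repeats)
        return weight

    # Sanity check: the weights over all sorted 3d6 rolls must total 6**3.
    assert sum(pmf_weight(c) for c in enumerate_NdK(3, sides=6)) == 6**3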

AnyDice can enumerate about 42 dice within its 5-second limit (e.g. 3@42d6). My method is in the same neighborhood. It enumerates 40d6 in 4.9 seconds user time on my Intel 12900K CPU, and 42d6 takes 6.6 seconds. I could probably beat AnyDice with some tuning, although I'm nowhere near icepool. I'm pretty happy with the results though, because it was frankly quite simple to implement in Python once I had the math & algorithm worked out.

It's possible to do much better if you only need to select/drop dice from one end of the pool, instead of selecting arbitrary dice from the middle. For example, to analyze NdKhM, you just need to enumerate MdK and then iteratively adjust the results for +1dK for the remaining (N-M) dice. I think that's roughly O(KN), and bones can resolve thousands of dice in under a second. In my experience, most dice problems are amenable to that kind of analysis.
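A simplified sketch of that one-die-at-a-time adjustment, tracking just the sorted top M dice as the state (bones does more bookkeeping than this):

    from collections import Counter

    def keep_highest(n: int, m: int, sides: int = 6) -> Counter:
        """Sum of the highest m of n dice, adding one die at a time."""
        states = Counter({(): 1})  # sorted tuple of the m highest dice so far
        for _ in range(n):
            nxt = Counter()
            for top, weight in states.items():
                for face in range(1, sides + 1):
                    nxt[tuple(sorted(top + (face,))[-m:])] += weight
            states = nxt
        totals = Counter()
        for top, weight in states.items():
            totals[sum(top)] += weight
        return totals  # weights out of sides**n

    print(keep_highest(4, 3))  # 4d6 drop lowest, exact weights out of 1296

The number of states is bounded by the number of sorted M-tuples, which doesn't grow with N, so adding more dice stays cheap.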

I'm not too worried about the worst-case performance, because I think it's avoidable unless you need to e.g. select a median die out of 30+ dice, and that's just not a thing that games have you do. It's not a good game mechanic. 😹

> > Warhammer damage is complex enough that I need to model it with a 3- or 4-dimensional value type

> It's been a long time since I've played Warhammer; what sort of mechanics are you trying to cover? If each attack is rolled independently with some ultimate probability distribution of wounds caused, then the collective wounds dealt is just the sum distribution, which is easy to compute via repeated convolution.... Or are you trying to simulate entire multi-turn combats?

Warhammer is easy to model for the basic attack sequence where you roll N dice and filter them through a series of target number rolls (hit roll, wound roll, save roll, ward roll). The complication is that there are two types of damage, wounds and mortal wounds, and they target different defenses. Plus there are several ways to add extra attacks mid-sequence or to skip parts of the sequence. Some common examples:

Mortal wounds ignore save rolls, but most wards still stop them. Meanwhile, most wards stop both kinds of wounds, but some only affect mortal wounds. I also need to deal with most of the rules having various linear modifiers, modifier caps, and rerolls that affect specific steps of the attack sequence. I'm finding it easiest to follow the flow of everything if I track the accumulated number of attacks, wounds, and mortal wounds coming into each step and then going out. (It does eventually just end up being "damage" by the end of the ward roll, but I'd like to be able to see the expected results at each step rather than treating the whole sequence as a black box. Plus it's just difficult to juggle all of the rules complexity without breaking out the different steps and result types.)
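The state I pass between steps currently looks something like this (a simplified sketch; the real dataclass carries more fields):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AttackCounter:
        """Accumulated totals flowing into and out of each attack-sequence step."""
        attacks: int = 0
        wounds: int = 0
        mortal_wounds: int = 0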

WH40K additionally complicates things by having a blow-through rule, where a single attack die can deal multiple wounds, but it cannot kill more than one enemy. I'm not sure how best to implement that. It might mean that I can't treat attacks as a fungible counter and instead will need to iterate over them, but that complicates "explode on 6" type rules.

> I too learned about Lea recently. Unfortunately, my first trial with it ended when I couldn't find an easy way to create a deck with more than one copy of a card---it seems as soon as a card is drawn, all copies are effectively removed from the deck???

Interesting that we both stumbled early on quirks of the Lea library. In addition to the trouble I had with `map`, I also had some trouble at first with mixed line endings in the source code. (Some code got checked in with mixed Linux & Windows line endings, which probably works fine if you check out on Windows, but it caused chaos in my dev environment.) Fortunately, the owner was friendly and readily accepted my PR to fix the problem. I'd tag him in too, but Lea's on bitbucket, not github.

Speaking of which, we've wandered a bit afield from the issue topic. I don't mind moving to a different venue if there's a more appropriate one.

bszonye commented 1 year ago

@HighDiceRoller After reading your paper, I realized that I'm essentially making a state machine to deal with the complexities of the Warhammer attack sequence, and my AttackCounter object is the state data.

Nice paper. I'm still digesting the concepts, but that all totally makes sense. It's a generalization of the way that I already think about things like the keep-highest algorithm and the Warhammer attack sequence. I also like the insights around alternate ways to represent the dice pool multiset. The sequence-of-sequences representation works well for full enumerations, but something more like Counter probably makes more sense for other applications. I have lots of ideas to play around with now. Thanks!