taw / magic-search-engine

Search engine for Magic cards
MIT License
47 stars 19 forks source link

Implement loading packs from YAML #164

Closed taw closed 1 year ago

taw commented 1 year ago

Moving pack data from code to yaml (or similar) would let other people contribute more easily.

I started it with:

There's index of supported packs at:

The plan is roughly:

Also:

If some yamls are too complicated (for example foils), I might just add code-based support for them, like standard_foil: 1.

YAML live here for now: https://github.com/taw/magic-search-engine/tree/master/data/boosters . I might move them to another repository once this is done.

axxroytovu commented 1 year ago

For contributing newer sets with multiple booster types, would you rather all the files stay in the same parent folder (e.g. \one_draft.yaml) or in nested child folders (e.g. \one\draft.yaml)?

taw commented 1 year ago

I don't want to have hundred of folders with one file each.

I'm leaning towards one folder with names like one-arena.yml, with untagged one.yml being draft. That's how URLs are setup right now, and it seems to work fine.

If that's too much, we could also do draft/one.yml and only have a few folders. (then again, it won't work with SIR)

Anyway, it's a very easy thing to change if it turns out to be too much.

taw commented 1 year ago

This is going better than expected, it's 20% of booster types already migrated. Obviously starting from simplest types.

taw commented 1 year ago

40% done and validated that the result is identical in both yaml and code versions.

YAMLs feel a bit too repetitive, maybe usual sheets (basic, common, uncommon, rare_mythic, foil) should be predefined. This would cut the source in half.

I'm also wondering about more concise way to say this:

packs:
- basic: 1
  common: 10
  uncommon: 3
  rare_mythic: 1
  chance: 31
- basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  foil: 1
  chance: 9

Maybe something like:

pack:
- basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  {common: 31, foil: 9}: 1

YAML doesn't have a great way to express this.

For simple foils it's no big deal, but writing YAML for old style foils (example) would be hell.

I'm also pretty sure that various numbers will need to support mathematical expressions (or math would go into comments). Example, example.

I can probably get to 60% within current system.

axxroytovu commented 1 year ago

What if instead it looked like this:

pack:
  basic: 
    count: 1
    sheet: basic
  common: 
    count: 9
    sheet: common
  uncommon:
    count: 3
    sheet: uncommon
  rare_mythic:
    count: 3
    sheet: rare_mythic
  foil:
    count: 1
    sheets:
      common: 31
      foil: 9

This also makes masterworks easier, and set boosters can have slots like this:

pack:
  wildcard:
    count: 2
    sheets:
      wc_common: 28
      wc_uncommon: 7
      wc_rare_mythic: 5
taw commented 1 year ago

I'm not sure what's the best way to do packs section yet. The wildcard example can probably already be done just fine with sheet / any.

Anyway, I fixed endless repetition in sheets section. There's now common sheet list that's preloaded:

rare_mythic:
  any:
  - rate: 2
    query: "r:r"
  - rate: 1
    query: "r:m"
rare:
  query: "r:r"
uncommon:
  query: "r:u"
common:
  balanced: true
  query: "r:c"
basic:
  query: "r:b"
foil:
  foil: true
  any:
    - chance: 5
      query: "r:u"
    - chance: 12
      query: "r<=c"
    - chance: 3
      any:
      - rate: 2
        query: "r:r"
      - rate: 1
        query: "r:m"

They can be overridden, or ignored. Here's WWK example:

packs:
- basic: 1
  common: 10
  uncommon: 3
  rare_mythic: 1
  chance: 31
- basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  foil: 1
  chance: 9
sheets:
  basic:
    # borrowed basics
    rawquery: "e:zen r:b"

It cut total yamls line count by ~60% and now it's a lot clearer what's actually different.

axxroytovu commented 1 year ago

I like that. Once we get is:boosterfun and has:boosterfun working, we can load those into the default sheets so we don't need to deal with that for every pack.

taw commented 1 year ago

100/180 packs reached and verified, but I don't think this approach will get us all the way to the finish line. There's a lot of complexity in the final 80, so some additional kinds of yaml annotations will be needed for it.

For now I'll continue writing pack yamls, there's a few more that are still fine.

taw commented 1 year ago

Some ideas for alternative pack syntax.

Current:

packs:
- basic: 1
  common: 10
  uncommon: 3
  rare_mythic: 1
  chance: 31
- basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  foil: 1
  chance: 9

Maybe:

pack:
- basic: 1
- common: 9
- uncommon: 3
- rare_mythic: 1
- any:
  - common: 1
    chance: 31
  - foil: 1
    chance: 9

But combining that with packs syntax creates double nested array, and that looks awkward in YAML.

Doing it with a Hash doesn't quite work if we have more than one any:

pack:
  basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  any:
  - common: 1
    chance: 31
  - foil: 1
    chance: 9
axxroytovu commented 1 year ago

I had a go at a solution over here: https://github.com/taw/magic-search-engine/pull/165

It's a mix of that last option with the syntax I used earlier. It checks if the hash contents are an integer, and treats it as a static value. Otherwise it multiplies out all of the possible packs based on the chances given in the slot\sheets parameters.

pack:
  basic: 1
  common: 9
  uncommon: 3
  rare_mythic: 1
  foil_slot:
    count: 1
    sheets:
      foil: 9
      common: 31
taw commented 1 year ago

Looks like we already support multiple print sheets per card like https://mtg.wtf/card/lea/294/Forest (C5 U3)

taw commented 1 year ago

Up to 143/180 verified, and I think the syntax is good enough. The last 37 are the hardest ones.

After that's done it's up for documentation, and integrating this with indexer so it's all precalculated.

axxroytovu commented 1 year ago

Just want to verify that I've got the formatting right for mixed packs before I dive too hard into helping with those last 37: one.yaml.txt

taw commented 1 year ago

It crashes to wrong counts, but otherwise it parses and returns some results.

taw commented 1 year ago

156/180 done. The last 24 feature BBD and DBL with huge number of cases, the other 22 is only somewhat bad.

taw commented 1 year ago

It's possible to mix all annotation styles, like this one for MID:

packs:
- basic: 1
  common_or_foil:
  - sfc_common: 1
    chance: 2
  - foil: 1
    chance: 1
  sfc_common: 8
  dfc_common: 1
  sfc_uncommon: 2
  dfc_uncommon: 1
  sfc_rare_mythic: 1
  chance: 5
- basic: 1
  common_or_foil:
  - sfc_common: 1
    chance: 2
  - foil: 1
    chance: 1
  sfc_common: 8
  dfc_common: 1
  sfc_uncommon: 3
  dfc_rare_mythic: 1
  chance: 1
taw commented 1 year ago

So about the whole YAML idea. I'm doing a lot of mistakes where I add extra - or miss necessary - and get really stupid error messages when that happens. I guess everyone who tries to contribute is going to run into the same issue.

taw commented 1 year ago

All 180 done, exactly replicating code version (so no new showcases, some sheet names a bit weird for technical reasons etc.). It's 4176 lines, with a lot of duplication.

taw commented 1 year ago

Another idea.

Now:

sheets:
  dedicated_foil:
    foil: true
    any:
      - query: "r<=c"
        chance: 10
      - query: "r:u"
        chance: 3
      - any:
        - rate: 2
          query: "r:r"
        - rate: 1
          query: "r:m"
        chance: 1

What if:

sheets:
  dedicated_foil:
    foil: true
    any:
      - use: common
        chance: 10
      - use: uncommon
        chance: 3
      - use: rare_mythic
        chance: 1

This would be major readability win. Foil sections are generally really hard to read.

Even better, CN2 now:

  conspiracy:
    any:
    - query: "t:conspiracy r:c"
      count: 5
      rate: 8
    - query: "t:conspiracy r:u"
      count: 2
      rate: 4
    - query: "t:conspiracy r:r"
      count: 3
      rate: 2
    - query: "t:conspiracy r:m"
      count: 2
      rate: 1
  conspiracy_foil:
    foil: true
    any:
    - query: "t:conspiracy r:c"
      count: 5
      rate: 8
    - query: "t:conspiracy r:u"
      count: 2
      rate: 4
    - query: "t:conspiracy r:r"
      count: 3
      rate: 2
    - query: "t:conspiracy r:m"
      count: 2
      rate: 1

What if:

  conspiracy:
    any:
    - query: "t:conspiracy r:c"
      count: 5
      rate: 8
    - query: "t:conspiracy r:u"
      count: 2
      rate: 4
    - query: "t:conspiracy r:r"
      count: 3
      rate: 2
    - query: "t:conspiracy r:m"
      count: 2
      rate: 1
  conspiracy_foil:
    foil: true
    use: conspiracy
axxroytovu commented 1 year ago

I like the alternate way of doing foils, it's much cleaner. It'll definitely clean up foils once we start adding booster fun and things like retro artifacts or MOM legends retold.

taw commented 1 year ago

I put some README in https://github.com/taw/magic-search-engine/tree/master/data/boosters

The process is currently quite complicated, but most steps are optional and apply to only a few sets.

I want to enhance booster_indexer to support things like use:conspiracy, chance: 15*7 etc., this will save considerable amount of duplication.

taw commented 1 year ago

Added support for math expressions for chance/rate/count, so now I can do this:

  alara_premium_common:
    foil: true
    rawquery: "b:ala r:c"
    count: 101 + 60 + 60

And this:

  common_maybe_foil:
  - common: 10
    chance: 70 - 10
  - common: 9
    foil_common: 1
    chance: 10

Instead of doing error prone numbers manually.

taw commented 1 year ago

Use references are supported now. Reference can be to common sheet as well.

  conspiracy:
    any:
    - query: "t:conspiracy r:c"
      count: 5
      rate: 8
    - query: "t:conspiracy r:u"
      count: 2
      rate: 4
    - query: "t:conspiracy r:r"
      count: 3
      rate: 2
    - query: "t:conspiracy r:m"
      count: 2
      rate: 1
  conspiracy_foil:
    foil: true
    use: conspiracy
  dedicated_foil:
    foil: true
    any:
      - query: "r<=c"
        chance: 10
      - query: "r:u"
        chance: 3
      - use: rare_mythic
        chance: 1

A possible extension would be to allow references to other boosters, as there's duplication between X and X-arena, but I'll skip it for now.

taw commented 1 year ago

New feature. Filtering moved to indexer. Now by default:

The whole is:booster system made sense pre-boosterfun. Unfortunately it doesn't work going forward.

axxroytovu commented 1 year ago

How are you executing this? Whatever you did broke my ability to run the booster_indexer so I can't test my yaml files. I was previously able to run ruby ./booster_indexer/bin/booster_indexer just fine but now it breaks with:

/Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:113:in `process_sheet': undefined method `except' for {"code"=>"R"}:Hash (NoMethodError)
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:129:in `block in process_sheets'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:128:in `transform_values'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:128:in `process_sheets'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:151:in `call'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:186:in `block in process_data'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:185:in `each'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:185:in `process_data'
    from /Users/samzimmerman/Source/magic-search-engine/booster_indexer/lib/booster_indexer.rb:192:in `call'
    from ./booster_indexer/bin/booster_indexer:5:in `<main>'
taw commented 1 year ago

It sounds like you have older version of ruby. I added old version compatibility to the main code, but I guess I didn't add them to the indexer.

taw commented 1 year ago

Updated the code, it should run on Ruby 2.x now too.

axxroytovu commented 1 year ago

Apparently I upgraded to 3.0 but it was still using the existing 2.6 install by default 🤦 Should be all good now.

taw commented 1 year ago

Unrelated, but I just added MOM. It will of course need boosters too. (data isn't really correct as it's still spoiler season, usually it takes a while for data issues to get fixed)

axxroytovu commented 1 year ago

Ok I tried using filter: "" for the Eldraine collector booster and it's not as useful as I was hoping. If it worked line-by-line I could choose which subsheets don't include the filter, but as-is it screws with all of the common sheets and makes the actual yaml a lot harder.

taw commented 1 year ago

At some point I want to try filter: for draft and arena boosters, so all the exclusion rules can move to *.yaml files.

If it doesn't work for more complex boosters, we can try some additional things.

taw commented 1 year ago

I created a script bin/human_export_sealed_data for generating human readable versions of pack data, as diffing JSONs in magic-sealed-data was a bit of a hassle.

Nothing special there, just some diff-friendly output format.

Run bin/human_export_sealed_data. Then it will make files like tmp/sealed/vow-arena.txt etc. with contents like this:

name: Innistrad: Crimson Vow Arena Booster
code: vow-arena
packs:
- chance: 5
  sfc_common: 9
  dfc_common: 1
  sfc_uncommon: 2
  dfc_uncommon: 1
  sfc_rare_mythic: 1
- chance: 1
  sfc_common: 9
  dfc_common: 1
  sfc_uncommon: 3
  dfc_rare_mythic: 1
sheets:
  sfc_common:
  - [vow/1] Adamant Will - 1
  - [vow/11] Estwald Shieldbasher - 1
  - [vow/13] Fierce Retribution - 1
  - [vow/15] Gryff Rider - 1
  - [vow/18] Heron of Hope - 1
  - [vow/19] Heron-Blessed Geist - 1
  - [vow/24] Militia Rallier - 1

It looks pretty much like yaml because why not, but I didn't check if it's valid yaml.

The format is really only meant for diffing. I want to do some refactoring so I'll need it. Let me know if you'd like some changes here for your packs.

axxroytovu commented 1 year ago

That's very useful, I'll let you know if I run into any specific use cases that would be helpful.

The most helpful thing right now would be improving the alt searches that I need for the boosterfun treatments. If they could be made set-agnostic somehow, then I could make common common_showcase and uncommon_showcase sheets. That would save a lot of repetition.

taw commented 1 year ago

Sure, I added {set} expansion, and then some examples to common.yaml

In ELD They're used in 3 different ways for demonstration, but really they should just do what commons are doing.

The only downside is that there's no way to pass count: checks on subsheets, but count accepts math expressions, so it should be good enough to avoid mistakes.

If {set} creates some conflicts, I can change it to something else.

taw commented 1 year ago

I refactored all the boosterfun boosters:

I might have missed some further simplification, but I'm done with it for now.

axxroytovu commented 1 year ago

Refactoring is great. Included in https://github.com/taw/magic-search-engine/pull/181 as a test case. Unfortunately M21 is particularly complicated because of Teferi, so I wasn't able to use as much as I'd have liked, but it does make the collector booster significantly easier.

taw commented 1 year ago

@axxroytovu You mentioned that filter: "" didn't work for you. Would a filter that's possible to override an any depth be useful?

For traditional draft boosters global filter works fine, except for cards from other sets (like most commonly small sets borrowing basics from bigger set), so I don't know if that would be useful or not really.

axxroytovu commented 1 year ago

Yeah I don't know that it's worth it.

It's the difference between

- rawquery: "e:{set} {query}" Vs

- query: "{query}"
  filter: ""

It's probably easier to just use rawquery.

taw commented 1 year ago

I was thinking about something like this. Before:

  brr_retro_artifact:
    any:
    # Officially 1/6 schematc rate
    # U/R/M rates guessed to be at 4x/2x/1x multiples
    - rawquery: "e:brr number<=63 r:u"
      count: 18
      rate: 4*5
    - rawquery: "e:brr number<=63 r:r"
      count: 30
      rate: 2*5
    - rawquery: "e:brr number<=63 r:m"
      count: 15
      rate: 1*5
    - rawquery: "e:brr number>=64 r:u -number:/z/"
      count: 18
      rate: 4*1
    - rawquery: "e:brr number>=64 r:r -number:/z/"
      count: 30
      rate: 2*1
    - rawquery: "e:brr number>=64 r:m -number:/z/"
      count: 15
      rate: 1*1

After:

  brr_retro_artifact:
    filter: "e:brr -number:/z/"
    any:
    # Officially 1/6 schematc rate
    # U/R/M rates guessed to be at 4x/2x/1x multiples
    - query: "number<=63 r:u"
      count: 18
      rate: 4*5
    - query: "number<=63 r:r"
      count: 30
      rate: 2*5
    - query: "number<=63 r:m"
      count: 15
      rate: 1*5
    - query: "number>=64 r:u"
      count: 18
      rate: 4*1
    - query: "number>=64 r:r"
      count: 30
      rate: 2*1
    - query: "number>=64 r:m"
      count: 15
      rate: 1*1

Or for one of the sheets you wrote, before:

  ancillary:
    # C/U/R/M rates guessed to be at 8x/4x/2x/1x multiples
    any:
    - rawquery: "e:thb r:c -promo:boosterfun number:269-297"
      rate: 8
    - rawquery: "e:thb r:u -promo:boosterfun number:269-297"
      rate: 4
    - rawquery: "e:thb r:r -promo:boosterfun number:269-297"
      rate: 2
    - rawquery: "e:thb r:m -promo:boosterfun number:269-297"
      rate: 1

After:

  ancillary:
    filter: "e:thb -promo:boosterfun number:269-297"
    # C/U/R/M rates guessed to be at 8x/4x/2x/1x multiples
    any:
    - query: "r:c"
      rate: 8
    - query: "r:u"
      rate: 4
    - query: "r:r"
      rate: 2
    - query: "r:m"
      rate: 1

It would cut repetition a bit in such cases, but I'm not sure worth it. There's also a question of such filter appending e:{set} or not, as there are examples both ways.

axxroytovu commented 1 year ago

That's a much better use case, I like how that looks.

I think the extra 5 characters to put the set code in the filter each time isn't that much trouble. It's saving so much already that a few extra characters to make the code simple is probably worth it.

taw commented 1 year ago

I changed how filter: works. Now it doesn't do default e:{set} and it can be overridden for any sheet.

Relevant examples:

I didn't adjust any boosterfun yamls, but they have a lot of opportunities for reducing duplication.

Also notable potential use case:

  somesheet:
    filter: "..."
    use: anothersheet

There's likely a lot of places where it could be used like turning this kind of code (repeated so many times):

  commander:
    any:
    - rawquery: "e:c20 number:1-71 r:c"
      rate: 8
    - rawquery: "e:c20 number:1-71 r:u"
      rate: 4
    - rawquery: "e:c20 number:1-71 r:r"
      rate: 2
    - rawquery: "e:c20 number:1-71 r:m"
      rate: 1

Into not just this:

  commander:
    filter: "e:c20 number:1-71"
    any:
    - query: "r:c"
      rate: 8
    - query: "r:u"
      rate: 4
    - query: "r:r"
      rate: 2
    - query: "r:m"
      rate: 1

But just outright:

  commander:
     filter: "e:c20 number:1-71"
     use: ratio_1248_by_rarity
axxroytovu commented 1 year ago

That's awesome! It also made the japanese WAR booster super easy. https://github.com/taw/magic-search-engine/pull/184

taw commented 1 year ago

I think that scoped filter trick was my last major idea. Everything else looks fine to me. In my original plans I thought about filter/brr + query/brr, but scoped filter covers that use case at lower complexity, so there's no point.

Mostly I still want to write some short README explaining how booster yamls work, so more people can contribute. And that will close this issue.

There's still some code refactoring to do, but it's fairly low priority, and it might take a while until I get there. Error messages are also sometimes fairly bad, but it's not a big deal.

If you need anything else from me for boosterfun, let me know.

taw commented 1 year ago

I added is:basictype (= t:plains or t:island or t:swamp or t:mountain or t:forest), that comes up in boosters a bit (UNF, KHM), and might be useful for regular users as well.

axxroytovu commented 1 year ago

Issue came up while working on STX and MH2: how do we call out foil etched cards that do not have their own card number? Cards like Approach of the Second Sun use finishes: ["foil", "nonfoil", "etched"] to indicate that they have foil etched versions, as opposed to CMR or 2X2 that have a different frame effect for their etched cards. Any thoughts on how to make that work?

taw commented 1 year ago

I'm ok accepting this issue as a limitation, and just mark them as regular foils. Introducing another foiling state or duplicating these cards on our side is a lot of added complexity for fairly minor benefit.

You can add a comment to the yaml file and call the sheet etched_foil for some explanation I guess, but it won't be machine readable.

We could collect all known minor issues (tokens, checklist cards, art cards, etched cards, common subsheets, unset variants balance, Kawigawa brothers etc. etc.) and see what we can fix without too much trouble.

axxroytovu commented 1 year ago

Sounds good. I've already labeled the sheets as "etched" so that'll at least indicate what is supposed to be in the slot.

taw commented 1 year ago

I improved error messages when loading booster index. Now it should always say which sheet in which set failed to build.

axxroytovu commented 1 year ago

Let me know if you want some help writing the documentation. The error handling is much appreciated.

taw commented 1 year ago

Yeah, you probably have better perspective on what should be documented. Just create BOOSTERS.md or something on top level, and describe the whole process.