Sazoji opened this issue 2 years ago
I had a look at some pages in the Ravensburger category. Sadly, the wiki is not in the best condition right now. One way to get all the data from the pages in a category is to export them as a single (large) XML file via the special page Special:Export. The file is about 3.34 MB in size.
They use a template named `Puzzle` to present the data in a standardized way, but not all arguments are filled in by the wiki community. For the data most interesting for the given purpose, I found 2393 matches by searching with this regular expression:

```
PIECES=\d+.+PIECESR=\d+.+SIZEP=\d+
```
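As a rough sketch, the matching values could be pulled out of the export with something like the following. The field names come from the wiki's `Puzzle` template; the sample text here is invented for illustration, not real wiki data.

```python
import re

# Named-group version of the regex above; DOTALL lets it span lines.
PATTERN = re.compile(
    r"PIECES=(?P<pieces>\d+).+?PIECESR=(?P<real>\d+).+?SIZEP=(?P<size>\d+)",
    re.DOTALL,
)

def extract_puzzles(xml_text):
    """Return (nominal pieces, real pieces, size) tuples found in the text."""
    return [
        (int(m["pieces"]), int(m["real"]), int(m["size"]))
        for m in PATTERN.finditer(xml_text)
    ]

# Made-up sample standing in for the Special:Export XML dump.
sample = """
{{Puzzle|PIECES=1000|PIECESR=1008|SIZEP=70}}
{{Puzzle|PIECES=500|PIECESR=500|SIZEP=49}}
"""
print(extract_puzzles(sample))  # [(1000, 1008, 70), (500, 500, 49)]
```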
I'm dropping this as a discussion thread in case anyone else wants to weigh in with their own opinions on JIG scoring instead of taking the time to make their own PR for a simple Python script (unless you really want to).
In my personal opinion, not all jigsaws could be predicted by JIG with perfect accuracy, but a single company's puzzles should be nearly solvable by following or tuning the badness score to match whatever internal policy shapes their puzzles (as long as it's consistent). I think the primary thing to address is how the score treats each additional piece: it should be a percentage of the total pieces rather than a flat per-piece penalty. A 100-piece puzzle with 32 more is unacceptable, but a 1200-piece puzzle with 32 more is OK. In other words: additional/total * constant = badness.
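That ratio-based score is trivial to sketch; the constant here is a free tuning parameter I've picked arbitrarily, not anything JIG actually uses.

```python
def badness(nominal, actual, constant=100.0):
    """Score extra pieces relative to the nominal count, per the
    additional/total * constant idea above."""
    additional = actual - nominal
    return additional / nominal * constant

# The same 32 extra pieces score very differently at different sizes:
print(badness(100, 132))    # 32.0  -> unacceptable
print(badness(1200, 1232))  # ~2.67 -> fine
```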
It's also possible to pool data from a wiki (jigsaw-wiki.com/Category:Ravensburger) and do some data science: compare the typical image ratios (panoramic, square, portrait) and metadata like the year of release to see if the range of acceptable piece ratios has changed, and even check whether JIG's "badness" score varies between brands, over time, or with the number of pieces, in order to adjust the algorithm or see how puzzle standards change over time.
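One crude way to look for that kind of drift would be to group the extracted records by release year and average the relative piece-count overshoot. The records below are invented placeholders; real ones would come from the same wiki export.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, nominal pieces, real pieces) records.
records = [
    (1995, 1000, 1000),
    (1995, 500, 504),
    (2010, 1000, 1008),
    (2010, 1500, 1512),
]

# Mean relative overshoot per release year: has the acceptable
# piece-count slack changed over time?
by_year = defaultdict(list)
for year, nominal, actual in records:
    by_year[year].append((actual - nominal) / nominal)

for year in sorted(by_year):
    print(year, round(mean(by_year[year]), 4))
```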
This seems like one of those perfect things for someone to sink their time into. I think puzzle makers use some sort of standard die, and just choosing between whichever five standards they settled on in the 1800s would be the "ideal" JIG algorithm.