Unbake assumptions about secret words [priority, valid direction] from the Game class

duck57 commented 1 year ago

There's probably a better title for this proposal, as it's broader than that. I'd also be happy to work on coding this, but I want to discuss it first—especially considering the big refactor since the last time I poked at this project.

The overall goal is to increase the Game object's flexibility by decoupling some word placement assumptions. Ideally, the WordSearch API will remain unchanged by these under-the-hood changes.

New attrs for the Word object

`priority: int = 3`

Instead of baking the algorithm of first placing the regular words before secret words into the Game object, assign each word a priority.

#pseudocode for Game object methods
def words_by_priority_dict(self) -> dict[int, list[Word]]:
    # there's almost certainly an easier way to do this with itertools
    # or at least a clever one-liner
    d = defaultdict(list)
    for w in self.words: 
        d[w.priority].append(w)
    return d

def list_words_by_priority(self) -> list[Word]:
    # see above comment, probably doesn't need 2 functions
    o = []
    for pri in self.words_by_priority_dict().values():
        o += sorted(pri, key=lambda w: len(w), reverse=True)
    return o

Priority 1 words go first, then priority 2, etc…

Within each priority, the default sort is longest to shortest.

For the existing behavior of the WordSearch object, regular words should receive priority=2 and secret words priority=4.

`valid_dirs: DirectionSet | None = None`

(If set to None, should the word be skipped during placement or treated as a wildcard? Personally, I'd lean toward skipped.)

An alternate implementation would be to pass a dict[int, DirectionSet] to the Game object where the valid directions for each priority are shared. However, I feel that it's best for subclasses of Game to assign priorities to the Words during ingestion rather than forcing all words of the same priority to share the same valid directions in the Game class.

Moving the valid_dirs attr to the Word also frees up the Generator class to drop the directions & secret_directions params from its generate function, as that data is now contained within each Word.

For the WordSearch, assign either directions or secret_directions during Word creation as appropriate.

Conflict with existing design in the above

The Game class takes its words and secret_words as strings instead of iterables of Word objects. Perhaps I'll think of a better location for the ingestion & creation of Word objects after I've gotten some sleep and thought more about Python's init order. The init of a Game updated to reflect this proposal would look something like __init__(words, size, *, ...). However, the init for its subclasses should remain largely as they currently are.

Duplicate word policy

Perhaps this is addressed in existing code, but Game could accept an optional parameter of a Callable for duplicate_policy. Perhaps it could be a Validator… would need to look at it with freshly-caffeinated neurons. These would be made on Word objects after they have been assigned valid directions and priorities.

Three policies spring to mind:

Error. Raises an error when it encounters a duplicate word.
Merge: (see below pseudocode)
drop(first=True) keeps either the oldest or most recent duplicate.

# this modifies w1 so the init doesn't need more fields
def merge(w1: Word, w2: Word) -> None:
    w1.secret = w1.secret and w2.secret
    w1.valid_directions |= w2.valid_directions
    w1.priority = min(w1.priority, w2.priority)

Word len

This doesn't require any of the above to implement

def __len__(self) -> int:
    return len(self.text)

joshbduncan commented 1 year ago

Yeah, these all seem like good steps to take. My last efforts were to abstract out each part of the API so the package could be used to create many types of word-based games. Although it is called "word-search-generator", I have had quite a few people ask for it to do non "word-search" type things.

While coding your suggestions, we'll need to try and stay true to the original implementation and also keep in mind this package is used a lot via the CLI. With that said here are below are some quick thoughts off of the top of my head.

joshbduncan commented 1 year ago

Forcing users to provide word objects requires quite a bit of upfront work to get a simple puzzle going. They have to understand the API enough to generate an iterable of word objects, each with a specific type, priority, and direction set. I understand we can set up defaults for all of these, but I'm trying to think of a Python newbie just wanting to make a simple puzzle.

Game.__init__() could accept words: str | Iterable[word] | None but that is kind of odd and ugly
Game.__init__() could accept words: Iterable[str] | Iterable[word] | None which looks better but is a breaking change
Game.__init__() could stay the same but also offset a setter for words that allows users to set words after init is they want to

Or... If we are going to break things, then we could go radical and extract all of the mess from the Game.__init__() and provide other ways of injecting the words, levels, directions, etc. into the game.

class Game:
    """Base object for a word base puzzle game."""

    DEFAULT_GENERATOR: Generator | None = None
    DEFAULT_FORMATTER: Formatter | None = None
    DEFAULT_VALIDATORS: Iterable[Validator] = []

def __init__(
    self,
    generator: Generator | None = None,
    formatter: Formatter | None = None,
    validators: Iterable[Validator] | None = None,
):
    ...

duck57 commented 1 year ago

I've written programs with ugly inits before that will accept pretty much any garbage you throw at them, then used a silly amount of business logic to decide whether it was the plaintext input or pre-formed internals.
Another option for Game.__init__() is to add a pre_digested: Iterable[Word] | None = None (TODO: better name) param while leaving the rest of the existing signature alone. In the init logic, check that at least one of words, secret_words, or pre_digested is not empty.
Since you mentioned breaking changes, I noticed that 3.5 is the most recent release on PyPi. If you wanted to go for the breaking change route, would this still be a 4.0 change, or would it be enough to be 5.0?
There's probably more considerations, but that's all I can think of for now.

joshbduncan commented 1 year ago

I haven't released a new version to PyPi yet as I wanted to get the testing in place for the play add-on using the Textual library. Testing in Textual is a little more involved than normal unit tests so I just haven't taken the time to do it. All that to say, this would be a good place for the 4.0 release. Technically, the play extension and all of the abstractions are "unreleased".

duck57 commented 1 year ago

The more I think about it, the more convinced I am that adding a new argument to inject the pre-formed Words in Game.__init__() is the right solution. Keeping the current signature intact gives sensible default behavior for the simple cases of a standard WordSearch and the slightly more advanced case of a word search with entirely secret words.

If you really wanted to go nuts with OOP, DI, and ABCs, perhaps there could be an Ingestor ABC that converts strings into Words, but I can't think of any uses that the Validators don't already cover.

So far, I'm thinking of word ingestion workflow in Game.__init__() will work something like

.strip() words and secret_words (IIRC, Word.__init__() handles the .upper())
Check that at words, secret_words, and preformed_words do not all evaluate to False [a.k.a. that the Game was passed some words)
Turn the words string into a bunch of Word(secret=False, priority=2, valid_directions=valid_directions)
secret_words becomes a list of Word(secret=True, priority=4, allowed_directions=secret_directions)
The three lists of Words are combined into the Game's WordList. Duplicate handling happens here.

[also, I'll modify my proposal to add a valid_directions attr to Words by renaming it to allowed_directions]

duck57 commented 1 year ago

Some small improvements I've made over on my word-merge branch if you'd like to cherry-pick them.

a8a7e52e922a8a2af793d91ef931b4992f924125 Separate pyproject.toml dependencies to one per line
8862d7a209a83e9d7de51be249c024dbc1ac0545 Edit the Makefile to run ruff in fix mode before flake8 during linting.
a0b2b85ca2a2e93f0f549172915ea34a542a1e0e Fix test_Game Once I've napped, I'll push two more commits to that branch. The first contains some draft groundwork for improving the Word class. The second will update tests to match the improvements.

duck57 commented 1 year ago

More insomnia thoughts:

Perhaps this is one of the places where your OOP & ABC crusade can be applied, but a Crossword would not need the dupe & substring removals that you'd expect in a WordSearch. However, adding an optional remove_duplicate_words: bool = True to Game.__init__() may be easier than making an Ingestor ABC.
One advantage of moving from WordSet to WordList is point 1
With the WordList sorted by priority, it looks like there may be a couple of places we'd need to chase down and replace manually splitting the WordSet into hidden and secret words with a simple through all the words.

joshbduncan commented 1 year ago

So, #1 is actually why I did all of the abstraction. The idea is to have a base Game class that takes in some general information about a word puzzle type game like, words, size, directions, etc. Then if someone wants a game type different from WordSearch, they can implement their own puzzle generator based off of the Generator ABC. If you look inside of the Generator file you'll see the one for WordSearch that includes functions specific to word search puzzles like no_duped_words.

The only requirement of the generator is to return a generated puzzle in the form of a list (Puzzle) as the Game base class is set up to call the generate method.

This same idea applies to the Formatter ABC.

duck57 commented 1 year ago

Some thoughts on making a Crossword before I dust off my old implementation and kick tires on the OOP. Lots of thinking out loud for my October self to review.

Thanks for pointing out that I was looking in the wrong spot for the validation.
I'm now considering adding a hint: str = "" field to the Word class. It can be there for other games to use (or ignored). It seems better to put it there than in CrossWord(Word) class. See below.

Implementing the crossword will be quite useful for finding any missed spots in the OOP refactor.

def CrossWord.__init__(word: str, hint: str):
 hint = hint.strip()
 assert hint
 super().__init__(word, False, 1, {Direction.E, Direction.S}, hint)

EDIT: Forget the above code (see point 4)

Actually, with some param to hold extra text in the base Word class, there is no need to create a CrossWord class (not to be confused with Crossword). The Direction assignment and uniform priorities can be handled during the Crossword's raw text ingestion.
"Description" is probably a better name than "hint" or "clue." That seems like it would be more generic or useful for othre Games to use.
Another proposal: add a placed: bool | Exception (or bool | str?) field that is set to False during Word.__init__(). Games can then set the field to True once the word is placed or log an Exception (or perhaps just str?) stating why the word cannot be placed.
I forgot: Word already has the placed property. Maybe only add a validation_errors: Exception | str = "" field or forget point 6 entirely.
```
# in Game()
@property
def errored_words(self) -> WordList:
return [word for word in self._words if word.validation_error]
```

@property def words_to_place(self): ... # non-errored unplaced words, needs better name def all_unplaced_words(self): ... # include errored and waiting words


8. Now I've looked into it some more, I see how our `no dupes or substrings` implementations differ.  Mine was written as part of the text ingestion process, while yours doesn't check until immediately before the word is up for placement.  This is relevant when a long word cannot be placed, but its subword can.  My initial implementation as a cleaner would have removed the subword before _any_ words were placed.  A bit of conflict between the philosophies of validation and cleanup.
9. Note to self: validators for a `WordSearch` are applied in `generator.WordSearchGenerator.fill_words()`.  Keep this in mind for validating the `Crossword` (Crossword validation checks `len(Word) > 0`, `Word.description is not empty`, maybe others) when writing the CrosswordGenerator.

joshbduncan commented 1 year ago

2/4/5. Yep, I like adding a description/clue/hint or whatever we call it to the Word class.

So during process_input() dupes are removed. The no_duped_words() you see in the generator plays a different role...

In the example below, if you tried to place the word "fat" at row=3, col=2 (0-indexed) going east, you would create a duplicate word of "bat". There is no way to predict this during ingestion as the placements are random. This same function is also used when filling the word search board with random characters.

B * * * R
* A * * A
* * T * T
* * * * *
* * C A B

As for subwords, there is already a word validator built and it is a DEFAULT_VALIDATOR on the WordSearch generator. The reason I don't validate until the generation is there are cases where a word may not be valid at first but could be valid after. In the example below, only one of those two words can be valid when in the same game but if you removed either, then the status of the other could change.

word-search wanted unwanted -s 8
---------------
  WORD SEARCH
---------------
M I E C A H D S
Y H C B L F O P
M Z M V H W C C
A H R A U W Q O
Y H G Z Z P G I
U N W A N T E D
O X O X Q O U Y
D V M D H P Z V

Find these words: UNWANTED
* Words can go E, SE, S, and, NE.

Answer Key: UNWANTED E @ (1, 6)

duck57 commented 1 year ago

I think my word-merge branch is mostly complete—I don't remember why I named it like that. It would probably be a good idea for me to diff it after #52 is merged to remember the main thrust of the word-merge PR.

After that,

Implement the Crossword
Alter GenerateWordSearch to loop through words sorted by priority?

duck57 commented 2 months ago

Howdy,

You've been quite busy in the past year. I had a demo ready to show you, but it looks like there's been a substantial refactor, so it's not ready for a PR (massive merge conflicts when I tried a git pull --rebase).

From my brief look at the new changes, it looks like there's no need for me to touch the Game class at all to achieve the same result as above on the current codebase. I think a summary of the above changes but applied to the new refactor are:

Type annotation changes and improvements to process_words in the WordSearch class
A four-line edit to WordSearchGenerator to use priorities instead of hard-coding hidden_words + secret words (the order will be maintained by assigning the appropriate priorities in process_words)
Changes in Word.py to add a priority field as well as __lt__.

duck57 commented 2 months ago

…I think that should be it. Is there anything else you'd want me to review before making a PR?

This should be a stable foundation to make the Crossword, but that'll be an entirely separate PR from this issue.

joshbduncan commented 2 months ago

Hey, I left a comment about min_words, but everything else looks good to me. Some things may be cleaner as a subclass but that is something we can work through later. Thanks!

joshbduncan / word-search-generator