joshbduncan / word-search-generator

Make awesome Word Search puzzles!
MIT License
75 stars 24 forks source link

Move allowed_directions to the Word object & give them priorities #76

Open duck57 opened 2 months ago

duck57 commented 2 months ago

Should finish off #51.

Few things I thought of to check before merging.

  1. Should I edit Word.__lt__ at all?
  2. WordSearch.__init__ is rather messy, as the secret words are formed before the call to super().__init__.
  3. Updating WordSearchGenerator to use priorities & direction choices stored in the Word was surprisingly straightforward once I found where to make the change and did the prep work.
joshbduncan commented 2 months ago
  1. Yes, I like the use of __lt__...
  2. I agree, we can probably add secret words to the parent Game class since I could see that being used in more types of games. I still need to do some work on the base Game class, I've just been too busy with other stuff to get back to it.
  3. Makes sense, I just don't know where I fall on the issue... Not a make-or-break decision so I think it is fine.

Can you just please make sure all docstrings/comments are updated anywhere you added params to any functions/methods? I know that really helps me when using other APIs to understand what the dev was thinking. Other than that, I think all looks good to merger. Seems there are a few conflicts but I can resolve those after you are done. Thanks!

duck57 commented 2 months ago

For adding secret words back to the base Game, where in the param list should they go? For that matter, should secret_level also go to the base Game class? On a broader scope, should the construction of Word objects from input strings be done by the subclass before Game.__init__, done during Game.__init__, or perhaps done after the Game.__init__ call (though that option requires a reflow of when .generate() gets called)?

As of right now, I'm leaning toward making Game._process_input be closer to a pure bulk_create_words

@staticmethod?
def _process_input(self?, words: str | WordSet, secret: bool = False, allowed_directions: DirectionSet | None = ???, priority: int = 3, max_words: int = 100):
    if isinstance(words, set):
        return words  # they're already created
    ...  # transform str -> set[Word]

Or move it to a method in the Word class? Word.bulk_create(...) or something.


Until I hear back on that, I'll go ahead and change the min_length in the random word to equal 1 as well as updating docstrings (hoping the diff vs. main catches them all).

Speaking of utils.get_random_words, how should we handle cases where the number of matching words is less than the number requested? As written, it'll raise a ValueError. Returning all matching words is the less surprising option.

duck57 commented 2 months ago

One other thing: should I subclass Word into CrosswordWord and move description & number into the subclass?

Description is meant to be the clue for a crossword, while number is used after the puzzle is generated.

duck57 commented 2 months ago

I think I have a solution that allows all of the following.

  1. Game only needs one param for words [still needs a separate secret_level]
  2. Isn't overkill that's intended for there being 50 word types instead of just two (hidden/secret). [passing some list[tuple[str, Callable[[str], WordSet]]]
  3. Doesn't ping-pong with subclass constructors (word creation and level setting are both completely done in Game.__init__

It's to make something like the below class:

class WordInput(NamedTuple):
    preformed_words: WordSet
    hidden_word_list: str
    secret_word_list: str

def Game.__init__(words: WordInput, ...):
    ...  # setup directions etc
    self._secret_words: WordSet = self._process_words(words.secret_word_list, ...)

The other serious option would be to restore the old Game.init signature to def Game.__init__(preformed_words: WordSet = frozenset(), words: str = "", secret_words: str = "", level = None, secret_level = None, ...)


Also, I may write something in utils.py to take some Sized iterable and then return n random objects from it if it's longer than n (otherwise return the input unaltered). I see the pattern used in both utils.get_random_words and Game._process_words.

duck57 commented 2 months ago

I tried making the following function but got defeated by the type checker.

S = TypeVar("S", bound=Sized)
T = TypeVar

def limit(i: S, n: int) -> S:
    """
    Limits the size of the input iterable to n.  Truncation is randomly chosen.
    """
    if len(i) <= n:
        return i
    return type(i)(random.sample(i, n))

It seems to work in use, but mypy dislikes it.

joshbduncan commented 1 month ago

I will try to look over this tomorrow. Thanks for your work!

joshbduncan commented 1 month ago

I'm going to comment in chunks as I have time and to keep different convos easier to follow...

joshbduncan commented 1 month ago

It seems weird to me that bulk_create() is in the Word class with "word" being singular. I wouldn't think to use the Word class to create a list of word(s).

joshbduncan commented 1 month ago

With get_random_words() I can't think of a case where I expect that to return the entire WORD_LIST. If I wanted the entire list I would just import it...

def get_random_words(...):
    ...
    return limit(
        (
            WORD_LIST
            if max_length == 999 and min_length == 1
            else [word for word in WORD_LIST if min_length <= len(word) <= max_length]
        ),
        n,
    )

And, I don't understand the usefulness of the limit here (other than type hints) but maybe I'm missing something? If you plan to take it this far, would creating a WORDLIST class that inherits from Sized be a better solution?

joshbduncan commented 1 month ago

My thoughts on the base Game class... Since you are moving things like directions to the underlying Word objects, what if the Game class only accepts an iterable of Word objects? Those objects can be created during the init of the subclasses and provided to the super.__init__() after the fact... This keeps the base class "clean" and allows tinkerers to do whatever funky things they need to do in their own subclasses.

def __init__(
    self,
    words: WordSet | None = None,  # or WordList, or whatever we want to call it
    level: int | str | None = None,
    size: int | None = None,
    require_all_words: bool = False,
    generator: Generator | None = None,
    formatter: Formatter | None = None,
    validators: Iterable[Validator] | None = None,
):
    ...
duck57 commented 1 month ago

It seems weird to me that bulk_create() is in the Word class with "word" being singular. I wouldn't think to use the Word class to create a list of word(s).

Would a make_many_words(...) function in core.word be a better place for it?

duck57 commented 1 month ago

My thoughts on the base Game class... Since you are moving things like directions to the underlying Word objects, what if the Game class only accepts an iterable of Word objects? Those objects can be created during the init of the subclasses and provided to the super.__init__() after the fact... This keeps the base class "clean" and allows tinkerers to do whatever funky things they need to do in their own subclasses.

def __init__(
    self,
    words: WordSet | None = None,  # or WordList, or whatever we want to call it
    level: int | str | None = None,
    size: int | None = None,
    require_all_words: bool = False,
    generator: Generator | None = None,
    formatter: Formatter | None = None,
    validators: Iterable[Validator] | None = None,
):
    ...
  1. Would we still need the level and secret_level params in the Game constructor? I think the answer is still yes because of Game.add_words().
  2. Speaking of moving things around, part of what I try to do when I see it is to move type aliases to the same class that they're based on. For example, moving the definition of DirectionSet from word.py to directions.py.
  3. I like this idea. Helps separate what's core shared logic for a letter grid game from what's the business logic for each game.
  4. Also, whenever I get the Crossword to a MVP, I'll open a PR for that. It won't be anywhere near ready to merge (limited docstrings, no tests, etc…), but it would show other places I thought to move around and—more importantly—be a concrete implementation of a second Game to see how the changes feel in practice.
duck57 commented 1 month ago

With get_random_words() I can't think of a case where I expect that to return the entire WORD_LIST. If I wanted the entire list I would just import it...

def get_random_words(...):
    ...
    return limit(
        (
            WORD_LIST
            if max_length == 999 and min_length == 1
            else [word for word in WORD_LIST if min_length <= len(word) <= max_length]
        ),
        n,
    )

And, I don't understand the usefulness of the limit here (other than type hints) but maybe I'm missing something? If you plan to take it this far, would creating a WORDLIST class that inherits from Sized be a better solution?

>>> from words import WORD_LIST
>>> len(WORD_LIST)
970
>>> from collections import Counter
>>> Counter(len(w) for w in WORD_LIST)
Counter({4: 239, 5: 187, 6: 157, 7: 126, 3: 100, 8: 78, 9: 41, 10: 24, 11: 10, 12: 4, 14: 2, 13: 2})

If you requested get_random_words(100, min_length=10), the current random.choice implementation would give ValueError: Sample larger than population or is negative. The limit function will instead return fewer words than requested (42 in the case of a minimum length of 10) without throwing errors.

joshbduncan commented 1 month ago

I forgot you added the min_length attribute, so I understand the need for limit now.

joshbduncan commented 1 month ago
  1. Would we still need the level and secret_level params in the Game constructor? I think the answer is still yes because of Game.add_words().
  2. Speaking of moving things around, part of what I try to do when I see it is to move type aliases to the same class that they're based on. For example, moving the definition of DirectionSet from word.py to directions.py.
  3. I like this idea. Helps separate what's core shared logic for a letter grid game from what's the business logic for each game.
  4. Also, whenever I get the Crossword to a MVP, I'll open a PR for that. It won't be anywhere near ready to merge (limited docstrings, no tests, etc…), but it would show other places I thought to move around and—more importantly—be a concrete implementation of a second Game to see how the changes feel in practice.
  1. Yes, I think so since those are comman to all game types (so far).
  2. Yeah, I like to do this as well but at some point, I remember running into some circular import issues so I had to move something around. I would prefer the type aliases to be beside or as close as possible to the corresponding class so feel free to move whatever makes sense.
  3. ...
  4. Sounds good 👍
joshbduncan commented 1 month ago

Would a make_many_words(...) function in core.word be a better place for it?

Yes, I think it makes more sense as a standalone helper function and not a method of the Word class.

duck57 commented 1 month ago

A shower thought I had for a crazy use of the Game generation flow would be to use masks to generate cardioid Minesweeper grids. Perhaps I may have time later in the year or early next year to use that as a Textual learning project.


Anyway, I think my current to-do for this PR is

  1. Move WordList, DirectionSet, etc… definitions next to the classes they define. While doing that, I'll make sure that each of the classes (Word, Direction, and Position seem to be the ones I'd use so far) has types defined has both a List and Set variety. I can't think of any other collection suffixes we'd need.
  2. Move the bulk creation out of the Word class
  3. Simplify Game.init
joshbduncan commented 1 month ago

Yep, all sounds good. Feel free to let me know if you need a hand with anything and I can hop on your PR. I hope to get tests created for the color output I added a while back in my free time. Thanks!

duck57 commented 1 month ago

Just to double-check, do we need both level and secret_level in the base Game class, or just level?

I think that's the last clarification I'd want before checking that the simplified Game init is ready.

joshbduncan commented 1 month ago

I would say just level. I'm not sure secret level would apply to all game types so that can be implemented in subclasses like WordSearch or CrossWord.

duck57 commented 1 month ago

One thing I thought of is that not all games would have the prohibition on duplicate words of a word search. IIRC, that's why words are stored in a set instead of a list. I briefly considered adding a class variable to Game that specified whether the given Game class used sets or lists, but quickly scrapped the idea. While I suspect not much would need changed if the Words were either in sets or lists, the type signatures alone would be a mess.

In case duplicate words need to be sorted in a particular order, the following can be (ab)used to achieve the same effect:

  1. Use the description field for the Word object as a random salt (or insert non-printable characters if the printed description needs to look the same).
  2. Go nuts on the Priority values

Additionally, the number_words that I'll implement for the Crossword will take care of 1-indexed numbering words starting in the top-left.

joshbduncan commented 1 month ago

One thing I thought of is that not all games would have the prohibition on duplicate words of a word search. IIRC, that's why words are stored in a set instead of a list. I briefly considered adding a class variable to Game that specified whether the given Game class used sets or lists, but quickly scrapped the idea. While I suspect not much would need changed if the Words were either in sets or lists, the type signatures alone would be a mess.

In case duplicate words need to be sorted in a particular order, the following can be (ab)used to achieve the same effect:

  1. Use the description field for the Word object as a random salt (or insert non-printable characters if the printed description needs to look the same).
  2. Go nuts on the Priority values

Additionally, the number_words that I'll implement for the Crossword will take care of 1-indexed numbering words starting in the top-left.

Yes, a set was picked to eliminate dupes but also for its ease of use when modifying the WordSet of a game. If we use a list, we'll have to account for some things a set covers by default (adding, removing, comparing, etc.). And we'll need a way to handle dupes if they are allowed in a game (which will also have to be specified).

At this point, since there are so many possibilities for diff types of games (too many to account tbh), what if Game is just an abstract base class to guide users in creating their own games?

duck57 commented 1 month ago

Some questions about making Game abstract:

  1. Would that entail moving Generator classes back into being an abstract method on the base Game class?
  2. Do we retain the core data structure of list[list[str]]? The only other option that springs to mind besides “keep track of that in your child class” is creating a Cell class that tracks the following:
    1. The stored char
    2. Is active? (for masking)
    3. Is head/tail of word? [DirectionSet, empty == no]
    4. Available directions for word placement
    5. Word(s) the cell is part of
  3. While I expect the formatters to remain as separate classes, a Game.debug_format() could prove useful for quickly outputting the Puzzle to the terminal.
  4. Here's my top-of-head estimate at what remains in Game:
    1. Init: store params, take pre-digested WordSet, then generate
    2. The usual utility methods/properties (placed_words, unplaced_words, etc…)
    3. Placing the word
    4. I'm sure I'm forgetting a bunch.
  5. Besides generate(), any other obvious methods to make abstract?
joshbduncan commented 1 month ago

I had a little time to think about this earlier today and here is what I've come up with...

We are doing great work abstracting away the core of a "word-based" game but that is taking us away from the initial project of word-search-generator. I'm thinking about pulling out the core portion of this package and creating a new library called word-game-generator. This new library would be the base of individual projects like word-search-generator and maybe crossword-generator. This would allow:

The more I have worked with the Textual library and seen the diversity of projects people create with it, I really like this model... In the separated "child" projects, if you need to extend the Word class to add a color property or a description property, or to add a reverse method 🤷‍♂️, feel free to extend to your heart's content.

I just think this project is getting a bit bloated and turning into more than a word-search-generator which could be very confusing to others.

Let me know your thoughts?

joshbduncan commented 1 month ago

So, I had thought about the container for a game's Word objects.

How about something like this for WordList... I also thought, what if WordList didn't care about dupes or max words (could remove all dupe stuff from class below), instead all of that gets handled in the base game object (during generate)...

Notice, that this works with iterables of Word objects, so all processing of string input would need to happen in ether the base game object (or a subclass e.g. WordSearch).

I feel this is more Pythonic and allows people to use basic Python methods to interact with the Word container... Thoughts?

from collections.abc import Iterable

from .word import Word

class WordList(list[Word]):
    """This class represents a WordList within a WordGame."""

    def __init__(self, words: Iterable[Word], allow_dupes: bool = False) -> None:
        self._allow_dupes = allow_dupes

        _words: list[Word] = []
        for word in words:
            self.validate_word(word)
            if not self._allow_dupes and word in _words:
                continue
            _words.append(word)
        super().__init__(_words)

    def validate_word(self, word, check_for_dupes: bool = False) -> None:
        if not isinstance(word, Word):
            raise ValueError(f"'{word}' must be a 'Word' object.")
        if check_for_dupes and not self._allow_dupes and word in self:
            raise ValueError(f"'{word}' already exists in the WordList.")

    def __setitem__(self, index, word):
        self.validate_word(word, check_for_dupes=True)
        super().__setitem__(index, word)

    def insert(self, index, word):
        self.validate_word(word, check_for_dupes=True)
        super().insert(index, word)

    def append(self, word):
        self.validate_word(word, check_for_dupes=True)
        super().append(word)

    def extend(self, other):
        _words: list[Word] = []
        for word in other:
            # should probably just continue or notify, only raising exception during dev
            self.validate_word(word, check_for_dupes=True)
            _words.append(word)
        super().extend(_words)
duck57 commented 1 month ago

I don't have much time this week(end), but from a quick scan:

  1. The WordList container means that Word.__eq__ can simply compare the Word.text instead of any hacks
  2. In if not isinstance(word, Word): raise ValueError(f"'{word}' must be a 'Word' object."), why ValueError over TypeError? That would give the caller a clearer view of what went wrong.
joshbduncan commented 1 month ago
  1. Yes
  2. It should definitely be a TypeError. I was typing this during a Zoom call so I wasn't paying the closest attention...

And let me know what you think about pulling out the base game (WordGame) into a separate library like I mentioned above.

Have a great weekend!

duck57 commented 1 month ago

Our (mostly your) prior work in moving everything to classes should be a useful guide in deciding what is core functionality and what is business logic for the game.

The one suggestion I'd have about making the core package separate is we'd want to alter type hints that deal with Words to also work with whatever subclasses of Words we need. For example, we should say that if it takes a CrosswordWord in, it should give CrosswordWords out (instead of base Words). That way, additional functionality can be quickly added without needing to make a PR for the core functions.

Besides everything in the core folder and utils.py, is there anything else that springs to mind as an obvious inclusion in the Core package?

joshbduncan commented 1 month ago

For subclasses of Words (or any custom classes), we should be able to just use the type[Word] syntax I think... And, yes that sounds like all that needs to be moved but honestly, I jump from project to project so much, I won't be sure until I dig back in to the code.

joshbduncan commented 1 month ago

I had a little time to consider splitting off the core game, and I feel it's just not worth the trouble. I still think it's a good idea to make some of the adjustments I mentioned above so that the core package can act more like a library. Then if we need special properties or methods for a different type of game we can subclass any of the core objects. This will keep the core as simple as possible and allow the complexity to live closer to the actual games. Next week, I'll closely look over the PRs and we can work through them.