Closed nicolesimon13 closed 8 months ago
So, you are probably working with the latest PyPI release of the package, v3.5.0. That release has a slightly different placement algorithm than the latest code here on GitHub. Before I explain what is happening, I would suggest installing the latest version directly from Git using the command below...
$ pip install git+https://github.com/joshbduncan/word-search-generator
Word Validation: When sending words to a WordSearch puzzle, the words are first validated. So, single-letter words, words with punctuation, palindromes, and "sub-words" are all discarded. Single-letter words and words with punctuation are pretty self-explanatory... Palindromes just cause confusion (especially with the key starting position and directions)... And, "subwords", or words that are parts of other words, obviously aren't normally valid as the word could be found twice in the same puzzle.
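To make these rules concrete, here is a rough, self-contained sketch of the four checks in plain Python. It only illustrates the rules described above and is not the package's actual implementation; the function names `is_valid` and `filter_words` are made up for this example, and treating a "sub-word" as a simple substring of another word is an assumption.

```python
import string


def is_valid(word: str, all_words: set[str]) -> bool:
    """Return True if `word` passes the four checks described above."""
    w = word.upper()
    if len(w) < 2:  # single-letter word
        return False
    if any(ch in string.punctuation for ch in w):  # contains punctuation
        return False
    if w == w[::-1]:  # palindrome
        return False
    # sub-word (assumed here to mean: substring of a different word in the list)
    return not any(w != other and w in other for other in all_words)


def filter_words(words: str) -> list[str]:
    """Split a comma-separated word list and keep only the valid words."""
    cleaned = {w.strip().upper() for w in words.split(",") if w.strip()}
    return sorted(w for w in cleaned if is_valid(w, cleaned))


print(filter_words("TIN,PLATINUM,NEON,LEVEL,A,GOLD,CO-OP"))
# ['GOLD', 'NEON', 'PLATINUM']
```

In this sketch TIN is dropped because it appears inside PLATINUM, which is the kind of overlap an element list runs into.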
Your word list has numerous words that fail the validation, so there will always be a few that never show up in the puzzle as they are discarded during initialization. Use the `WordSearch.words` property to see which words were validated, and the `WordSearch.placed_words` property to see which words were actually placed (`WordSearch.unplaced_words` shows which weren't).
You can also use `len(WordSearch.placed_words) == len(WordSearch.words)` to see if all words were placed, or set the `require_all_words` property to `True`, which will throw an exception any time the puzzle is generated and not all words are placed.
In the latest version of the code, the word validation has been extracted, allowing you to specify which validators you want to use (or none).
from word_search_generator import WordSearch
words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)
## or to require all words to be placed
p = WordSearch(words, size=60, require_all_words=True, validators=None)
There are still times when every word will not be placed. The generator is very good at placing words and can fill up the available space with all words but sometimes (for various reasons), it is not possible. If you want to keep generating different versions of the puzzle layout until all words can be placed you can use the code below.
words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)
while p.unplaced_words:
    p.generate()
Finally, you asked about the word and size limits... It comes down to the puzzle generation time and PDF output. Puzzles larger than 50 in size don't fit well on the letter-size sheet of paper used in the PDF output. The text becomes so small that it is illegible. The same goes for the maximum word count: the word list and key can't fit on the page.
In the latest version of the code, the defaults are set on the WordSearch object so you can subclass the base object and set your own defaults.
class ReallyBigWordSearch(WordSearch):
    MAX_PUZZLE_SIZE = 1_000_000
    MAX_PUZZLE_WORDS = 2_000
p = ReallyBigWordSearch(words, size=60, validators=None)
thank you again for your answers. I really appreciate them.
Word Validation I had wondered about that but did not want to ask. Maybe take this paragraph and add it to the documentation / faq?
And, "subwords", or words that are parts of other words, obviously aren't normally valid as the word could be found twice in the same puzzle.
Usually likely good, but in this case not what is wanted. The person solving this puzzle would know about the duplication.
There are still times when every word will not be placed. The generator is very good at placing words and can fill up the available space with all words but sometimes (for various reasons), it is not possible. If you want to keep generating different versions of the puzzle layout until all words can be placed you can use the code below.
Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all are placed after a few tries. That will not do for "I need 15x15" but in cases like this I can then start with 50 and go from there.
"In the latest version of the code, the defaults are set on the WordSearch object so you can subclass the base object and set your own defaults." This would take care of the problem of me having to edit the source code for that, right?
And I understand the limitation for the PDF but in that case I think a big big warning output would be better than just limiting the user. Instead of being able to say "screw it I know what I am doing" I now need to add manually a subclass. And also - this is perfectly well written, just make an odds and ends docu on the wiki, there are several things you answered me which can go in there as faqs! :)
And last a simple yes no question I hope: If I am using mask (to create a rectangle instead of square), do I need to apply the mask after every generation? I would think that it is enough?
words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)
p.apply_mask(Rectangle(mask_x, mask_y))
<other code>
while p.unplaced_words:
    p.generate()
Again thanks for the quick reply!
And if somebody else runs into this while installing:
error message:
word-search-generator 3.5.0 requires fpdf2==2.4.2, but you have fpdf2 2.7.5 which is incompatible.
this worked:
pip uninstall fpdf2
pip install fpdf2==2.4.2
pip install --no-deps git+https://github.com/joshbduncan/word-search-generator
this is more a fyi observation as something I have noticed throughout and since it is also about placement. ;)
this grid is done only s+e, using the latest version. The result is only going south. I needed to run it seven more times until i got a version which is not just one direction. I have seen similar things also on level 3 grids - tons of just vertical or just horizontal, often not even backwards, even if enough space is available.
it looks like there is not enough (for the lack of a better word) 'random diversity'. yes it is a valid placement but it is a very boring puzzle. Even my biggest puzzles run at maybe 30 seconds generation time for my full script - I can spare the cycles for the grid to at least try to be better than this. As I mentioned, I am not a good coder, so I am not sure what you are using for placements, but it looks to me like it should try to cycle through the valid directions.
the approach I am going to use: I determine how much diversity I want in my puzzle and will run the generation as long as it will take. f.e. if i am doing s+e, I want at least 45% to be S etc.
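A minimal sketch of what such a diversity check could look like, assuming the directions of the placed words are available as a plain list of names such as "S" or "E". The helper names `direction_shares` and `is_diverse_enough` are invented for this example; with the real package the list could presumably be built from `word.direction.name` for each entry in `placed_words`, as in the test code later in this thread.

```python
from collections import Counter


def direction_shares(directions: list[str]) -> dict[str, float]:
    """Map each direction name to its share (0..1) of all placed words."""
    counts = Counter(directions)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def is_diverse_enough(directions: list[str], minimums: dict[str, float]) -> bool:
    """True if every direction listed in `minimums` reaches its required share."""
    shares = direction_shares(directions)
    return all(shares.get(name, 0.0) >= need for name, need in minimums.items())


# made-up results of two generations, used only for illustration
only_south = ["S"] * 9 + ["E"]
mixed = ["S"] * 5 + ["E"] * 5

print(is_diverse_enough(only_south, {"S": 0.45, "E": 0.45}))  # False
print(is_diverse_enough(mixed, {"S": 0.45, "E": 0.45}))  # True
```

A regeneration loop could then keep going until this returns True, ideally with a cap on the number of tries so it cannot run forever.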
1) big grid - 'simple' filled, boring, orange easily could have been placed across
2) small grid - here it needs to fill like this because i said S+E
3) another example of only south
4) after seven more tries
Let me address the mask question...
If I am using mask (to create a rectangle instead of square), do I need to apply the mask after every generation? I would think that it is enough?
No, you don't need to reapply the mask after you call generate because the puzzle shape doesn't change and the puzzle generator takes the current mask(s) into account when placing words.
FYI, any mask(s) that is/are calculated (e.g. the built-in shapes) are automatically recalculated anytime the puzzle size is changed. For numerous reasons, a `Rectangle` mask isn't autocalculated. So if you change the puzzle size, the mask will remain however it was set. You can always change the rectangle size to fit the new puzzle size.
Just a note about validators...
Validators are built from a simple abstract base class, so you can create your own very easily or use any of the pre-built validators included in the package. Just provide as few or as many as you want to the puzzle object as a list.
class Validator(ABC):
    """Base class for the validation of words.

    To implement your own `Validator`, subclass this class.

    Example:
        ```python
        class Palindrome(Validator):
            def validate(self, value: str) -> bool:
                return value == value[::-1]
        ```
    """
## Pre-built Validators
- NoSingleLetterWords
- NoPunctuation
- NoPalindromes
- NoSubwords
## Custom Validator Example
```python
class NoMoreOs(Validator):
    """A validator to ensure no words with the letter 'O' are valid."""

    def validate(self, value: str, *args, **kwargs) -> bool:
        return "o" not in value.lower()
```
Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all are placed after a few tries. That will not do for "I need 15x15" but in cases like this I can then start with 50 and go from there.
Increasing the size doesn't mean all words will be placed. On larger puzzles, size isn't usually the limiting factor. Typically it is word length, word count, and word placement. If you have a bunch of really long words in a smaller puzzle, once a few words are placed the available positions for the other words are limited.
On your question about placement...
I did a simple test using your words and puzzle size from above.
from collections import defaultdict
from word_search_generator import WordSearch
# your long 118 word wordlist
words = "TELLURIUM,CHROMIUM,NICKEL..."
# custom word search puzzle to allow for larger sizes and wordlists
class ReallyBigWordSearch(WordSearch):
    MAX_PUZZLE_SIZE = 1_000_000
    MAX_PUZZLE_WORDS = 2_000
# create the puzzle
p = ReallyBigWordSearch(words, size=60, validators=None)
# ensure all words were placed
assert not p.unplaced_words
# see which directions were used
d = defaultdict(int)
for word in p.placed_words:
    d[word.direction] += 1
After running that twice the results were as below...
# run #1
defaultdict(int,
{
<Direction.E: (0, 1)>: 39,
<Direction.SE: (1, 1)>: 27,
<Direction.NE: (-1, 1)>: 22,
<Direction.S: (1, 0)>: 30
}
)
# run #2
defaultdict(int,
{
<Direction.E: (0, 1)>: 28,
<Direction.S: (1, 0)>: 33,
<Direction.NE: (-1, 1)>: 23,
<Direction.SE: (1, 1)>: 34
}
)
This seems like a pretty good distribution considering the puzzle size and quantity of words. If no level is set for the puzzle, it defaults to `2`, which allows words to go NE, E, SE, and S (the four directions in the results above). If a level of `1` is specified, then words will only go E and S.
When the generator runs, it first picks a random location on the board, sees if it is available, and then determines which of the available directions (set by the level) the word will fit. It then picks a random direction from all that were valid. Now, some other factors could force words to be mostly horizontal and vertical (N, E, S, or W). If you have a word that is 11 characters long and you have masked your puzzle to only be 10 wide by 15 tall, then the word will not fit at an angle (NE, SE, SW, or NW) as there is not enough room (which is the case in some of your example pics).
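The boundary part of that argument can be checked with a few lines of plain Python. This is only a sketch of the bounds test, not the package's generator: it ignores overlaps with already placed words, the helper names are hypothetical, and only the E, SE, NE and S steps are taken from the `Direction` output above (the rest are filled in by symmetry).

```python
# (row, column) steps; E, SE, NE and S match the Direction output above,
# the other four are assumed by symmetry for this illustration
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def fitting_directions(length, row, col, height, width, allowed):
    """Directions in which a word of `length` letters stays on the board."""
    fits = []
    for name in allowed:
        d_row, d_col = DIRECTIONS[name]
        end_row = row + d_row * (length - 1)
        end_col = col + d_col * (length - 1)
        if 0 <= end_row < height and 0 <= end_col < width:
            fits.append(name)
    return fits


def directions_anywhere(length, height, width, allowed):
    """Every direction that fits from at least one start position."""
    found = set()
    for row in range(height):
        for col in range(width):
            found.update(fitting_directions(length, row, col, height, width, allowed))
    return found


# an 11-letter word on a board that is 10 wide by 15 tall
print(sorted(directions_anywhere(11, height=15, width=10, allowed=DIRECTIONS)))
# ['N', 'S']
```

Whatever start position is picked, an 11-letter word on that board can only run vertically, so no amount of randomness produces a diagonal for it.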
Using words almost as long as the puzzle size can also cause this issue. If the first word the generator places is a 10-character word in the middle of a 6-wide by 10-high puzzle, then unless that word shares lots of characters with the other words, most of the valid directions will only be horizontal and vertical.
In your second picture above, if you look carefully at the puzzle, after a few words have been placed it would be hard to fit any of the rest on an angle. This is one time when increasing the size would help.
The generator algorithm could be written to keep trying numerous directions for each word (and backtracking) to increase variability, but that would greatly slow down puzzle generation which I don't want to do. And who is to say what the correct variability is?
Hopefully, this all makes sense.
Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all are placed after a few tries. That will not do for "I need 15x15" but in cases like this I can then start with 50 and go from there.
Increasing the size doesn't mean all words will be placed. On larger puzzles, size isn't usually the limiting factor. Typically it is word length, word count, and word placement. If you have a bunch of really long words in a smaller puzzle, once a few words are placed the available positions for the other words are limited.
Yes, but I am keeping a running report on how many did fit. Let's say I am starting with a 15x15 and have 50 words - the ten tries I do in my loop will tell you 40-45 - that tells me "needs to be bigger". I want to start as small as possible but usually there is a size when you can say "this list with this grid will work well".
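For anyone wanting to try the same idea, here is a rough sketch of the "retry a few times, then grow the grid" loop. It assumes only what appears in this thread: a puzzle object with an `unplaced_words` attribute and a `generate()` method, and that building the puzzle already generates it once. The function name `grow_until_placed` and the `FakePuzzle` stand-in are invented so the example runs without the package installed.

```python
from typing import Any, Callable, Optional


def grow_until_placed(
    build: Callable[[int], Any],
    start_size: int,
    max_size: int,
    tries_per_size: int = 10,
) -> Optional[tuple[Any, int]]:
    """Try each size up to `max_size`; return (puzzle, size) once all words fit."""
    for size in range(start_size, max_size + 1):
        puzzle = build(size)  # building the puzzle counts as the first try
        for _ in range(tries_per_size - 1):
            if not puzzle.unplaced_words:
                break
            puzzle.generate()
        if not puzzle.unplaced_words:
            return puzzle, size
    return None  # give up instead of looping forever


class FakePuzzle:
    """Stand-in for a real puzzle so the sketch runs without the package."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.generate()

    def generate(self) -> None:
        # pretend every word fits once the grid is at least 17 wide
        self.unplaced_words = [] if self.size >= 17 else ["YTTERBIUM"]


result = grow_until_placed(FakePuzzle, start_size=15, max_size=20, tries_per_size=3)
print(result[1] if result else "no size worked")  # 17
```

With the real package, `build` could presumably be something like `lambda size: WordSearch(words, size=size, validators=None)`. The `max_size` cap matters because, as noted above, a bigger grid does not guarantee that every word gets placed.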
On your question about placement...
I did a simple test using your words and puzzle size from above.
thank you for the code, that would have been another day for me with chatgpt help to get to those numbers - and it would not have been so elegant. And it is good to hear that it is as it should be (random). It also makes more sense - your program is so nice it would have been weird if it had not been. Clearly it is an "it's me, not you" thing, aka my wordlist, which is why I am going to use your tuble thingy to regenerate until it is what I need / want it to be. By now I have it set up to do "here is my list, run it. Run it again, each time with 10x tries. Damn, only x will fit. Increase, Increase. Try again."
I have not yet redone my code to fit your new "no validation" I assume a lot of my tries now will fit much nicer. ;)
Pre-built Validators
- NoSingleLetterWords
- NoPunctuation
- NoPalindromes
- NoSubwords
Yeah, I am taking care of that in my workflow before running the code. And all of them are pretty much what I want in my list, but again, that is my conscious decision, whereas for a normal puzzle this is good.
Yes, but I am keeping a running report on how many did fit. Let's say I am starting with a 15x15 and have 50 words - the ten tries I do in my loop will tell you 40-45 - that tells me "needs to be bigger". I want to start as small as possible but usually there is a size when you can say "this list with this grid will work well".
One way you could look at this is, a 15x15 puzzle has 225 available spaces. If every space was taken by word characters with an average word length of 6 (built-in wordlist average but not representative of your sample words), the puzzle would at most hold 37 words. Since that isn't really possible, I would take maybe 80% of that.
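That back-of-the-envelope estimate is easy to reproduce. The sketch below just restates the arithmetic from the paragraph above; the function name `rough_word_capacity` is made up, and the 80% discount is the rough guess mentioned there, not a value taken from the package.

```python
def rough_word_capacity(
    size: int, avg_word_length: float, fill_ratio: float = 0.8
) -> tuple[int, int]:
    """Return (theoretical maximum, discounted estimate) of words for a square grid."""
    cells = size * size
    maximum = int(cells // avg_word_length)  # no gaps and no shared letters
    return maximum, int(maximum * fill_ratio)


print(rough_word_capacity(15, 6))  # (37, 29)
```

Because crossing words share letters, real puzzles can land above or below this figure, so it is best treated as a starting point for picking a grid size rather than a hard limit.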
In testing this 100 times using random English words, the average placed word count for a 15x15 puzzle with a 50 word wordlist is 38.63.
from word_search_generator import WordSearch
from word_search_generator.utils import get_random_words
cts = []
for _ in range(100):
    words = ",".join(get_random_words(50))
    p = WordSearch(words, size=15)
    cts.append(len(p.placed_words))
print(sum(cts)/100)
# 38.63
I am sharing this purely as fyi, i tried Josh's code but it did not work for me plus it did not have all i needed.
Why not go with "Josh's random words show ..."? Because while that is true in general, it does not help for a specific word list.
tl;dr, my conclusion: Everything depends on your word list and your grid size (all bets are off if you use masks).
Also, please do not forget what Josh said in the other comment: "what determines a good puzzle is up to you".
Greetings from Berlin, Nicole
1) I built a script that will create puzzles and give stats on the directions.
This code will take your word list, run 50 times, and write the output to the screen as well as to a tab-delimited file. Attachment: e_placement test2.py.txt
The output shows the distribution across the directions for each puzzle created, plus a summary of normal vs. diagonal. There is also a loop that tries up to x times to first get a puzzle which has placed all the words.
2) I then built a script which will make a diagram out of it. Attachment: e_diagram3.py.txt (reads the generated r_stats_placement.txt)
Good and bad are defined by how many entries are inside the gold zone as set by the parameter.
3) I ran this with my word list (grid 15, level 3, 26 words) 5000 times. Attachment: r_stats_placement_5000.txt. The list has two lines with 25 words; everything else is 26 words. Yes, grid size 15 is a tight grid, but it is a default size, and as you can see it can go into the good zone, but you need to check for it.
I ran this with validation off because it interferes with what I want to do. I repeated it for the list without that:
And because I was curious: grid size 50, the 118 elements, validators=None, 100 runs. Attachment: r_stats_placement_element50.txt
Grid size 40, the 118 elements, validators=None, 100 runs. Attachment: r_stats_placement_40.txt
So more space means a more even distribution, thus running the initial list on a size 40 grid 500x. Attachment: r_stats_placement_500ol.txt
So here are a few thoughts...
Yes, a larger size obviously allows more room for diagonal placement. What I meant in my earlier comment is that size isn't always the determining factor.
Possible diagonals will always be limited (compared to "regular" directions) due to a few factors, including the puzzle having boundaries and word overlap. For example, in the table below, no matter the random position chosen by the generator, there would only be a very small number of words that could fit diagonally (overlap conflicts, boundary limitations), but there are numerous positions where another word could fit in a "normal" direction.
* * * * *
B A T * *
* T E S *
* * * E *
* * * T *
from collections import defaultdict
import pandas as pd
from word_search_generator import WordSearch
from word_search_generator.core.word import Direction
POSSIBLE_DIRECTIONS = [d.name for d in Direction]
def run_tests(
    words, runs, puzzle_size, puzzle_level, max_tries
) -> list[tuple[dict, int]]:
    results = []
    for _ in range(runs):
        counts: dict[str, int] = defaultdict(int)
        p = WordSearch(words, size=puzzle_size, level=puzzle_level, validators=None)
        tries = 1
        # regenerate until all words are placed or the retries run out
        while True:
            if not p.unplaced_words or tries >= max_tries:
                break
            p.generate()
            tries += 1
        # tally the direction of every placed word
        for word in p.placed_words:
            if word.direction is None:
                continue
            counts[word.direction.name] += 1
        results.append((counts, tries))
    return results
if __name__ == "__main__":
    # set defaults
    words = open("words.txt").read()
    runs = 5000
    puzzle_size = 15
    puzzle_level = 3
    max_tries = 15
    # run tests
    test_results = run_tests(words, runs, puzzle_size, puzzle_level, max_tries)
    # format data for pandas
    data: dict[str, list[int]] = defaultdict(list)
    for row in test_results:
        counts, tries = row
        data["tries"].append(tries)
        for d in POSSIBLE_DIRECTIONS:
            data[d].append(counts[d])
    # load data
    normal_dirs = ["N", "E", "S", "W"]
    diagonal_dirs = ["NE", "SE", "SW", "NW"]
    df = pd.DataFrame(data)
    df["Placed"] = sum(df[d] for d in POSSIBLE_DIRECTIONS)
    df["Normal %"] = (sum(df[d] for d in normal_dirs) / df["Placed"]) * 100
    df["Diagonal %"] = (sum(df[d] for d in diagonal_dirs) / df["Placed"]) * 100
    # present data
    # print the entire table
    # print(df.round(2))
    # print only summary info
    print(df.describe().round(2))
    # save data
    df.to_string("test_results.txt")
    df.describe().round(2).to_string("tests_summary.txt")
# test results (truncated)
      tries   N  NE   E  SE   S  SW   W  NW  Placed   Normal %  Diagonal %
0         2   1   4   2   4   3   2   8   2      26  53.846154   46.153846
1         1   7   1   3   0   1   2  11   1      26  84.615385   15.384615
2         4   6   1   3   2   3   2   8   1      26  76.923077   23.076923
3         3   7   1   6   2   4   1   4   1      26  80.769231   19.230769
4         1   3   3  10   0   2   3   5   0      26  76.923077   23.076923
..      ...  ..  ..  ..  ..  ..  ..  ..  ..     ...        ...         ...
4995      2   9   1   3   0  11   0   2   0      26  96.153846    3.846154
4996      1  11   0   3   2   7   0   3   0      26  92.307692    7.692308
4997      1  11   7   1   0   5   1   1   0      26  69.230769   30.769231
4998      1   7   2   9   0   0   0   7   1      26  88.461538   11.538462
4999      1   0   1  13   1   2   0   7   2      26  84.615385   15.384615
# tests summary stats
         tries        N       NE        E       SE        S       SW        W       NW  Placed  Normal %  Diagonal %
count  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.0   5000.00     5000.00
mean      2.18     5.03     1.51     5.05     1.46     5.01     1.50     4.97     1.46    26.0     77.16       22.84
std       1.59     2.62     1.68     2.57     1.62     2.59     1.67     2.54     1.61     0.0     14.10       14.10
min       1.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00    26.0     23.08        0.00
25%       1.00     3.00     0.00     3.00     0.00     3.00     0.00     3.00     0.00    26.0     65.38       11.54
50%       2.00     5.00     1.00     5.00     1.00     5.00     1.00     5.00     1.00    26.0     76.92       23.08
75%       3.00     7.00     2.00     7.00     2.00     7.00     2.00     7.00     2.00    26.0     88.46       34.62
max      14.00    17.00    11.00    15.00    12.00    16.00    11.00    15.00    11.00    26.0    100.00       76.92
Really, no matter how many tests I ran with your defaults, the results ranged between 72-78% "normal" and 28-22% "diagonal".
For a test of 500 puzzles (without any retries), the generator tried 12,941 random positions (25.88 per puzzle). On average there were 1.76 valid directions per random position with 70% of those being "normal" directions and 30% being "diagonal" directions.
So, no matter the size of the puzzle, you will always have a limited number of "diagonal" direction words in comparison to "normal" direction words.
Total Word Placement Attempts: 12941
Average Valid Directions per Random Position: 1.76
Average Valid NORMAL Directions per Random Position: 1.22
Average Valid DIAGONAL Directions per Random Position: 0.53
`Generator` abstract base class. Your custom generator could try "diagonals" first, instead of picking a direction at random. Please note, forcing "diagonals" takes many more tries and much more processing. As you can see from the stats below, forcing "diagonals" uses all available retries and most often doesn't place all words.
# test results (truncated), using level=7 "diagonals" only
      tries   N  NE   E  SE   S  SW   W  NW  Placed   Normal %  Diagonal %
0        15   0   4   0   4   0   9   0   5      22        0.0       100.0
1        15   0   6   0  11   0   1   0   6      24        0.0       100.0
2        15   0   8   0   3   0   8   0   4      23        0.0       100.0
3        15   0   1   0   9   0   3   0   8      21        0.0       100.0
4        15   0   8   0   1   0  12   0   4      25        0.0       100.0
..      ...  ..  ..  ..  ..  ..  ..  ..  ..     ...        ...         ...
495      15   0   6   0   2   0   4   0  10      22        0.0       100.0
496      15   0  14   0   1   0   9   0   0      24        0.0       100.0
497      15   0   5   0   6   0   4   0   7      22        0.0       100.0
498      15   0   7   0   4   0  10   0   2      23        0.0       100.0
499      15   0   9   0   7   0   2   0   6      24        0.0       100.0
# tests summary stats
         tries        N       NE        E       SE        S       SW        W       NW  Placed  Normal %  Diagonal %
count   500.00    500.0   500.00    500.0   500.00    500.0   500.00    500.0   500.00  500.00     500.0       500.0
mean     14.31      0.0     5.96      0.0     5.74      0.0     5.94      0.0     5.89   23.54       0.0       100.0
std       2.44      0.0     3.34      0.0     3.18      0.0     3.36      0.0     3.33    1.15       0.0         0.0
min       1.00      0.0     0.00      0.0     0.00      0.0     0.00      0.0     0.00   21.00       0.0       100.0
25%      15.00      0.0     3.00      0.0     3.00      0.0     3.00      0.0     3.00   23.00       0.0       100.0
50%      15.00      0.0     6.00      0.0     6.00      0.0     6.00      0.0     6.00   23.00       0.0       100.0
75%      15.00      0.0     8.00      0.0     8.00      0.0     8.00      0.0     8.00   24.00       0.0       100.0
max      15.00      0.0    21.00      0.0    17.00      0.0    17.00      0.0    15.00   26.00       0.0       100.0
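A softer alternative to forcing diagonals is to prefer them only when they are available. The sketch below shows just that selection step in plain Python; it is not the package's `Generator` interface, and the name `pick_direction` and the idea of receiving the valid directions as a list of names are assumptions made for illustration.

```python
import random

DIAGONALS = {"NE", "SE", "SW", "NW"}


def pick_direction(valid: list[str], rng: random.Random) -> str:
    """Pick a diagonal when one is valid, otherwise fall back to any valid direction."""
    diagonal = [d for d in valid if d in DIAGONALS]
    return rng.choice(diagonal or valid)


rng = random.Random(0)
print(pick_direction(["E", "S", "SE"], rng))  # SE (the only diagonal on offer)
print(pick_direction(["E", "S"], rng))  # E or S, chosen at random
```

Since words that cannot go diagonally still get placed in a "normal" direction, this kind of preference should not eat up retries the way level 7 does in the stats above, though it will also skew puzzles away from a purely random mix.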
Hi Josh, hope you can help. I have run into a weird problem
I have this word list (and yes, I adapted the max amount of word size, why is that placed anyway?) to get all chemical elements.
But no matter how much I try to place stuff it always leaves out things. I have run this list probably a few dozen times, both with generate new and also manually trying to add new words)
You can see the image where i marked in red where there was enough space 116/118 is the highest I have come so far. I have tried other lists with a similar result - the list is not used fully despite retries and more than enough space available.
sizegrid = 60 Word List: TELLURIUM,CHROMIUM,NICKEL,NIHONIUM,ANTIMONY,LANTHANUM,KRYPTON,NEON,BROMINE,THORIUM,OXYGEN,CHLORINE,HYDROGEN,MENDELEVIUM,PLUTONIUM,NEPTUNIUM,NITROGEN,DARMSTADTIUM,TENNESSINE,BARIUM,GERMANIUM,POTASSIUM,CALIFORNIUM,HASSIUM,MEITNERIUM,RHENIUM,INDIUM,PROMETHIUM,DUBNIUM,VANADIUM,ARSENIC,THALLIUM,NIOBIUM,MOSCOVIUM,TUNGSTEN,XENON,RUTHENIUM,SAMARIUM,MAGNESIUM,OSMIUM,STRONTIUM,RUTHERFORDIUM,NOBELIUM,COPPER,LIVERMORIUM,ASTATINE,TITANIUM,IRIDIUM,ARGON,LAWRENCIUM,BERYLLIUM,ERBIUM,AMERICIUM,CARBON,CERIUM,LEAD,FERMIUM,SILVER,FLEROVIUM,GADOLINIUM,ROENTGENIUM,GALLIUM,SEABORGIUM,EINSTEINIUM,CALCIUM,COPERNICIUM,FRANCIUM,ACTINIUM,PALLADIUM,PLATINUM,SCANDIUM,RADON,HAFNIUM,TANTALUM,MANGANESE,BERKELIUM,HOLMIUM,DYSPROSIUM,BORON,GOLD,LITHIUM,ALUMINIUM,MOLYBDENUM,CAESIUM,SELENIUM,COBALT,NEODYMIUM,SILICON,ZIRCONIUM,RHODIUM,LUTETIUM,CADMIUM,CURIUM,IODINE,YTTRIUM,SODIUM,BISMUTH,TERBIUM,TECHNETIUM,URANIUM,PRASEODYMIUM,IRON,PHOSPHORUS,SULFUR,OGANESSON,ZINC,RUBIDIUM,EUROPIUM,RADIUM,FLUORINE,HELIUM,TIN,POLONIUM,THULIUM,BOHRIUM,MERCURY
Unplaced:___ PROTACTINIUM,YTTERBIUM
Am I doing something wrong?