Open Zac-HD opened 2 years ago
In fact I'd even recommend using Hypothesis to generate strings directly, instead of using real words - it looks less like production data, but is far more likely to find bugs.
Indeed, it did find more bugs, and they were all upstream :) At the time of writing this, I didn't have the capacity to address all of the upstream issues, so I resorted to something that more closely resembled what upstream was expecting.
I agree with your remarks in regards to the external dependency in the random word generator. I was probably moving fast and didn't stop to really notice it. I'll switch over to just using some static local data to kill off that dependency.
def word() -> s.SearchStrategy[str]:
return st.text(alphabet=string.ascii_letters, min_size=1, max_size=15)
def words(...):
return st.text(alphabet=string.printable, min_size=1, max_size=200)
of course "words" isn't exactly the right name for the latter anymore, but printable ASCII characters should work upstream! (if not, I'd just keep trying smaller alphabets, and open an upstream issue)
Oh, another case for getting rid of random
: https://github.com/jmgilman/beancount-hypothesis/blob/002b8b895de40764a852cdcf57a66a9ed2f67e1b/beancount_hypothesis/directive.py#L25-L35
def meta() -> dict[str, Union[str, int]:
return st.fixed_dictionaries(
{"filename": common.filename(), "lineno": st.integers(1, 100000)}
)
It turns out that the
random_word
package fetches each word individually from https://developer.wordnik.com/ - this is totally unsuitable for a testing library, since it makes tests flaky and needlessly dependent on external infrastructure.https://github.com/jmgilman/beancount-hypothesis/blob/e6501171f2e1b0e879f67fd1812f6da45d465d39/beancount_hypothesis/common.py#L40-L41
See here and here for the upstream implementation.
Instead, you should use Hypothesis (not
random
- see #1) to choose words from a list which is statically defined in beancount-hypothesis, whether in Python or a data file (Hypothesis does this for top-level domains, for example). In fact I'd even recommend using Hypothesis to generate strings directly, instead of using real words - it looks less like production data, but is far more likely to find bugs.