jmgilman / beancount-hypothesis

A package which provides hypothesis strategies for generating beancount types
MIT License
0 stars 1 forks source link

Random words are individually fetched from a web API ?!?! #2

Open Zac-HD opened 2 years ago

Zac-HD commented 2 years ago

It turns out that the random_word package fetches each word individually from https://developer.wordnik.com/ - this is totally unsuitable for a testing library, since it makes tests flaky and needlessly dependent on external infrastructure.

https://github.com/jmgilman/beancount-hypothesis/blob/e6501171f2e1b0e879f67fd1812f6da45d465d39/beancount_hypothesis/common.py#L40-L41

See here and here for the upstream implementation.

Instead, you should use Hypothesis (not random - see #1) to choose words from a list which is statically defined in beancount-hypothesis, whether in Python or a data file (Hypothesis does this for top-level domains, for example). In fact I'd even recommend using Hypothesis to generate strings directly, instead of using real words - it looks less like production data, but is far more likely to find bugs.

jmgilman commented 2 years ago

In fact I'd even recommend using Hypothesis to generate strings directly, instead of using real words - it looks less like production data, but is far more likely to find bugs.

Indeed, it did find more bugs, and they were all upstream :) At the time of writing this, I didn't have the capacity to address all of the upstream issues, so I resorted to something that more closely resembled what upstream was expecting.

I agree with your remarks in regards to the external dependency in the random word generator. I was probably moving fast and didn't stop to really notice it. I'll switch over to just using some static local data to kill off that dependency.

Zac-HD commented 2 years ago
def word() -> s.SearchStrategy[str]:
    return st.text(alphabet=string.ascii_letters, min_size=1, max_size=15)

def words(...):
    return st.text(alphabet=string.printable, min_size=1, max_size=200)

of course "words" isn't exactly the right name for the latter anymore, but printable ASCII characters should work upstream! (if not, I'd just keep trying smaller alphabets, and open an upstream issue)

Zac-HD commented 2 years ago

Oh, another case for getting rid of random: https://github.com/jmgilman/beancount-hypothesis/blob/002b8b895de40764a852cdcf57a66a9ed2f67e1b/beancount_hypothesis/directive.py#L25-L35

def meta() -> dict[str, Union[str, int]:
    return st.fixed_dictionaries(
        {"filename": common.filename(), "lineno": st.integers(1, 100000)}
    )