sageserpent-open / americium

Generation of test case data for Scala and Java, in the spirit of QuickCheck. When your test fails, it gives you a minimised failing test case and a way of reproducing the failure immediately.
MIT License

Add primitive `Trials` factory methods to `TrialsApi`. #5

sageserpent-open closed 3 years ago

sageserpent-open commented 3 years ago

For both the Java and Scala flavours of TrialsApi, there should be more factory methods for Trials<Character>/Trials[Char], Trials<String>, Trials<Instant> and so on.

The situation for strings is nuanced - we could have a one-stop-shop that churns out 'generally useful strings', whatever they are: gibberish in US ASCII? Choice phrases of the Bhagavad Gita in Devanagari? A dinner menu in Standard Chinese? The collection of Jeeves and Wooster books?

An alternative is to provide some factory methods for Trials<Character>/Trials[Char], so we could have various flavours of what constitutes a letter, assuming that plays well for ideographic languages or Devanagari. We then supply a factory that takes a Trials<Character>/Trials[Char] and yields a Trials<String>, which is what VavrTest does. It is up to the end user to specify the relevant character set, either using the canned factory methods, rolling their own, or using monadic combination to customise the mix. This seems to be the safest and most principled approach.
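To make the shape of that factory concrete, here is a minimal sketch in plain Java. It does not use Americium's API: `Gen` is a hypothetical stand-in for `Trials`, modelled as a function from a `java.util.Random` to a value, and the names `lowercaseAscii`, `devanagariLetters` and `strings` are invented for illustration. The Devanagari range used (U+0905 to U+0939) covers only independent vowels and consonants, so it sidesteps the question of vowel signs.

```java
import java.util.Random;
import java.util.function.Function;

class StringsFromCharacters {
    // Hypothetical stand-in for Trials<T>; NOT Americium's type.
    interface Gen<T> extends Function<Random, T> {}

    // Canned character source: lowercase US-ASCII letters.
    static Gen<Character> lowercaseAscii() {
        return random -> (char) ('a' + random.nextInt(26));
    }

    // Canned character source: Devanagari independent vowels and consonants (U+0905..U+0939).
    static Gen<Character> devanagariLetters() {
        return random -> (char) (0x0905 + random.nextInt(0x0939 - 0x0905 + 1));
    }

    // The factory under discussion: a character source in, a string source out.
    // Lengths are drawn uniformly from 0 to maximumLength inclusive.
    static Gen<String> strings(Gen<Character> characters, int maximumLength) {
        return random -> {
            int length = random.nextInt(maximumLength + 1);
            StringBuilder builder = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                builder.append(characters.apply(random).charValue());
            }
            return builder.toString();
        };
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        System.out.println(strings(lowercaseAscii(), 10).apply(random));
        System.out.println(strings(devanagariLetters(), 10).apply(random));
    }
}
```

The point of the design is that `strings` knows nothing about any particular script; the choice of character set stays entirely with whoever supplies the character source.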

That said, there is no harm in supplying some canned Trials<String> such as number strings, identifiers, some kind of context-free-grammar or regular-expression driven things, some stock dictionaries...

sageserpent-open commented 3 years ago

This is looking increasingly complicated - in my ignorance of typing Devanagari, I'm now unsure as to whether consonants and following vowel modifiers are encoded phonetically: 'aap kaise hain', sort-of, or as a graphical breakdown with the core consonant decorated with the various modifier bits. Don't ask me about any Chinese languages, or Japanese, or Korean, or Ethiopian....

Perhaps a finite state automaton / Markov chain is the way to go - if one is supplied, then we can furnish a Trials<Character>. The idea here is to encode a reasonably sensible set of rules for generating syllables using the right encoding for a language - leave it to the experts to contribute a decent model.
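As a rough illustration of the Markov chain idea, the sketch below walks a small transition table to build a word. Everything in it is an assumption made for the example: the table is a toy consonant-vowel model over romanised letters with equal weights, not a real model of any language, and `MarkovSyllables`, `TRANSITIONS` and `word` are hypothetical names rather than anything in Americium.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

class MarkovSyllables {
    // Sentinel states marking the start and end of a word; neither is emitted.
    static final char START = '^';
    static final char END = '$';

    // Toy transition table (hypothetical): each state maps to its permitted successors,
    // all equally likely. Consonants lead to vowels; vowels lead to consonants or END.
    static final Map<Character, List<Character>> TRANSITIONS = Map.of(
        START, List.of('k', 'h', 'p'),
        'k', List.of('a', 'e'),
        'h', List.of('a', 'e'),
        'p', List.of('a'),
        'a', List.of('k', 'h', 'p', END),
        'e', List.of('k', 'h', END));

    // Walks the chain from START until END is reached or maximumLength characters
    // have been emitted; a word cut off by the limit may end on a consonant.
    static String word(Random random, int maximumLength) {
        StringBuilder builder = new StringBuilder();
        char state = START;
        while (builder.length() < maximumLength) {
            List<Character> successors = TRANSITIONS.get(state);
            state = successors.get(random.nextInt(successors.size()));
            if (state == END) {
                break;
            }
            builder.append(state);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        Random random = new Random(1L);
        for (int i = 0; i < 5; i++) {
            System.out.println(word(random, 8));
        }
    }
}
```

A contributed model for a real language would replace the table, and presumably add weights, while the walk itself stays the same.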

The alternative is to find some dictionary of words and names and select random entries from it.

sageserpent-open commented 3 years ago

Found this: https://gist.github.com/deekayen/4148741 - it's a start.

sageserpent-open commented 3 years ago

Thinking again, perhaps there is a problem of semantics here: a string is essentially an immutable sequence of characters in Java - any additional semantics are domain specific, and probably best represented by a stricter abstraction - so we could have a sentence abstraction, or perhaps an abstract syntax tree, or simply use a more principled representation of the domain values denoted by strings.

All of this implies that we may as well just churn out random sequences of Unicode characters and hope for the best. If we want to generate strings that conform to a domain, it would be better to generate a trials of the domain type and then map the values to strings.
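A small sketch of the "generate the domain type, then map to strings" idea, again in plain Java rather than Americium's API. A `Function<Random, T>` stands in for a trials here, `andThen` plays the role that mapping would play on a real trials, and the choice of `LocalDate` as the domain type and the names `dates` and `dateStrings` are assumptions for the example.

```java
import java.time.LocalDate;
import java.util.Random;
import java.util.function.Function;

class DomainFirst {
    // Hypothetical stand-in for a trials of the domain type:
    // dates within roughly a century of 1970-01-01.
    static Function<Random, LocalDate> dates() {
        return random -> LocalDate.ofEpochDay(random.nextInt(36_500));
    }

    // Strings are derived from domain values, so every one is a well-formed ISO-8601 date.
    static Function<Random, String> dateStrings() {
        return dates().andThen(LocalDate::toString);
    }

    public static void main(String[] args) {
        Random random = new Random(2L);
        for (int i = 0; i < 3; i++) {
            System.out.println(dateStrings().apply(random));
        }
    }
}
```

Because the strings come from valid domain values, there is no need to filter out malformed ones; deliberately malformed input can still come from the plain random-character route.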