This is actually a feature. If Blab generated productions purely at random, the
outputs would in effect never contain certain unlikely combinations, such as deep
chains of one or more nested rules. To work around this, a weak random
generator is used for some outputs.
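To make "in effect never" concrete, here is a back-of-the-envelope sketch. It assumes that a purely random generator draws each character of '[a-z ]{40}' independently and uniformly from its 27 symbols (26 letters plus space). That is an assumption for illustration, not something stated in this thread. The sketch computes the exact chance that such a line consists of a single repeated character.

```python
from fractions import Fraction


def p_constant_line(alphabet_size: int, length: int) -> Fraction:
    """Probability that `length` independent uniform draws from an alphabet
    of `alphabet_size` symbols all come out equal (e.g. a line of only 'y')."""
    # The first symbol can be anything; each of the other length-1 draws
    # must match it, with probability 1/alphabet_size each.
    return Fraction(1, alphabet_size ** (length - 1))


if __name__ == "__main__":
    # '[a-z ]{40}': 27 symbols, 40 positions per line.
    p = p_constant_line(27, 40)
    print(f"{float(p):.3e}")
```

The result is on the order of 1e-56. A uniform generator would essentially never emit a line of one repeated character, yet a 30-line sample from a generator that mixes in weak strategies can contain several.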
For example, it's easy to see which of these outputs aren't very randomly
generated:
$ blab --seed 42 -e '[a-z ]{40} 10' -n 30
ftmnnyrxhludqoyhjubkfmoahwoicrwhkrswkrva
ejotychmsxbglqv ekpuzdinsxchmrwafkqv ejo
mujihsjzqbxcotrc dpdnowstmqufgvixnvdykb
arautddoxtwkzmjodgoscqphjodem ulcmeuvavk
uwt csxa zgpoad ycruqqucenphwqutppnad lz
ubiqyfmubjqyfnubjqyfnubjryfnvbjrzfnvbjrz
ipkdx hahfktu bdikpijvgngsqptsflfgioprnk
wklgszfuzspcunawhcztktunugpoudgirjfnkhtm
nxswtvvbnurgbwrykujbqpcmhihqmwjnkqzyveud
qvoqzcdtgwfcoqksfsjonyvhfcvwvobzyyakrlvo
xsib lheyipvepzwyzbforw ebsqsb gjxfcuzn
hecazwusqnljhfcazxusqomjhfdazxvtqomkhfdb
xplx koejnwgtaklvexasaxesdsqhcdmkzhu tyf
as cfoxirhcqxrgcityjojc lruy subylienkjp
bdfikmortvxzbdfhkmoqsvxzadfhjloqsuxzaceh
irhcqxrgcityjojc lruy subylienkjpifaclce
ifrqrcajz ifrqrcajz ifrqrcajz ifrqrcajz
cajzwapgzjuqwgdzpqmrcajzwapgzjuqwgdzpqmr
jzwapgzjuqwgdzpqmoaehh isucpwqxrtd keduh
pclhopgshofnkwewlkmdiek philnuq tgosnteb
vvhmzisoroawhvpfljbvnjhpgae xbczuzxvcohc
xxidybebbmspjlrxxidybebbmspjlrxxidybebbm
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
jld aoaiakmrmyyqeqclfsxijnlnwsrnmujkobzw
owxgugbwrsdrohmpnjvktfceicegapjtnuimsuxe
fykkyffxikkyhq fxtnijt jjersewxhgxnovzlg
ixclwontdz hsypixkixclwontdz hsypixkixcl
clwontdz hsypixluwekjhnzyyngijveljzssobu
nvpesvc vgzterey rfjlsorjpvauibiwpkvhgsw
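The non-random lines above fall into recognisable shapes: a short chunk repeated over and over, or a walk through the alphabet with a fixed stride. The sketch below imitates those two shapes alongside a plain uniform generator. The functions `loop_gen` and `step_gen` are hypothetical illustrations, not Blab's actual code, and the stride here is an integer. It therefore does not reproduce every pattern in the sample.

```python
import random
import string

# The '[a-z ]' character class from the example: 26 letters plus space.
ALPHABET = string.ascii_lowercase + " "


def rand_gen(rng: random.Random, n: int) -> str:
    """Plain uniform choice at every position."""
    return "".join(rng.choice(ALPHABET) for _ in range(n))


def loop_gen(rng: random.Random, n: int, max_period: int = 12) -> str:
    """HYPOTHETICAL 'loop' generator: draw a short random chunk and repeat it.
    A chunk of length 1 gives a line such as 'yyyy...'."""
    chunk = rand_gen(rng, rng.randint(1, max_period))
    repeats = n // len(chunk) + 1
    return (chunk * repeats)[:n]


def step_gen(rng: random.Random, n: int) -> str:
    """HYPOTHETICAL 'step' generator: walk the alphabet with a fixed stride."""
    start = rng.randrange(len(ALPHABET))
    stride = rng.randrange(1, len(ALPHABET))  # never 0, so the walk moves
    return "".join(ALPHABET[(start + i * stride) % len(ALPHABET)] for i in range(n))


if __name__ == "__main__":
    rng = random.Random(42)
    for gen in (rand_gen, loop_gen, step_gen):
        print(f"{gen.__name__:9} {gen(rng, 40)}")
```

Repetitive generators like these make rare structures (long runs, periodic strings) show up regularly, which is the point of mixing them in.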
If this feature is causing trouble in some use case, a flag can be added to
disable it. Is this the case?
Original comment by aohelin
on 6 Jan 2015 at 4:34
Hi there,
Thanks for the prompt reply!
This feature causes problems when fuzzing XML ID attributes. While (arguably)
good test data would include duplicate IDs, in practice they come up too
frequently. A way to disable it would therefore be nice; version 0.1.4 did not
exhibit this behaviour.
Best wishes
Richard
Original comment by richa...@gapps.semantico.com
on 6 Jan 2015 at 5:47
You can now specify -r generator[=weight],... in trunk, and get the plain
random distribution of 0.1.4 with -r rand. Asking explicitly for just "rand"
will be necessary from now on, because repetitions are usually also
interesting. The default is currently -r rand=10,loop=2,step.
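One way to read a spec such as 'rand=10,loop=2,step' is as relative weights for picking a generator per output. The sketch below parses the spec and makes a weighted choice. It is not taken from Blab's source. In particular, it assumes that a name listed without '=weight' (here `step`) gets a weight of 1, which the thread does not say.

```python
import random
from collections import Counter


def parse_generator_spec(spec: str, default_weight: int = 1) -> dict[str, int]:
    """Parse a 'name[=weight],...' string such as 'rand=10,loop=2,step'.
    ASSUMPTION: a name given without '=weight' gets `default_weight`."""
    weights: dict[str, int] = {}
    for item in spec.split(","):
        name, sep, weight = item.strip().partition("=")
        weights[name] = int(weight) if sep else default_weight
    return weights


def pick_generator(rng: random.Random, weights: dict[str, int]) -> str:
    """Choose one generator name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    weights = parse_generator_spec("rand=10,loop=2,step")
    print(weights)  # {'rand': 10, 'loop': 2, 'step': 1}
    rng = random.Random(0)
    print(Counter(pick_generator(rng, weights) for _ in range(13_000)))
```

Under that assumption the default would pick rand about 10/13 of the time. A spec of just 'rand' parses to a single entry, so every output comes from the plain uniform generator, matching the 0.1.4 behaviour described above.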
$ blab -e '[ox ]{100} 10' -n 10000 | grep xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | wc -l
79
$ blab -e '[ox ]{100} 10' -n 10000 -r rand | grep xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | wc -l
0
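The 0 for -r rand is what a uniform generator predicts. The sketch below bounds the expected number of matching lines with a union bound. It assumes each of the 100 characters is drawn independently and uniformly from the 3 symbols in '[ox ]'. It uses a run length of 30 as a conservative stand-in for the length of the grep pattern, since a longer pattern only makes the bound smaller.

```python
def expected_match_bound(lines: int, line_len: int, alphabet_size: int, run_len: int) -> float:
    """Upper bound (union bound) on the expected number of lines containing
    `run_len` consecutive copies of one fixed symbol, assuming each character
    is an independent uniform draw from `alphabet_size` symbols."""
    starts = max(0, line_len - run_len + 1)  # positions where the run could begin
    # P(one line matches) <= sum over starts of P(run begins there).
    p_line = min(1.0, starts * alphabet_size ** -run_len)
    return lines * p_line


if __name__ == "__main__":
    # '[ox ]{100} 10' with -n 10000: 3 symbols, 100 characters per line.
    print(expected_match_bound(10_000, 100, 3, 30))
```

The bound comes out around 3e-9 matching lines, so seeing 79 matches under the default mix shows how strongly the weak generators skew the output toward long runs.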
Original comment by aohelin
on 7 Jan 2015 at 11:31
Brilliant new feature. You're a star. A kleene one :D.
Original comment by richa...@gapps.semantico.com
on 8 Jan 2015 at 10:48
Original issue reported on code.google.com by
richa...@gapps.semantico.com
on 6 Jan 2015 at 3:12