shiryu92kmp / ouspg

Automatically exported from code.google.com/p/ouspg
Blab repeats itself #99

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Which tool: blab 0.2a

What steps will reproduce the problem?
1. blab -s 3 -e '(([bcdfghjklmnpqrstvwx][aeiouy]{1,2}){1,4} 32){10} 10'

What is the expected output? 

The output is not random. I see a lot of repetitions. In blab 0.1.4 these 
repetitions did not occur:

$ blab -s 3 -e '(([bcdfghjklmnpqrstvwxz][aeiouy]{1,2}){1,4} 32){10} 10'

xakyoruzya jycy xiquecevya keabeufydio hyyjejue rexie fajelefeu zigi mynii 
kibidoe 

What do you see instead?

vui titisuvui
$ blab -s 3 -e '(([bcdfghjklmnpqrstvwx][aeiouy]{1,2}){1,4} 32){10} 10'
su titisuvui su titisuvui su titisuvui su titisuvui 

What version of the product are you using? On what operating system?
0.2a 
OS X Yosemite

Please provide any additional information below.
I have different blab versions installed on two systems. I wonder - 
is this an undocumented feature or a bug?

Original issue reported on code.google.com by richa...@gapps.semantico.com on 6 Jan 2015 at 3:12

GoogleCodeExporter commented 9 years ago
This is actually a feature. If Blab generated productions purely at random, the 
outputs would in effect never contain certain unlikely combinations, such as 
deep chains of one or more nested rules. To work around this, a weak random 
generator is used for some outputs.

For example, it's easy to see which of these lines aren't very randomly 
generated:

$ blab --seed 42 -e '[a-z ]{40} 10' -n 30
ftmnnyrxhludqoyhjubkfmoahwoicrwhkrswkrva
ejotychmsxbglqv ekpuzdinsxchmrwafkqv ejo
 mujihsjzqbxcotrc dpdnowstmqufgvixnvdykb
arautddoxtwkzmjodgoscqphjodem ulcmeuvavk
uwt csxa zgpoad ycruqqucenphwqutppnad lz
ubiqyfmubjqyfnubjqyfnubjryfnvbjrzfnvbjrz
ipkdx hahfktu bdikpijvgngsqptsflfgioprnk
wklgszfuzspcunawhcztktunugpoudgirjfnkhtm
nxswtvvbnurgbwrykujbqpcmhihqmwjnkqzyveud
qvoqzcdtgwfcoqksfsjonyvhfcvwvobzyyakrlvo
xsib lheyipvepzwyzbforw ebsqsb  gjxfcuzn
hecazwusqnljhfcazxusqomjhfdazxvtqomkhfdb
xplx koejnwgtaklvexasaxesdsqhcdmkzhu tyf
as cfoxirhcqxrgcityjojc lruy subylienkjp
bdfikmortvxzbdfhkmoqsvxzadfhjloqsuxzaceh
irhcqxrgcityjojc lruy subylienkjpifaclce
ifrqrcajz ifrqrcajz ifrqrcajz ifrqrcajz
cajzwapgzjuqwgdzpqmrcajzwapgzjuqwgdzpqmr
jzwapgzjuqwgdzpqmoaehh isucpwqxrtd keduh
pclhopgshofnkwewlkmdiek philnuq tgosnteb
vvhmzisoroawhvpfljbvnjhpgae xbczuzxvcohc
xxidybebbmspjlrxxidybebbmspjlrxxidybebbm
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
jld aoaiakmrmyyqeqclfsxijnlnwsrnmujkobzw
owxgugbwrsdrohmpnjvktfceicegapjtnuimsuxe
fykkyffxikkyhq fxtnijt jjersewxhgxnovzlg
ixclwontdz hsypixkixclwontdz hsypixkixcl
clwontdz hsypixluwekjhnzyyngijveljzssobu
nvpesvc vgzterey rfjlsorjpvauibiwpkvhgsw
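
The contrast between the line styles in the listing above can be imitated with a short Python sketch. This is not Blab's implementation: the three functions, their names, and the period and stride choices are hypothetical, picked only to show how a weak generator yields the repeats and regular patterns that independent uniform draws almost never produce.

```python
import random
from itertools import cycle, islice

# Same alphabet as the [a-z ] class used in the example command.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def uniform_line(rng, length=40):
    """Draw every character independently and uniformly."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def looping_line(rng, length=40, period=10):
    """Hypothetical weak generator: draw a short block once, then repeat it."""
    block = [rng.choice(ALPHABET) for _ in range(period)]
    return "".join(islice(cycle(block), length))

def stepping_line(rng, length=40):
    """Hypothetical weak generator: walk the alphabet with a fixed stride."""
    start = rng.randrange(len(ALPHABET))
    stride = rng.randrange(1, len(ALPHABET))
    return "".join(
        ALPHABET[(start + i * stride) % len(ALPHABET)] for i in range(length)
    )

rng = random.Random(42)
for make in (uniform_line, looping_line, stepping_line):
    print(f"{make.__name__:14s} {make(rng)}")
```

Lines from the looping and stepping functions repeat with a fixed period, much like the visibly regular lines in the listing, while the uniform lines show no such structure.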

If this feature is causing trouble in some use case, a flag can be added to 
disable it. Is this the case?

Original comment by aohelin on 6 Jan 2015 at 4:34

GoogleCodeExporter commented 9 years ago
Hi there,

Thanks for the prompt reply!

This feature causes problems when fuzzing XML ID attributes. While (arguably) 
good test data would include duplicate IDs, in practice they come up too 
frequently. A way to disable it would therefore be nice - version 0.1.4 did not 
exhibit this behaviour.
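
To put a number on how often IDs collide, the generated tokens can be counted. The following is a minimal Python sketch; the helper name `duplicate_tokens` is made up for illustration, and the two sample strings are the outputs quoted in the original report.

```python
from collections import Counter

def duplicate_tokens(text):
    """Return {token: count} for whitespace-separated tokens seen more than once."""
    counts = Counter(text.split())
    return {token: n for token, n in counts.items() if n > 1}

# Output lines quoted in the original report.
reported_02a = "su titisuvui su titisuvui su titisuvui su titisuvui"
reported_014 = ("xakyoruzya jycy xiquecevya keabeufydio hyyjejue rexie "
                "fajelefeu zigi mynii kibidoe")

print(duplicate_tokens(reported_02a))  # {'su': 4, 'titisuvui': 4}
print(duplicate_tokens(reported_014))  # {}
```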

Best wishes
Richard

Original comment by richa...@gapps.semantico.com on 6 Jan 2015 at 5:47

GoogleCodeExporter commented 9 years ago
You can now specify -r generator[=weight],... in trunk, and get the plain 
random distribution of 0.1.4 with -r rand. Asking explicitly for just "rand" 
will be necessary from now on, because usually repetitions are also 
interesting. The default is currently -r rand=10,loop=2,step.
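
As a rough illustration of what a weighted generator list means, here is a Python sketch that parses a spec of the form generator[=weight],... and picks a generator name in proportion to its weight. It is not Blab's code. Two things are assumptions: that an omitted weight (as in "step") counts as 1, and that a generator is picked by a simple weighted draw. The thread does not say how Blab handles either, nor how often it switches generators.

```python
import random

def parse_generators(spec, default_weight=1):
    """Parse 'name[=weight],...' into a list of (name, weight) pairs.

    default_weight=1 is an assumption for entries given without '=weight'.
    """
    pairs = []
    for item in spec.split(","):
        name, _, weight = item.strip().partition("=")
        pairs.append((name, int(weight) if weight else default_weight))
    return pairs

def pick_generator(pairs, rng):
    """Pick one generator name with probability proportional to its weight."""
    names = [name for name, _ in pairs]
    weights = [weight for _, weight in pairs]
    return rng.choices(names, weights=weights, k=1)[0]

pairs = parse_generators("rand=10,loop=2,step")
print(pairs)  # [('rand', 10), ('loop', 2), ('step', 1)]

rng = random.Random(0)
counts = {name: 0 for name, _ in pairs}
for _ in range(13000):
    counts[pick_generator(pairs, rng)] += 1
print(counts)  # roughly 10:2:1 under the assumptions above
```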

$ blab -e '[ox ]{100} 10' -n 10000 | grep xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | wc -l
79
$ blab -e '[ox ]{100} 10' -n 10000 -r rand | grep xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | wc -l
0
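
A back-of-the-envelope check shows why zero matches is the expected result with -r rand. The Python sketch below assumes each of the 100 characters is drawn independently and uniformly from the three symbols in [ox ]. That is an assumption about what "plain random" means here, not something stated in the thread. It bounds the chance of a run of x's as long as the grep pattern using a union bound over all start positions.

```python
# Pattern copied from the grep command; only its length matters here.
PATTERN = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

def run_upper_bound(length, run_len, alphabet_size):
    """Union bound on P(a given symbol repeats run_len times in a row
    somewhere in `length` independent uniform draws)."""
    if run_len > length:
        return 0.0
    starts = length - run_len + 1  # positions where the run could begin
    return min(1.0, starts * (1.0 / alphabet_size) ** run_len)

per_line = run_upper_bound(length=100, run_len=len(PATTERN), alphabet_size=3)
expected_lines = 10000 * per_line  # upper bound on expected matching lines

print(f"per-line bound: {per_line:.3e}")
print(f"expected matching lines in 10000: at most {expected_lines:.3e}")
```

Under these assumptions the expected number of matching lines is far below one, which agrees with the count of 0 for -r rand. The 79 matches in the default run therefore have to come from the non-uniform generators.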

Original comment by aohelin on 7 Jan 2015 at 11:31

GoogleCodeExporter commented 9 years ago
Brilliant new feature. You're a star. A Kleene one :D.

Original comment by richa...@gapps.semantico.com on 8 Jan 2015 at 10:48