Open funderburkjim opened 7 years ago
When I entered BakzaMkAra, all I got was Loading...
When I entered just sankara: 720 variants - wow, quite a few.
zaMkara 200 51
saMkara 200 36
saMkAra 200 1
zaMkarA 200 0
zAMkara 200 0
Sankara 404 -1
SankarA 404 -1
SankaRa 404 -1
zAGkhARRA 404 -1
zAGkhARa 404 -1
zAGkhARRa 404 -1
SankaRA 404 -1
zAGkhARA 404 -1
SankaRRA 404 -1
SankARRA 404 -1
Sankhara 404 -1
SankharA 404 -1
SankhaRa 404 -1
SankARRa 404 -1
SankARA 404 -1
zAGkhArA 404 -1
SankAra 404 -1
SankArA 404 -1
SankARa 404 -1
SankaRRa 404 -1
zAGkhaRRA 404 -1
zAGkaRA 404 -1
zAGkaRRa 404 -1
zAGkaRRA 404 -1
zAGkAra 404 -1
zAGkaRa 404 -1
zAGkarA 404 -1
zAJkhARRa 404 -1
zAJkhARRA 404 -1
zAGkara 404 -1
zAGkArA 404 -1
zAGkARa 404 -1
zAGkhaRa 404 -1
zAGkhaRA 404 -1
zAGkhaRRa 404 -1
SankhaRA 404 -1
zAGkharA 404 -1
zAGkhara 404 -1
zAGkARA 404 -1
zAGkARRa 404 -1
zAGkARRA 404 -1
zAGkhAra 404 -1
SankhAra 404 -1
SaMkhARA 404 -1
SaMkhARRa 404 -1
SaMkhARRA 404 -1
SaNkara 404 -1
SaMkhARa 404 -1
SaMkhArA 404 -1
SaMkhaRA 404 -1
SaMkhaRRa 404 -1
SaMkhaRRA 404 -1
SaMkhAra 404 -1
SaNkarA 404 -1
SaNkaRa 404 -1
SaNkARa 404 -1
SaNkARA 404 -1
SaNkARRa 404 -1
SaNkARRA 404 -1
SaNkArA 404 -1
SaNkAra 404 -1
SaNkaRA 404 -1
SaNkaRRa 404 -1
SaNkaRRA 404 -1
SaMkhaRa 404 -1
SaMkharA 404 -1
SankhARRa 404 -1
SankhARRA 404 -1
SaMkara 404 -1
SaMkarA 404 -1
SankhARA 404 -1
SankhARa 404 -1
SankhaRRA 404 -1
zAJkhARA 404 -1
SankhArA 404 -1
SaMkaRa 404 -1
SaMkaRA 404 -1
SaMkARA 404 -1
SaMkARRa 404 -1
SaMkARRA 404 -1
SaMkhara 404 -1
SaMkARa 404 -1
SaMkArA 404 -1
SaMkaRRa 404 -1
SaMkaRRA 404 -1
SaMkAra 404 -1
SankhaRRa 404 -1
zAJkhAra 404 -1
zAMkARa 404 -1
zAMkARA 404 -1
zAMkARRa 404 -1
zAMkARRA 404 -1
zAMkArA 404 -1
zAMkAra 404 -1
zAMkaRa 404 -1
zAMkaRA 404 -1
zAMkaRRa 404 -1
zAMkaRRA 404 -1
zAMkhara 404 -1
zAMkharA 404 -1
zAMkhArA 404 -1
zAMkhARa 404 -1
zAMkhARA 404 -1
zAMkhARRa 404 -1
zAMkhAra 404 -1
zAMkhaRRA 404 -1
zAMkhaRa 404 -1
zAMkhaRA 404 -1
zAMkhaRRa 404 -1
zAMkarA 404 -1
zAnkhARRA 404 -1
zAnkARa 404 -1
zAnkARA 404 -1
zAnkARRa 404 -1
zAnkARRA 404 -1
zAnkArA 404 -1
zAnkAra 404 -1
zAnkaRA 404 -1
zAnkaRRa 404 -1
zAnkaRRA 404 -1
zAnkhara 404 -1
zAnkharA 404 -1
zAnkhArA 404 -1
zAnkhARa 404 -1
zAnkhARA 404 -1
zAnkhARRa 404 -1
zAnkhAra 404 -1
zAnkhaRRA 404 -1
zAnkhaRa 404 -1
zAnkhaRA 404 -1
zAnkhaRRa 404 -1
zAMkhARRA 404 -1
zANkara 404 -1
zAJkaRRA 404 -1
zAJkAra 404 -1
zAJkArA 404 -1
zAJkARa 404 -1
zAJkaRRa 404 -1
zAJkaRA 404 -1
zANkhARRA 404 -1
zAJkara 404 -1
zAJkarA 404 -1
zAJkaRa 404 -1
zAJkARA 404 -1
zAJkARRa 404 -1
zAJkhaRRa 404 -1
zAJkhaRRA 404 -1
SaNkhara 404 -1
zAJkhArA 404 -1
zAJkhaRA 404 -1
zAJkhaRa 404 -1
zAJkARRA 404 -1
zAJkhara 404 -1
zAJkharA 404 -1
zANkhARRa 404 -1
zANkhARA 404 -1
zANkAra 404 -1
zANkArA 404 -1
zANkARa 404 -1
zANkARA 404 -1
zANkaRRA 404 -1
zANkaRRa 404 -1
zANkarA 404 -1
zANkaRa 404 -1
zANkaRA 404 -1
zANkARRa 404 -1
zANkARRA 404 -1
zANkhaRRA 404 -1
zANkhAra 404 -1
zANkhArA 404 -1
zANkhARa 404 -1
zANkhaRRa 404 -1
zANkhaRA 404 -1
zANkhara 404 -1
zANkharA 404 -1
zANkhaRa 404 -1
zAJkhARa 404 -1
SaNkhaRa 404 -1
SANkhara 404 -1
SANkharA 404 -1
SANkhaRa 404 -1
SANkhaRA 404 -1
SANkARRA 404 -1
SANkARRa 404 -1
SANkAra 404 -1
SANkArA 404 -1
SANkARa 404 -1
SANkARA 404 -1
SANkhaRRa 404 -1
SANkhaRRA 404 -1
SANkhARRA 404 -1
SAJkara 404 -1
SAJkarA 404 -1
SAJkaRa 404 -1
SANkhARRa 404 -1
SANkhARA 404 -1
SANkhAra 404 -1
SANkhArA 404 -1
SANkhARa 404 -1
SANkaRRA 404 -1
SANkaRRa 404 -1
SAMkharA 404 -1
SAMkhaRa 404 -1
SAMkhaRA 404 -1
SAMkhaRRa 404 -1
SAMkhara 404 -1
SAMkARRA 404 -1
SAMkARa 404 -1
SAMkARA 404 -1
SAMkARRa 404 -1
SAMkhaRRA 404 -1
SAMkhAra 404 -1
SANkara 404 -1
SANkarA 404 -1
SANkaRa 404 -1
SANkaRA 404 -1
SAMkhARRA 404 -1
SAMkhARRa 404 -1
SAMkhArA 404 -1
SAMkhARa 404 -1
SAMkhARA 404 -1
SAJkaRA 404 -1
SAJkaRRa 404 -1
SAGkARA 404 -1
SAGkARRa 404 -1
SAGkARRA 404 -1
SAGkhara 404 -1
SAGkARa 404 -1
SAGkArA 404 -1
SAGkaRA 404 -1
SAGkaRRa 404 -1
SAGkaRRA 404 -1
SAGkAra 404 -1
SAGkharA 404 -1
SAGkhaRa 404 -1
SAGkhARa 404 -1
SAGkhARA 404 -1
SAGkhARRa 404 -1
SAGkhARRA 404 -1
SAGkhArA 404 -1
SAGkhAra 404 -1
SAGkhaRA 404 -1
SAGkhaRRa 404 -1
SAGkhaRRA 404 -1
SAGkaRa 404 -1
SAGkarA 404 -1
SAJkARRa 404 -1
SAJkARRA 404 -1
SAJkhara 404 -1
SAJkharA 404 -1
SAJkARA 404 -1
SAJkARa 404 -1
SAJkaRRA 404 -1
SAJkAra 404 -1
SAJkArA 404 -1
SAJkhaRa 404 -1
SAJkhaRA 404 -1
SAJkhARA 404 -1
SAJkhARRa 404 -1
SAJkhARRA 404 -1
SAGkara 404 -1
SAJkhARa 404 -1
SAJkhArA 404 -1
SAJkhaRRa 404 -1
SAJkhaRRA 404 -1
SAJkhAra 404 -1
SAMkArA 404 -1
SAMkAra 404 -1
SaJkhAra 404 -1
SaJkhArA 404 -1
SaJkhARa 404 -1
SaJkhARA 404 -1
SaJkhaRRA 404 -1
SaJkhaRRa 404 -1
SaJkhara 404 -1
SaJkharA 404 -1
SaJkhaRa 404 -1
SaJkhaRA 404 -1
SaJkhARRa 404 -1
SaJkhARRA 404 -1
SaGkaRRA 404 -1
SaGkAra 404 -1
SaGkArA 404 -1
SaGkARa 404 -1
SaGkaRRa 404 -1
SaGkaRA 404 -1
SaGkara 404 -1
SaGkarA 404 -1
SaGkaRa 404 -1
SaJkARRA 404 -1
SaJkARRa 404 -1
SaNkhArA 404 -1
SaNkhARa 404 -1
SaNkhARA 404 -1
SaNkhARRa 404 -1
SaNkhAra 404 -1
SaNkhaRRA 404 -1
zAnkaRa 404 -1
SaNkhaRA 404 -1
SaNkhaRRa 404 -1
SaNkhARRA 404 -1
SaJkara 404 -1
SaJkAra 404 -1
SaJkArA 404 -1
SaJkARa 404 -1
SaJkARA 404 -1
SaJkaRRA 404 -1
SaJkaRRa 404 -1
SaJkarA 404 -1
SaJkaRa 404 -1
SaJkaRA 404 -1
SaGkARA 404 -1
SaGkARRa 404 -1
SAnkhaRA 404 -1
SAnkhaRRa 404 -1
SAnkhaRRA 404 -1
SAnkhAra 404 -1
SAnkhaRa 404 -1
SAnkharA 404 -1
SAnkARA 404 -1
SAnkARRa 404 -1
SAnkARRA 404 -1
SAnkhara 404 -1
SAnkhArA 404 -1
SAnkhARa 404 -1
SAMkaRa 404 -1
SAMkaRA 404 -1
SAMkaRRa 404 -1
SAMkaRRA 404 -1
SAMkarA 404 -1
SAMkara 404 -1
SAnkhARA 404 -1
SAnkhARRa 404 -1
SAnkhARRA 404 -1
SAnkARa 404 -1
SAnkArA 404 -1
SaGkhaRRa 404 -1
SaGkhaRRA 404 -1
SaGkhAra 404 -1
SaGkhArA 404 -1
SaGkhaRA 404 -1
SaGkhaRa 404 -1
SaGkARRA 404 -1
SaGkhara 404 -1
SaGkharA 404 -1
SaGkhARa 404 -1
SaGkhARA 404 -1
SAnkaRA 404 -1
SAnkaRRa 404 -1
SAnkaRRA 404 -1
SAnkAra 404 -1
SAnkaRa 404 -1
SAnkarA 404 -1
SaGkhARRa 404 -1
SaGkhARRA 404 -1
SAnkara 404 -1
SaNkharA 404 -1
zaGkhARRA 404 -1
sAnkara 404 -1
sAnkarA 404 -1
sAnkaRa 404 -1
sAnkaRA 404 -1
saGkhARRA 404 -1
saGkhARRa 404 -1
saGkhAra 404 -1
saGkhArA 404 -1
saGkhARa 404 -1
saGkhARA 404 -1
sAnkaRRa 404 -1
sAnkaRRA 404 -1
sAnkARRA 404 -1
sAnkhara 404 -1
sAnkharA 404 -1
sAnkhaRa 404 -1
sAnkARRa 404 -1
sAnkARA 404 -1
sAnkAra 404 -1
sAnkArA 404 -1
sAnkARa 404 -1
saGkhaRRA 404 -1
saGkhaRRa 404 -1
saGkarA 404 -1
saGkaRa 404 -1
saGkaRA 404 -1
saGkaRRa 404 -1
saGkara 404 -1
saJkhARRA 404 -1
saJkhARa 404 -1
saJkhARA 404 -1
saJkhARRa 404 -1
saGkaRRA 404 -1
saGkAra 404 -1
saGkhara 404 -1
saGkharA 404 -1
saGkhaRa 404 -1
saGkhaRA 404 -1
saGkARRA 404 -1
saGkARRa 404 -1
saGkArA 404 -1
saGkARa 404 -1
saGkARA 404 -1
sAnkhaRA 404 -1
sAnkhaRRa 404 -1
sAMkhARA 404 -1
sAMkhARRa 404 -1
sAMkhARRA 404 -1
sANkara 404 -1
sAMkhARa 404 -1
sAMkhArA 404 -1
sAMkhaRA 404 -1
sAMkhaRRa 404 -1
sAMkhaRRA 404 -1
sAMkhAra 404 -1
sANkarA 404 -1
sANkaRa 404 -1
sANkARa 404 -1
sANkARA 404 -1
sANkARRa 404 -1
sANkARRA 404 -1
sANkArA 404 -1
sANkAra 404 -1
sANkaRA 404 -1
sANkaRRa 404 -1
sANkaRRA 404 -1
sAMkhaRa 404 -1
sAMkharA 404 -1
sAnkhARRa 404 -1
sAnkhARRA 404 -1
sAMkara 404 -1
sAMkarA 404 -1
sAnkhARA 404 -1
sAnkhARa 404 -1
sAnkhaRRA 404 -1
sAnkhAra 404 -1
sAnkhArA 404 -1
sAMkaRa 404 -1
sAMkaRA 404 -1
sAMkARA 404 -1
sAMkARRa 404 -1
sAMkARRA 404 -1
sAMkhara 404 -1
sAMkARa 404 -1
sAMkArA 404 -1
sAMkaRRa 404 -1
sAMkaRRA 404 -1
sAMkAra 404 -1
saJkhArA 404 -1
saJkhAra 404 -1
saMkArA 404 -1
saMkARa 404 -1
saMkARA 404 -1
saMkARRa 404 -1
saMkaRRA 404 -1
saMkaRRa 404 -1
sankhARRA 404 -1
saMkarA 404 -1
saMkaRa 404 -1
saMkaRA 404 -1
saMkARRA 404 -1
saMkhara 404 -1
saMkhAra 404 -1
saMkhArA 404 -1
saMkhARa 404 -1
saMkhARA 404 -1
saMkhaRRA 404 -1
saMkhaRRa 404 -1
saMkharA 404 -1
saMkhaRa 404 -1
saMkhaRA 404 -1
sankhARRa 404 -1
sankhARA 404 -1
sankAra 404 -1
sankArA 404 -1
sankARa 404 -1
sankARA 404 -1
sankaRRA 404 -1
sankaRRa 404 -1
sankarA 404 -1
sankaRa 404 -1
sankaRA 404 -1
sankARRa 404 -1
sankARRA 404 -1
sankhaRRA 404 -1
sankhAra 404 -1
sankhArA 404 -1
sankhARa 404 -1
sankhaRRa 404 -1
sankhaRA 404 -1
sankhara 404 -1
sankharA 404 -1
sankhaRa 404 -1
saMkhARRa 404 -1
saMkhARRA 404 -1
saJkaRA 404 -1
saJkaRRa 404 -1
saJkaRRA 404 -1
saJkAra 404 -1
saJkaRa 404 -1
saJkarA 404 -1
saNkhARRa 404 -1
saNkhARRA 404 -1
saJkara 404 -1
saJkArA 404 -1
saJkARa 404 -1
saJkhaRa 404 -1
saJkhaRA 404 -1
saJkhaRRa 404 -1
saJkhaRRA 404 -1
saJkharA 404 -1
saJkhara 404 -1
saJkARA 404 -1
saJkARRa 404 -1
saJkARRA 404 -1
saNkhARA 404 -1
saNkhARa 404 -1
saNkaRRA 404 -1
saNkAra 404 -1
saNkArA 404 -1
saNkARa 404 -1
saNkaRRa 404 -1
saNkaRA 404 -1
saNkara 404 -1
saNkarA 404 -1
saNkaRa 404 -1
saNkARA 404 -1
saNkARRa 404 -1
saNkhaRRa 404 -1
saNkhaRRA 404 -1
saNkhAra 404 -1
saNkhArA 404 -1
saNkhaRA 404 -1
saNkhaRa 404 -1
saNkARRA 404 -1
saNkhara 404 -1
saNkharA 404 -1
sANkhara 404 -1
sANkharA 404 -1
zaNkharA 404 -1
zaNkhaRa 404 -1
zaNkhaRA 404 -1
zaNkhaRRa 404 -1
zaNkhara 404 -1
zaNkARRA 404 -1
zaNkArA 404 -1
zaNkARa 404 -1
zaNkARA 404 -1
zaNkARRa 404 -1
zaNkhaRRA 404 -1
zaNkhAra 404 -1
zaJkara 404 -1
zaJkarA 404 -1
zaJkaRa 404 -1
zaJkaRA 404 -1
zaNkhARRA 404 -1
zaNkhARRa 404 -1
zaNkhArA 404 -1
zaNkhARa 404 -1
zaNkhARA 404 -1
zaNkAra 404 -1
zaNkaRRA 404 -1
zaMkhaRa 404 -1
zaMkhaRA 404 -1
zaMkhaRRa 404 -1
zaMkhaRRA 404 -1
zaMkharA 404 -1
zaMkhara 404 -1
zaMkARA 404 -1
zaMkARRa 404 -1
zaMkARRA 404 -1
zaMkhAra 404 -1
zaMkhArA 404 -1
zaNkarA 404 -1
zaNkaRa 404 -1
zaNkaRA 404 -1
zaNkaRRa 404 -1
zaNkara 404 -1
zaMkhARRA 404 -1
zaMkhARa 404 -1
zaMkhARA 404 -1
zaMkhARRa 404 -1
zaJkaRRa 404 -1
zaJkaRRA 404 -1
zaGkARRa 404 -1
zaGkARRA 404 -1
zaGkhara 404 -1
zaGkharA 404 -1
zaGkARA 404 -1
zaGkARa 404 -1
zaGkaRRa 404 -1
zaGkaRRA 404 -1
zaGkAra 404 -1
zaGkArA 404 -1
zaGkhaRa 404 -1
zaGkhaRA 404 -1
zaGkhARA 404 -1
zaGkhARRa 404 -1
sankara 404 -1
zAnkara 404 -1
zaGkhARa 404 -1
zaGkhArA 404 -1
zaGkhaRRa 404 -1
zaGkhaRRA 404 -1
zaGkhAra 404 -1
zaGkaRA 404 -1
zaGkaRa 404 -1
zaJkARRA 404 -1
zaJkhara 404 -1
zaJkharA 404 -1
zaJkhaRa 404 -1
zaJkARRa 404 -1
zaJkARA 404 -1
zaJkAra 404 -1
zaJkArA 404 -1
zaJkARa 404 -1
zaJkhaRA 404 -1
zaJkhaRRa 404 -1
zaJkhARRa 404 -1
zaJkhARRA 404 -1
zaGkara 200 -1
zaGkarA 404 -1
zaJkhARA 404 -1
zaJkhARa 404 -1
zaJkhaRRA 404 -1
zaJkhAra 404 -1
zaJkhArA 404 -1
zaMkARa 404 -1
zaMkArA 404 -1
sAJkhAra 404 -1
sAJkhArA 404 -1
sAJkhARa 404 -1
sAJkhARA 404 -1
sAJkhaRRA 404 -1
sAJkhaRRa 404 -1
sAJkhara 404 -1
sAJkharA 404 -1
sAJkhaRa 404 -1
sAJkhaRA 404 -1
sAJkhARRa 404 -1
sAJkhARRA 404 -1
sAGkaRRA 404 -1
sAGkAra 404 -1
sAGkArA 404 -1
sAGkARa 404 -1
sAGkaRRa 404 -1
sAGkaRA 404 -1
sAGkara 404 -1
sAGkarA 404 -1
sAGkaRa 404 -1
sAJkARRA 404 -1
sAJkARRa 404 -1
sANkhArA 404 -1
sANkhARa 404 -1
sANkhARA 404 -1
sANkhARRa 404 -1
sANkhAra 404 -1
sANkhaRRA 404 -1
sANkhaRa 404 -1
sANkhaRA 404 -1
sANkhaRRa 404 -1
sANkhARRA 404 -1
sAJkara 404 -1
sAJkAra 404 -1
sAJkArA 404 -1
sAJkARa 404 -1
sAJkARA 404 -1
sAJkaRRA 404 -1
sAJkaRRa 404 -1
sAJkarA 404 -1
sAJkaRa 404 -1
sAJkaRA 404 -1
sAGkARA 404 -1
sAGkARRa 404 -1
zankhaRa 404 -1
zankhaRA 404 -1
zankhaRRa 404 -1
zankhaRRA 404 -1
zankharA 404 -1
zankhara 404 -1
zankARA 404 -1
zankARRa 404 -1
zankARRA 404 -1
zankhAra 404 -1
zankhArA 404 -1
zaMkaRA 404 -1
zaMkaRRa 404 -1
zaMkaRRA 404 -1
zaMkAra 404 -1
zaMkaRa 404 -1
zankhARRA 404 -1
zankhARa 404 -1
zankhARA 404 -1
zankhARRa 404 -1
zankARa 404 -1
zankArA 404 -1
sAGkhaRRa 404 -1
sAGkhaRRA 404 -1
sAGkhAra 404 -1
sAGkhArA 404 -1
sAGkhaRA 404 -1
sAGkhaRa 404 -1
sAGkARRA 404 -1
sAGkhara 404 -1
sAGkharA 404 -1
sAGkhARa 404 -1
sAGkhARA 404 -1
zankaRA 404 -1
zankaRRa 404 -1
zankaRRA 404 -1
zankAra 404 -1
zankaRa 404 -1
zankarA 404 -1
sAGkhARRa 404 -1
sAGkhARRA 404 -1
zankara 404 -1
zAnkarA 404 -1
Regarding BakzaMkAra
Error message is:
responded with a status of 414 (Request-URI Too Large)
This has to do with GET and POST methods of HTTP communication to server.
Currently, system is using "GET". I've read that GET requests have a limit, but never ran into that limit before.
Changed to POST, so now it works. There are 3840 variants generated for BakzaMkAra
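A minimal sketch of why the switch from GET to POST helps, assuming a hypothetical endpoint and parameter name (neither is given in this thread) and an assumed URI limit that in practice depends on the server configuration. With GET, every variant is packed into the URL; with POST, the same payload travels in the request body and the URL stays short. No request is actually sent here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter name; the real Cologne URL is not in this thread.
ENDPOINT = "https://example.org/simple-search/check.php"
# Assumed limit, for illustration only; the real value depends on the server setup.
ASSUMED_URI_LIMIT = 8190


def build_get_url(variants):
    """GET: all variants are encoded into the URL itself."""
    return ENDPOINT + "?" + urlencode({"alternates": ",".join(variants)})


def build_post_request(variants):
    """POST: the same payload goes into the body, so the URL stays short."""
    body = urlencode({"alternates": ",".join(variants)}).encode("ascii")
    return Request(ENDPOINT, data=body)  # constructed only, never sent


# 3840 placeholder strings standing in for the generated spellings.
placeholder_variants = ["variant%04d" % i for i in range(3840)]

get_url = build_get_url(placeholder_variants)
post_request = build_post_request(placeholder_variants)

print("GET URL length:", len(get_url))
print("POST URL length:", len(post_request.full_url))
print("POST method:", post_request.get_method())
```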
In this work on v0.1, I made a change that zapped the retrieval of images.
To undo that error, v0.1 currently doesn't function properly. Working on solution.
I think this problem is fixed now. v0.1 seems to be working as before.
3840 variants
Oh my....
Server checked 3840 alternates. 200 code means found in mw. 3rd field is word frequency score (or -1 or -9)
I'm working now on moving the variant-generation algorithm from js to php.
That will make it easier to improve the algorithm. Clearly there are some 'impossible' spellings being generated, like aRRA (hk) = aFA (slp1) --- vowel+vowel+vowel;
and these impossibles should be discarded by the variant-generation.
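One way the 'impossible' vowel+vowel+vowel spellings could be discarded before any lookup is sketched below. It assumes the variants are already in SLP1 and that a run of three or more vowel letters never occurs in a real headword; both the function names and that threshold are illustrative assumptions, not the rule actually used on the server.

```python
import re

# SLP1 vowel letters (E and O are the single-letter diphthongs ai and au).
SLP1_VOWELS = "aAiIuUfFxXeEoO"
_VOWEL_RUN = re.compile("[" + SLP1_VOWELS + "]{3,}")


def has_impossible_vowel_run(slp1_word):
    """True if the spelling contains three or more vowels in a row, e.g. 'aFA'."""
    return _VOWEL_RUN.search(slp1_word) is not None


def discard_impossible(slp1_variants):
    """Keep only the variants without such a vowel run."""
    return [w for w in slp1_variants if not has_impossible_vowel_run(w)]


kept = discard_impossible(["SaNkara", "SaNkaFA", "mAtf"])
print(kept)
```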
Clearly there are some 'impossible' spellings being
That would require a sandhi tool testing, I guess.
@SergeA checked matri and 48 alternates are wrong. The right mAtR was found only by http://spokensanskrit.de/index.php?beginning=0+&tinput=+matri&trans=Translate, so I guess we need to add more equations.
Here's version 0.2.
matri, vishnu, krishna all found.
@SergeA @gasyoun Find some more that v0.2 misses!
Not actually a miss, but an enhancement. I entered punya expecting to see puRya, which I did get. But it took close to three or four seconds. I wonder why phu should be tried instead of pu. Let us keep this business restricted to 'P' as in SLP1, and s/z/S. For the rest of the consonants, people don't write p for ph. This will eliminate so many sure-shot alternates.
people don't write p for ph.
Hmm, so no one could ever write kapa or kafa for kapha?
Hmm, so no one could ever write kapa or kafa for kapha?
kafa is possible. kapa never. At least never from Indian subcontinent. I am not sure about Europe or America.
kafa is possible
So it's something to add.
I am not sure about Europe or America.
That needs testing. We need to gather data on what people will enter. After that we can decide to kill it or not. I'm willing to kill it as well, but let's see what people actually enter and not what they should. This whole thing is about what people do and not what they are taught to do.
Issue is the system is sufficiently slow. No need to slow it for virtually impossible items.
Issue is the system is sufficiently slow.
It is slow. But not because of a few combinations. The whole approach needs to be changed. @juhnowski any ideas why it's so slow compared to http://spokensanskrit.de/index.php?beginning=0+&tinput=+kafa&trans=Translate (and yes we will have this feature, but they have not):
kava कव
kSava क्षव
kUpa कूप
kab कब्
kahva कह्व
kapha कफ
kapi कपि
kaupa कौप
kav कव्
kavi कवि
What I like at ss.de
kapi
kappa
kavi
kva
Kap
kApI
kApya
kAvya
kSepa
kav/i
Is the
Now matri finds mAtR (mAtf) 200 69 as expected. But it will not find mAtar nor mAtari. So what should we do? And how to find words that contain what we search? So not only exact match, but string mode as well.
adrimAtar:PW,PWG
anAmAtarjanIdvaya:PD
anAmAtarjanIyuga:PD
anAmAtarjanyagra:PD
anumAtar:SCH
aparamAtar:BHS
amAtar:PW
ayanamAtar:PWG
arTamAtar:SCH
alamAtardana:MW,PW
avantimAtar:PW
aSezamAtar:SCH
aSvamAtar:PW
asramAtar:PW
AkASamAtar:BHS
indramAtar:PW,PWG
ihehamAtar:PW,PWG,SCH
upamAtar:PW,PWG,SCH
fdDilamAtar:BHS
kandarpamAtar:PW
kARelimAtar:PW
kARelImAtar:PWG
kIwamAtar:PW
kuntImAtar:PW
ganDamAtar:PW,PWG
gomAtar:PW,PWG
gomAtara:BUR
citpramAtar:SCH
jaganmAtar:CCS,PW,PWG
jantumAtar:PW
jAmAtar:CCS,PW,PWG
trimAtar:PW,PWG
duhitAmAtar:CCS,PW
devamAtar:CCS,PW,PWG
devamAtara:PUI
dEtyamAtar:PW,PWG
dvimAtar:PW,PWG
DAnyamAtar:PW,PWG
DmAtar:CCS,PW,PWG
nAgamAtar:CCS,PW,PWG
nirmAtar:CCS,PW,PWG
parapramAtar:SCH
pfSnimAtar:PW,PWG
pramAtar:CCS,PW,PWG,SCH
prARimAtar:PW,PWG
BadramAtar:PW,PWG
BAgamAtar:PW,PWG
BizaNmAtar:PW,PWG
BuvanamAtar:PW
BUtamAtar:PW,PWG
maRqUkamAtar:PW,PWG
martyendramAtar:PW
mahAkASamAtar:BHS
mahAmAtar:PW
mahimAtaraMga:MW,PW
mAtar:CCS,PW,PWG,SCH
mAtara:PUI
mAtaraH:IEG
mAtarapitarO:MW,PW,PWG,SKD
mAtarapitf:SHS,VCP,WIL,YAT
mAtarApitf:GRA
mAtari:MW
mAtaripuruza:MW,MW72,PW,PWG,SCH
mAtaripuruzaH:AP,AP90
mAtariBvan:GRA
mAtariBvarI:MW,PW
mAtariSva:MW,MW72,PUI,PW,PWG
mAtariSvaka:MW,PW,PWG
mAtariSvan:AP,AP90,BEN,BUR,CAE,CCS,GRA,INM,MCI,MD,MW,PE,PW,PWG,SHS,STC,VCP,VEI,WIL,YAT
mAtariSvarI:MW,PW
mAtariSvA:SKD
mAtariSvAna:PUI
mAtfmAtar:PW,PWG
mAyApramAtar:SCH
muktAmAtar:PW,PWG
mfgAramAtar:BHS,SCH
yAmAtar:PW,PWG
yogamAtar:PW,PWG
yogimAtar:PW,PWG
raNgamAtar:PW,PWG
rasamAtar:PW
rAjamAtar:CCS,PW,PWG
rAhulamAtar:BHS
lokamAtar:PW,PWG,SCH
lokAnAMmAtaraH:INM
lohityAyanamAtar:PW,PWG
varRamAtar:PW,PWG
vijAmAtar:PW,PWG
vinirmAtar:PW,PWG
vimAtar:PW,PWG
viSvamAtar:PW,PWG
vIramAtar:PW,PWG
vedamAtar:PW,PWG
vEdyamAtar:PW,PWG
vyAsamAtar:PW
SakramAtar:PW,PWG
SatasahasramAtar:BHS
SUnyapramAtar:SCH
sanmAtar:PW
saptamAtar:PW,PWG
samAtar:PW,PWG
saMmAtar:CCS,PW,PWG
sarvamAtar:PW,PWG
sinDumAtar:CCS,PW,PWG
sumAtar:PW,PWG
sOmyajAmAtar:PW,PWG
skandamAtar:PW,PWG
svarRamAtar:PW,PWG
svedamAtar:PW
hatamAtar:PW,PWG
matri, vishnu, krishna all found.
I'd suggest also to add the spelling "krushna". Some people say it this way.
spelling "krushna"
Indeed, none found.
Server returns 1152 alternates.
200 code means found in mw.
3rd field is word frequency score (or -1 or -9)
khRusna (Kfusna) 404 -9
khRuSmA (KfuzmA) 404 -9
khRuSma (Kfuzma) 404 -9
What about sanskrit? Should it find saMskfta (+saMskftaM) and saMskfti?
Server returns 12000 alternates.
200 code means found in mw.
3rd field is word frequency score (or -1 or -9)
ShaJshkhRRth (zhaYshKFT) 404 -9
ShaGskrit (zhaNskrit) 404 -9
ShaGskriT (zhaNskriw) 404 -9
Not having access to computer. So scribbled on a page and photo taken from mobile. This takes care of majorly ITRANS, HK and SLP1. If something is left out, we can discuss and add.
If something is left out, we can discuss and add.
https://github.com/sanskrit-lexicon/Cologne/issues/8#issuecomment-277377696 was how it began, https://github.com/sanskrit-lexicon/Cologne/issues/8#issuecomment-94280465 is still pending (Jim, you can!), https://github.com/juhnowski/sanskrit-simple-search/blob/master/fetching.html is how it looked before Jim started.
var transitionTable = [
["a","A"],
["i","I"],
["u","U"],
["r","R","RR"],
["l","lR","lRR"],
["h","H"],
["M","n","N","J","G"],
["z","S","s"],
["b","v"],
["k","kh"],
["g","gh"],
["c","ch"],
["j","jh"],
["T","Th","t","th"],
["D","Dh","d","dh"],
["p","ph"],
["b","bh"],
["sh","z"]
]
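To make the size of the search space concrete, here is a hedged Python port of the table above together with one possible expansion rule: each input character is replaced by every member of every group it belongs to, and the choices are multiplied out. The per-character tokenization and the merging of overlapping groups are assumptions. Under them the counts come out at 720 for sankara and 3840 for BakzaMkAra, the same figures reported earlier in this thread, which is suggestive but does not confirm that the original JavaScript works this way.

```python
from itertools import product

# Python copy of the transitionTable shown above.
TRANSITION_TABLE = [
    ["a", "A"], ["i", "I"], ["u", "U"],
    ["r", "R", "RR"], ["l", "lR", "lRR"],
    ["h", "H"], ["M", "n", "N", "J", "G"],
    ["z", "S", "s"], ["b", "v"],
    ["k", "kh"], ["g", "gh"], ["c", "ch"], ["j", "jh"],
    ["T", "Th", "t", "th"], ["D", "Dh", "d", "dh"],
    ["p", "ph"], ["b", "bh"], ["sh", "z"],
]


def alternatives(ch):
    """All spellings that may stand in for one input character.

    Assumption: groups sharing the character are merged, so 'z' yields
    z, S, s and sh. A character found in no group stands for itself.
    """
    alts = []
    for group in TRANSITION_TABLE:
        if ch in group:
            for member in group:
                if member not in alts:
                    alts.append(member)
    return alts or [ch]


def variants(word):
    """Cartesian product of the per-character alternatives."""
    choices = [alternatives(ch) for ch in word]
    return ["".join(parts) for parts in product(*choices)]


print(len(variants("sankara")))
print(len(variants("BakzaMkAra")))
```

The product grows multiplicatively with word length, which is why a ten-letter input already produces thousands of candidates.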
Likely causes of slowness
In v0.2, the variants are generated using HK (see table above). Database searches require SLP1. So hundreds/thousands of HK spellings must be transcoded to SLP1. This is relatively slow
Once we have SLP1 spellings, the current technique checks the database (MW) for every spelling variant. Database access (i/o) is relatively slow compared to computation.
We should be able to exclude certain spellings without reference to database.
We could generate a table of known 2-grams, and not bother to check the database for a spelling that has a non-existent 2-gram. Such a table would have several hundred-thousand 2-grams, and be accessed by a hashing process (PHP associative array). This would be a much faster way to exclude such cases than database lookup.
We need to have a test suite.
This would be a list of input spellings and output results that any technique should generate.
When we vary the algorithm, we should validate that the new algorithm still passes the test suite.
Reason: a change in algorithm aimed to enhance the results could have undesired side effect of failing to solve previously solved spellings. We won't know this unless we have a test suite.
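The last two points can be sketched together in Python (the production code is PHP, so this only illustrates the idea). The headword list is a tiny stand-in for MW, the stub search function is hypothetical, and the expected answers are taken from examples mentioned in this thread.

```python
def bigrams(word):
    """Set of all 2-letter substrings of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_bigram_table(headwords):
    """Collect every 2-gram that occurs in at least one known headword."""
    table = set()
    for headword in headwords:
        table |= bigrams(headword)
    return table


def may_exist(candidate, table):
    """False means the candidate has an unknown 2-gram: skip the database."""
    return bigrams(candidate) <= table


def failing_cases(search, cases):
    """Regression check: inputs whose expected headword is not returned."""
    return [inp for inp, expected in cases.items() if expected not in search(inp)]


# Tiny stand-in for the MW headword list.
table = build_bigram_table(["mAtf", "mAtar", "kaPa", "SaNkara"])
to_check = [c for c in ["mAtar", "mAFA", "kaPa"] if may_exist(c, table)]
print(to_check)

# Hypothetical search function that only knows two inputs.
def stub_search(text):
    return {"matri": ["mAtf"], "krishna": ["kfzRa"]}.get(text, [])

suite = {"matri": "mAtf", "krishna": "kfzRa", "dukha": "duHKa", "dhaval": "Davala"}
print(failing_cases(stub_search, suite))
```

Running the suite after every algorithm change shows at once whether a previously solved spelling has stopped working.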
thousands of HK spellings must be transcoded to SLP1
Now I see why it was a bad idea. Good that it's an easy fix.
table of known 2-grams
Was thinking about the same today. And not only that. Some can be in the beginning, some only in middle and not all at the end of a word. That we should keep in mind, I guess.
new algorithm still passes the test suite
Sounds amazing, too smart for me.
v0.3 is faster due to:
It can be made faster by weeding out unknown initial 2-grams.
krushna, matar, give expected results.
Made an experiment with 'f' (kafa) --- this is odd because 'f' is not an HK letter, but is an SLP1 letter. So its handling must be different.
Will consider Dhaval's scratch sheet next.
Suggestions ?
v0.3 seems reasonably faster.
One more python suggestion.
'if member in list' is considerably slower than 'if member in set', provided the set is built once up front and not on every check.
If you use the ngram list instead of a set, converting it to a set will improve performance many times over.
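A small sketch of the Python point, with a made-up ngram sample. The important detail is where the conversion happens: building the set once pays off, while wrapping the list in set() inside every check rebuilds it each time and gains nothing.

```python
# Hypothetical sample; the real ngram table is far larger.
ngram_list = ["ka", "aP", "Pa", "mA", "At", "tf"]

# Built once, then reused for every lookup.
ngram_set = frozenset(ngram_list)


def known_with_list(ngram):
    """Linear scan: cost grows with the length of the list."""
    return ngram in ngram_list


def known_with_set(ngram):
    """Hash lookup: roughly constant cost per check."""
    return ngram in ngram_set


def known_rebuilding_set(ngram):
    """Anti-pattern: rebuilds the whole set on every single call."""
    return ngram in set(ngram_list)


for probe in ("aP", "zz"):
    print(probe, known_with_list(probe), known_with_set(probe))
```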
Where is the code by the way, @funderburkjim?
In this case, all the code is php. I am using an associative array for the ngram lookups, which should be relatively fast.
Code is on Cologne server.
https://stackoverflow.com/questions/13483219/what-is-faster-in-array-or-isset may be of interest for speed up
I'm actually using isset($ngram['xyz']), which seems to be what stackoverflow suggests.
in_array('xyz',$ngrams) looks comparable to python 'xyz' in $ngrams -- both slow.
What I'm doing now is ngram checking while alternates are generated. e.g., if a potential alternate starts with X and X contains bad ngrams, then any alternate XY will also have bad ngrams - hence no need to consider possible Ys.
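The pruning idea can be sketched as a depth-first generation that abandons a prefix as soon as it contains an unknown 2-gram. This is an illustration in Python, not the PHP code on the server; the function name, the per-position choice lists and the sample 2-gram table are all assumptions.

```python
def pruned_variants(choices, known_bigrams):
    """Generate alternates position by position, dropping bad prefixes early.

    choices: one list of alternative spellings per input position.
    known_bigrams: set of 2-grams that occur in real headwords.
    """
    results = []

    def extend(prefix, position):
        if position == len(choices):
            results.append(prefix)
            return
        for alt in choices[position]:
            candidate = prefix + alt
            # Only the 2-grams introduced by the new piece need checking:
            # the one bridging prefix and alt, plus any inside alt itself.
            start = max(len(prefix) - 1, 0)
            new_bigrams_ok = all(
                candidate[j:j + 2] in known_bigrams
                for j in range(start, len(candidate) - 1)
            )
            if new_bigrams_ok:
                extend(candidate, position + 1)
            # Otherwise every continuation of candidate is skipped.

    extend("", 0)
    return results


# 2-grams taken from the three spellings kaPa, kapa and kApA.
known = {"ka", "aP", "Pa", "ap", "pa", "kA", "Ap", "pA"}
choices = [["k", "K"], ["a", "A"], ["p", "P"], ["a", "A"]]
survivors = pruned_variants(choices, known)
print(survivors)
```

A 2-gram filter only narrows the list: a spelling whose 2-grams are all known can still be absent from the dictionary, so the survivors are still checked against the database.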
v0.3a changes:
faster -- now using 2-grams and 3-grams, including beginning 2 and 3-grams to narrow the search
Several of Dhaval's alternates are included (gooruu, for instance). Some consonantal ones are not included yet --- @drdhaval2785 -- give me some examples. It seems like some of your alternates are a mixture of slp1 and hk; I'm currently confused by these.
@gasyoun I've done something with 'f' --- could you review this, suggest improvements if needed. Also consonant doubling after 'r'. Also, you suggested that 'any vowel can be any vowel' . so for 'rama' you would try 'rimu', etc. etc. (10*10 variants if there are 10 vowels)?
There was some mention of allowing IAST extended ascii in the spelling. Is this viewed as desideratum?
I wake up and what do I see? A dream come true.
It can be made faster by weeding out unknown initial 2-grams.
It's so quick now!
Results are impressive, Jim.
Server returns 12 alternates.
200 code means found in mw.
3rd field is word frequency score (or -1 or -9)
bhaj (Baj) 200 65
bhAj (BAj) 200 54
vaj (vaj) 200 -1
NF (vaJ) 404 -9
NF (vAj) 404 -9
I've done something with 'f' --- could you review this, suggest improvements if needed.
Perfect
kapha (kaPa) 200 39
kapa (kapa) 200 0
kApA (kApA) 200 -1
NF (KApA) 404 -9
NF (KApa) 404 -9
What about sanskrit? Should it find saMskfta (+saMskftaM) and saMskfti?
v0.3a gets close but finds nothing. @drdhaval2785 @funderburkjim should it find it? Should sanskrit bring us saMskfta or is that too bad an input to get good results?
NF (saMskrt) 404 -9
NF (saMskft) 404 -9
NF (saMskarT) 404 -9
As for doubling, Acaryya finds exactly what it should:
Server returns 13 alternates.
200 code means found in mw.
3rd field is word frequency score (or -1 or -9)
Acarya (Acarya) 200 0
NF (acaryA) 404 -9
NF (acariya) 404 -9
NF (acariyA) 404 -9
NF (acaruyA) 404 -9
NF (acaruya) 404 -9
NF (acarya) 404 -9
NF (Acfya) 404 -9
NF (AcaryA) 404 -9
NF (Acariya) 404 -9
NF (AcariyA) 404 -9
NF (Acaruya) 404 -9
NF (AcaruyA) 404 -9
Entry kuw is fine
kuT (kuw) 200 1
kUT (kUw) 200 -1
kuth (kuT) 200 -1
kut (kut) 200 -1
NF (KUw) 404 -9
NF (Kuw) 404 -9
NF (kUt) 404 -9
NF (kuW) 404 -9
I would like to see kuwa in the results as well (+1 letter at the end scenario)
Compare with http://spokensanskrit.de/index.php?beginning=0+&tinput=+kuT&trans=Translate
- 1 letter
for word ending in consonant, also try word + vowel. That would get sanskrit.
Good idea.
I'd like 'dukha' to find 'duHKa' (SLP1).
It would be good to do some Edit Distance comparisons. e.g., given word spelling W (slp1) find all words in a given list of words (e.g. the headwords of MW) within edit distance D of W. It is clear how to do such a computation. BUT not clear how to make such a computation efficient enough to be of practical use. This sounds like a problem that should have been solved in computer science.
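For reference, the straightforward version of that computation is sketched below: a standard Levenshtein distance plus a linear scan over the headwords, with a cheap length check to skip hopeless candidates. The headword list is a made-up sample. This is the simple baseline, not a solution to the efficiency question raised above; a scan over a full headword list may still be too slow in practice.

```python
def edit_distance(a, b):
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def within_distance(word, headwords, max_dist):
    """All headwords whose distance to word is at most max_dist."""
    hits = []
    for headword in headwords:
        # The length difference is a lower bound on the distance.
        if abs(len(headword) - len(word)) > max_dist:
            continue
        if edit_distance(word, headword) <= max_dist:
            hits.append(headword)
    return hits


# 'dukha' typed in HK corresponds to 'duKa' in SLP1.
sample_headwords = ["duHKa", "duHka", "deva", "mAtariSvan"]
print(within_distance("duKa", sample_headwords, 1))
```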
I'm going to let this simmer a few days before making further adjustments.
Request others to find cases where the algorithm is missing something it should get.
I tried with 'dhaval' and expected to see 'Davala'. Did not come. It is a phenomenon known as 'schwa syncope'. Terminal 'a' is dropped under influence of local languages.
It would be better if we can look up for input+a in case input ends with a consonant. dhaval and kuw issue will be resolved.
word + vowel
I would say word + a
I would say word + a
As a possible starting point.
@gasyoun
I'm thinking that the next thing to do is to have the spelling interface with the hwnorm1 spellings.
This would permit one to get the right word in AP90 for ashva, for instance (where the actual headword
spelling is aSvaH - with the visarga at the end).
What do you see as a good next step?
What do you see as a good next step?
I would be happy to see it even as it is. What you propose is a good addition (and you know they are endless), not critical at beta testing.
Here's a next step : v0.3b fetching.
This version accesses a (newly created) database form of hwnorm1c.
Here's output for 'ashvah':
1: azva (aSva) 200 70
aSva:BEN,BHS,BOP,BUR,CAE,CCS,GRA,IEG,INM,MD,MW,MW72,PE,PUI,PW,PWG,SCH,SHS,STC,VCP,VEI,WIL,YAT
aSvaH:AP,AP90,SKD
2: asva (asva) 200 4
asva:AP,AP90,MD,MW,MW72,PW,SCH,SHS,STC,VCP,WIL,YAT
asvaH:SKD
3: Azva (ASva) 200 0
ASva:AP,AP90,BUR,CAE,CCS,MD,MW,MW72,PW,PWG,SHS,VCP,WIL,YAT
ASvaM:SKD
4: Asva (Asva) 200 -1
Asva:CCS,MD,PW
Others are encouraged to experiment.
Currently, we're using the word frequency list (this is from Marcis -- not sure where documented -- it was from DCS ?). Another possibility would be to prioritize on basis of the number of Cologne dictionaries containing the normalized spelling. Check out 'siva' as example.
There are some odd alternates -- look at 'hari', 'shankara', 'karmman', 'sangama'.
Not sure whether these odd alternates require tuning.
Also not sure how to integrate this into a 'real' display. Welcome suggestions on this point.
It seems likely that allowing either Devanagari or IAST in input would be a modest enhancement -- the program could first do a transcoding from Devanagari or IAST into HK, then proceed as if the user had typed HK. Might be able to do similar with ITRANS. If this works, then we would have a solution to 'auto-detection' of input.
Not sure whether this is important to do now.
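A first cut at such auto-detection could look only at the characters typed, as in the sketch below. The rule is an assumption: anything in the Devanagari Unicode block counts as Devanagari, any other non-ASCII character is taken as a sign of IAST diacritics, and plain ASCII is treated as HK. The function name and the returned labels are hypothetical, and the transcoding step itself is not shown.

```python
def detect_input_scheme(text):
    """Guess the input scheme from the characters alone."""
    # Devanagari Unicode block: U+0900 to U+097F.
    if any("\u0900" <= ch <= "\u097F" for ch in text):
        return "devanagari"
    # IAST uses Latin letters with diacritics, which are outside ASCII.
    if any(ord(ch) > 127 for ch in text):
        return "iast"
    # Plain ASCII: proceed as if the user had typed HK.
    return "hk"


for sample in ("अज", "daṇḍa", "danda"):
    print(sample, detect_input_scheme(sample))
```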
it was from DCS
Exactly.
Another possibility would be to prioritize on basis of the number of Cologne dictionaries containing the normalized spelling. Check out 'siva' as example.
As an option, indeed.
Might be able to do similar with ITRANS.
It's dead. Let it swim down the river.
a solution to 'auto-detection' of input
Hurray!
Not sure whether this is important to do now.
It is! It's much more important than converting one more out of 33 dictionaries to IAST. It's a universal UI. We have it as a playground for a few months, yet nothing in real life.
Also not sure how to integrate this into a 'real' display. Welcome suggestions on this point.
What about a list of IDs (invisible table)? Anything will do to see it in action, not just a blank page.
list-0.2s.html is a generalization of list-0.2.html. One of the input options is 'simple'. Give it a try !
Give it a try !
Finally, thanks Jim! I searched for aja in HK mode. It returns (in a smart way)
अजा
अ-ज
अ-जा
And that's good. What if there are many words? Maybe add an index above with anchor links on the same page, so I do not have to scroll to know what the possible variants are. When I chose simple mode nothing changed, it seems all modes have become smarter.
As the index is above the usual interface, did not notice it. Maybe add a header above? Possible solutions,
कृ
खरु
क्रु
कॄ
करि
Jim, you have finished everything I asked for: when I search for varanasi I get वराणसी. I give you my thanks. But why, when I searched for go, did I get directly to the go articles without any index and no possibility to know that there is even gai? But the go case is not of mega importance, I'm asking to understand if I have missed anything.
In a case like:
मण्डूक
मधुक
मधूक
माधुक
मधुका
माण्डूक
माधूक
नान्दुक
मादुक
मण्डुक
everything does not fit my page anymore. Thinking out loud, maybe
मण्डूक ; मधुक ; मधूक ; माधुक ; मधुका ; माण्डूक ; माधूक ; नान्दुक ; मादुक ; मण्डुक
is a solution, @drdhaval2785 ?
When I entered manduk I got WORD NOT FOUND in mw dictionary and must say http://spokensanskrit.de/index.php?beginning=0+&tinput=manduk+&trans=Translate failed as well:
mandAka मन्दाक
mandaka मन्दक
maNDukI मण्डुकी
madgu मद्गु
madhukA मधुका
madhuka मधुक
madhus मधुस्
madikA मदिका
mandAkSa मन्दाक्ष
mandAsu मन्दासु
So we do not add endings? I mean (based on real life examples), a translator of an ayurvedic book (who has never learned Sanskrit and still needs to translate a book, written by an Indian in English) finds jal instead of jala in a printed book and will never find the real word, even in our dictionaries.
Question above mentioned 'gai'. 'gai' is not one of the alternates generated by 'go'. The only ones are (in SLP1) go,Go. But Go is not found in the MW dictionary (which you specified in above display), while 'go' is found. Thus, there is ONLY ONE SOLUTION.
In the situation where there is only one solution, the display does not show a list of possibilities with just one member.
Note a given citation will present different behavior depending on the dictionary. For instance, try 'guru' with
When there is more than one possibility, the variants are clickable. The first variant is displayed initially. Clicking on another variant shows the display for that variant.
In the situation where there is only one solution, the display does not show a list of possibilities with just one member.
And it's good.
Note a given citation will present different behavior depending on the dictionary. For instance, try 'guru' with
Indeed.
An additional rule was added to the alternate generation to deal with the 'manduk' and 'jal' examples mentioned above. Both of these are cases where the final 'a' of Sanskrit spelling has been omitted. As Dhaval has mentioned elsewhere, this schwa deletion is common in modern Indian languages.
So, when the user input ends in a consonant, and the variants are generated, for each variant we'll add an extra variant with an extra 'a'.
This will let the manduk and jal examples generate desired answer.
it will also have some words generate additional possibilities (think 'gam', which will now show 'gama' also).
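The rule described above can be sketched as follows. The vowel set is SLP1, and treating every non-vowel final letter as a consonant is a simplifying assumption; the function name is hypothetical and the actual implementation is in PHP.

```python
SLP1_VOWELS = set("aAiIuUfFxXeEoO")


def add_final_a(user_input, variants):
    """If the user input ends in a consonant, follow each variant with variant + 'a'."""
    if not user_input or user_input[-1] in SLP1_VOWELS:
        return list(variants)
    extended = []
    for variant in variants:
        extended.append(variant)
        extended.append(variant + "a")
    return extended


print(add_final_a("jal", ["jal", "jAl"]))
print(add_final_a("jala", ["jala", "jAla"]))
```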
simple
This takes care of the problem of a long list of alternates between the citation and the display. The number of alternates is shown. This menu appears even if there is only one option (slightly different from prior version). Selecting one of the alternates from the menu changes the display to that option.
simple
You can copy/paste (or type) either Devanagari unicode or IAST into the citation. A conversion will be done automatically.
Note the output can be changed to IAST or Devanagari or any of the other output
options. (previously it was hard-coded to Devanagari).
it will also have some words generate additional possibilities (think 'gam', which will now show 'gama' also).
If there is gam in DB, maybe no need for gama? In jal case there was no result before, as compared with gam, where we had.
pook
And I got phuka, impressed.
Selecting one of the alternates from the menu changes the display to that option.
It's good and bad at the same time. It's harder to notice (but anyway I'll write a FAQ on how to use it, it's no longer obvious) and you have to click to see what's there, and can't copy-paste the list in list form.
If used at all, maybe make the dictionary list dropdown as well? I remember the abbreviations (not always), you do and Dhaval, but what about the rest? I guess it's abracadabra for them.
I'm thinking out loud about some fine tuning. Suppose I entered danda and I actually want daṇḍa to be found and not dada, which is more common and because of that comes first. What if not all conversions are equal? What if we give priority to those variants where the number of letters matches first?
And the English-Sanskrit Dictionaries do not work in this issue? Searched for love, found none.
I'm thinking about what we still miss to become more popular than our clone:
daRqa v. dada
The display currently shows 'dada' as preferential to daRqa in simple search for 'danda'.
The preferences come from the word_frequency list.
Thus far, two layers have been uncovered in this problem:
There is a problem in the word_frequency file itself for the particular word 'daRqa' (the spellings are SLP1 in this file). Namely, the word appears twice (at lines 29809, and 29810), with frequency numbers '77' and '0'. The program is using the last (second) value, namely '0'. There are other duplicates in this file. A check shows that the lines are in the same order as the 'word_frequency.js' file that I got originally from Ilya.
@ Marcis -- do you know how to interpret these duplicates? Were you aware of these duplicates?
Were you aware of these duplicates?
No.
do you know how to interpret these duplicates?
No.
What I wanted to say, even if the frequency is higher, we should 1st show a word that matches the number of letters. Agree?
Of the 72933 records in word_frequency , there are 4932 words which appear more than once.
These words, along with the various frequencies, are shown in word_frequency_dups .
One way to resolve duplicates is to take the MAX frequency. This results in word_frequency_adj , which now has 67050 distinct words.
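The difference between the two behaviors can be sketched with the daRqa example. The record format (word, frequency) is an assumption about how the file is read, and the function name is hypothetical.

```python
def resolve_duplicates_by_max(records):
    """Keep the highest frequency seen for each word."""
    best = {}
    for word, freq in records:
        if word not in best or freq > best[word]:
            best[word] = freq
    return best


# The two daRqa records, in file order, as described above.
records = [("daRqa", 77), ("daRqa", 0)]

last_wins = dict(records)                       # earlier behavior: second value overwrites the first
max_wins = resolve_duplicates_by_max(records)   # adjusted behavior

print(last_wins["daRqa"], max_wins["daRqa"])
```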
The display uses these adjusted word frequencies to order the results. This gives some definite improvement, e.g. daRqa is now first result for 'danda', 'Siva' is now first result for 'siva' (formerly sivA was first), etc.
we should 1st show a word that matches the number of letters, Agree?
Not yet. Let's first find some examples where the now corrected word_frequency ordering looks wrong.
We may need to alter the word frequency file further to take into account normalized spellings (of hwnorm1) -- not sure of this..
maybe make the dictionary list dropdown as well?
One reason for using this autocomplete form has to do with the non-public dictionaries. These are not in the list of suggestions, but they may be typed in by users who know the code. This would not be possible using a normal <select><option> drop-down menu.
However, I've spent some time learning some of the finer points of using the autocomplete widget of jquery UI, and this now behaves more like a drop down menu. In particular, when you focus on the element (by clicking in it), the full list displays -- then you can select another dictionary or not. The suggestion aspect also works, so if you type a letter or two, the list is narrowed down.
@gasyoun Hope you like the change.
This continues the research begun in #8.
The url of the version 0.1 is this.
This takes into account word frequency, and uses the MW dictionary instead of Wilson.
I think this pretty well represents the work that Ilya and Marcis began.
@gasyoun Agree?
If so, we can start tinkering with the algorithm that generates the alternates.