sanskrit-lexicon / csl-apidev


simple search, v1.1a #27

Open funderburkjim opened 3 years ago

funderburkjim commented 3 years ago

v1.1a of simple-search is quite different under the hood from v1.1 and previous versions.

It can currently be used by substituting 'simple1.1a' for 'simple' in the URL, e.g. https://sanskrit-lexicon.uni-koeln.de/simple1.1a/mw/munih

The performance should be more uniform across words.

The program depends heavily on sqlite fts, the full text search (a.k.a. inverted index) capability of sqlite. Building and querying inverted indices underpins most search applications, but these usually require a separate server (as with applications based on Java Lucene). By contrast, sqlite's inverted indexes can be installed and queried much like any other sqlite database.
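For readers unfamiliar with fts, here is a minimal sketch of what an inverted-index lookup in sqlite looks like from Python, using the standard 'sqlite3' module and an in-memory FTS4 table. The table name 'headwords' and its columns are hypothetical; the actual schema used by simple1.1a is not described in this thread. It assumes the local sqlite library was built with FTS4 enabled.

```python
import sqlite3


def build_index(rows):
    """Build an in-memory FTS4 table from (key, lnum) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE VIRTUAL TABLE headwords USING fts4(key, lnum)")
    conn.executemany("INSERT INTO headwords (key, lnum) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, term):
    """Return keys whose 'key' column matches an FTS MATCH expression."""
    cur = conn.execute(
        "SELECT key FROM headwords WHERE key MATCH ? ORDER BY rowid", (term,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_index([("muni", "1"), ("munindra", "2"), ("deva", "3")])
    print(lookup(conn, "muni"))   # whole-token match
    print(lookup(conn, "muni*"))  # prefix match via the index
```

One caution, assuming the default 'simple' tokenizer: it folds ASCII uppercase to lowercase, so SLP1 keys such as munI and muni would index to the same token. A case-sensitive scheme would need a different tokenizer or some encoding of the keys.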

This functionality (the fts4 version) is available through Python 3.6 at Cologne. It is not natively available in the sqlite version used by PHP at Cologne. However, a PHP program can query an fts table via Python by using 'shell_exec', and that is what simple1.1a does.
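The PHP-to-Python bridge could look roughly like the following hypothetical command-line script (call it fts_query.py): it takes a database path and a query term, runs the MATCH query, and prints JSON on stdout. The script name, argument order and 'headwords' table are assumptions for illustration, not the code actually running at Cologne.

```python
import json
import sqlite3
import sys


def search(db_path, term):
    """Run an FTS MATCH query against a hypothetical 'headwords' table."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT key FROM headwords WHERE key MATCH ? ORDER BY rowid", (term,)
        )
        return [row[0] for row in cur]
    finally:
        conn.close()


def main(argv):
    if len(argv) != 3:
        print(json.dumps({"error": "usage: fts_query.py DB_PATH TERM"}))
        return 2
    db_path, term = argv[1], argv[2]
    # JSON is easy to decode on the PHP side; keep non-ASCII output readable.
    print(json.dumps({"term": term, "results": search(db_path, term)},
                     ensure_ascii=False))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

On the PHP side, the call would be something like shell_exec('python3 fts_query.py ' . escapeshellarg($db) . ' ' . escapeshellarg($term)) followed by json_decode on the output. Escaping the arguments matters because the term comes straight from the user.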

There's a lot more to say, technically, about this approach. But first, please use it a bit and let me know if any important features have been missed. You can still get at the prior 1.1 version via 'simple'.

Also, as a teaser, try searching for some declined forms (e.g. muniBiH, devena, sisunAm); this is one area where v1.1a might be extended much more readily than prior versions.
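One way declined forms could be mapped back to headwords is suffix replacement: strip a known ending, restore the stem-final vowel, and keep the result only if it is an actual headword. The sketch below is a hypothetical illustration with just two rules in SLP1; it is not how v1.1a actually does it, and it ignores sandhi and ṇatva (e.g. rAmeRa).

```python
# Hypothetical (ending, replacement) rules in SLP1; a real table would be much larger.
ENDING_RULES = [
    ("ena", "a"),  # instrumental singular of a-stems: devena -> deva
    ("BiH", ""),   # instrumental plural: muniBiH -> muni
]


def candidate_stems(form, headwords):
    """Return headwords reachable from 'form' by one ending replacement."""
    candidates = []
    for ending, replacement in ENDING_RULES:
        if form.endswith(ending) and len(form) > len(ending):
            stem = form[: -len(ending)] + replacement
            # Only propose stems that actually exist as headwords.
            if stem in headwords and stem not in candidates:
                candidates.append(stem)
    return candidates


print(candidate_stems("devena", {"deva", "muni"}))
print(candidate_stems("muniBiH", {"deva", "muni"}))
```

Checking each candidate against the headword list keeps over-eager rules harmless: a wrong stem is simply discarded.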

gasyoun commented 3 years ago

The program depends heavily on sqlite fts,

I wonder if it could also be used for an offline desktop version of the dictionaries.

devena

devena works, wow, @funderburkjim

some declined forms (e.g. muniBiH, devena, sisunAm)

Is there a list of what can and cannot be found? I did not find pronominal forms.

New case:

sradha gives 3 results: śarada śaradā saraḍa

sraddha gives 3 results: śraddhā śrāddha śraddha

If I search for sradha, should we assume that I might have meant śraddhā?

gasyoun commented 3 years ago

New case: I was mistakenly looking for viśravasa, but actually needed viśravas.

gasyoun commented 3 years ago

@funderburkjim do you agree about sradha?

funderburkjim commented 3 years ago

I haven't thought about the sradha example yet. Thanks for the reminder.

Can you think of a generalization of this? Is it only 'dD' (slp1), or is this one instance of a more comprehensive pattern?

gasyoun commented 3 years ago

Can you think of a generalization of this?

like tT?

Is it only 'dD' (slp1)?

I believe t as well.
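A possible generalization, sketched under the assumption that the goal is a loose "fuzzy key" for romanized (IAST or simplified) input: strip diacritics, drop an 'h' after k, g, c, j, t, d, p, b or s (so both dD and tT are covered), and collapse doubled letters. Words with equal keys become candidates for each other. This is a hypothetical illustration, not existing simple-search logic; because it also treats 'sh' as 's' and merges many distinct words, it is better suited to ranking suggestions than to replacing exact lookup.

```python
import re
import unicodedata


def fuzzy_key(word):
    """Collapse distinctions users often get wrong when typing romanized Sanskrit."""
    # 1. Drop diacritics: NFD splits 'ś' into 's' + combining acute, 'ā' into 'a' + macron.
    decomposed = unicodedata.normalize("NFD", word.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 2. Ignore aspiration ('dh' -> 'd', 'th' -> 't') and simplified 'sh' -> 's'.
    plain = re.sub(r"([kgcjtdpbs])h", r"\1", plain)
    # 3. Collapse doubled letters ('dd' -> 'd').
    return re.sub(r"(.)\1+", r"\1", plain)


print(fuzzy_key("sradha"), fuzzy_key("śraddhā"), fuzzy_key("śarada"))
```

Under this key, sradha and śraddhā both become srada, while śarada stays sarada; likewise bhisma and bhīṣma both become bisma.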

gasyoun commented 3 years ago

I entered aṃśumāna and found out it should actually have been aṃśumat, so I got "no results found". Still, one could suggest that aṃśumān was meant instead of aṃśumāna.

https://archive.org/details/in.ernet.dli.2015.308381/page/n147/mode/2up

gasyoun commented 3 years ago

@funderburkjim people still mix up the SLP1 and simple input settings. We can't change the SLP1 name, but I'm not so sure about simple.

Names that match almost exactly should rank higher in the list than popular names that are only variations. Do you agree in this case?

bhisma
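A minimal sketch of that ordering, assuming each candidate comes with some popularity score (the numbers below are made up): sort first by Levenshtein edit distance to the query, and use popularity only to break ties.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def rank(query, popularity):
    """Order candidates by closeness to the query first, popularity second."""
    return sorted(popularity, key=lambda w: (levenshtein(query, w), -popularity[w], w))


# Made-up popularity scores: the exact match still comes first.
print(rank("bhisma", {"bhisma": 1, "bhasma": 50}))
```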

funderburkjim commented 3 years ago

I agree this is confusing.

Suggestion: Get rid of the menu called 'input'.

Worth a try?

gasyoun commented 3 years ago

Suggestion: Get rid of the menu called 'input'.

If simple is the default, let's go for it.

gasyoun commented 2 years ago

@funderburkjim sankhya gives as expected:

5 results: saṃkhyā sāṃkhya saṃkhya śaṅkya śāṅkhya

saṁkhya loses the ṁ, and none of the suggestions is of any interest:

5 results: sakhya śakya śākya śākhya sākhya

funderburkjim commented 2 years ago

@gasyoun The 'm-dot-above' is now handled in 'simple'.

funderburkjim commented 2 years ago

@gasyoun However, the '1.1a' version does not catch this. This is an unexpected difference between /simple1.1a/ and /simple/.
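A hypothetical sketch of one way to treat ṁ (m with dot above, U+1E41) the same as ṃ (m with dot below, U+1E43) before searching. It assumes the indexed forms use ṃ; NFC normalization runs first so that a decomposed m plus combining dot above is folded into the single character.

```python
import unicodedata

# Assumption: headwords are indexed with ṃ, so map ṁ/Ṁ onto ṃ/Ṃ.
ANUSVARA_VARIANTS = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def normalize_anusvara(text):
    # NFC turns 'm' + U+0307 (combining dot above) into the single code point ṁ.
    return unicodedata.normalize("NFC", text).translate(ANUSVARA_VARIANTS)


print(normalize_anusvara("sa\u1e41khya"))
```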

gasyoun commented 2 years ago

@funderburkjim if we want to use the same SIMPLE page for English to Sanskrit translations, it becomes troublesome.

god

If you type anything on a phone, the first letter in the input box is capitalized by default, and then nothing is found.

Godss

But even if we type an English word without capital letters, the result is found but not counted as such: the count remains 0.
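A sketch of a fix for English-headword dictionaries, under these assumptions: the dictionary codes 'mwe' and 'ae' stand in for whichever dictionaries have English headwords, and the count is derived from the same list of results that is displayed. Case folding is applied only there, since in SLP1 upper and lower case are different letters.

```python
# Hypothetical set of dictionary codes whose headwords are English, not SLP1.
ENGLISH_HEADWORD_DICTS = {"mwe", "ae"}


def fold(dict_code, text):
    """Case-fold English headwords; leave SLP1 alone ('a' and 'A' differ there)."""
    if dict_code.lower() in ENGLISH_HEADWORD_DICTS:
        return text.strip().casefold()
    return text.strip()


def search(dict_code, query, headwords):
    q = fold(dict_code, query)
    results = [hw for hw in headwords if fold(dict_code, hw) == q]
    # Derive the count from the result list itself so the two can never disagree.
    return {"count": len(results), "results": results}


print(search("mwe", "God", ["god", "goddess"]))
```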

funderburkjim commented 2 years ago

I hadn't really thought about 'simple' for MWE, AE, etc. Probably the current logic is inappropriate for non-Sanskrit headwords.

The code base of 'simple' has become complicated enough to be difficult to manage.

And the UI needs to be rethought, as the interactions among the user choices have become difficult to predict.

I currently almost always use 'input=slp1' setting -- that way the 'Suggestion' list is available but the spelling change features of 'input=simple' are not present at all. Perhaps this setting should be separated out as a 'suggest' app and removed from 'simple', since this usage is not really in the spirit of 'simple'.

gasyoun commented 2 years ago

Probably the current logic is inappropriate for non-Sanskrit headwords.

Yep, not working. Devanagari input in /simple does not work either. It worked well before, but stopped working lately.


The code base of 'simple' has become complicated enough to be difficult to manage.

There shouldn't be so much code that one gets lost in it, should there?

And the UI needs to be rethought, as the interactions among the user choices have become difficult to predict.

Would you like to test different scenarios? I've proposed one that a student asked me about lately.

I currently almost always use 'input=slp1' setting -- that way the 'Suggestion' list is available but the spelling change features of 'input=simple' are not present at all.

So you're like a robot. There are 5 people on Earth who think in SLP1.

since this usage is not really in the spirit of 'simple'.

I agree, but I don't see a problem with leaving it in either.

gasyoun commented 1 year ago

lakṣmīvān does not show me lakṣmīvat. Should we try to handle nominative forms? @funderburkjim

gasyoun commented 1 year ago

halahala never shows hālāhala, but it should. @funderburkjim


gasyoun commented 1 year ago

Searching for āyuṣman does not show āyuṣmant, which is āyuṣmat in MW; neither is āyuṣmān generated, although it is mentioned inside the article. @funderburkjim

ayushman
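For the lakṣmīvān and āyuṣmān cases, one hypothetical rule (in SLP1): the nominative singular masculine of -mat/-vat stems ends in -mAn/-vAn and the strong stem in -mant/-vant, so a query ending in one of those (or the mistyped -man/-van) could also propose the corresponding -mat/-vat form, kept only if it exists as a headword. This is an illustration, not current simple-search behavior.

```python
import re

# Hypothetical rule: -mAn/-vAn, -mant/-vant, -man/-van  ->  -mat/-vat (SLP1).
MAT_VAT = re.compile(r"([mv])[aA]nt?$")


def mat_vat_candidates(query, headwords):
    """Propose the -mat/-vat headword for a nominative or strong-stem query."""
    stem = MAT_VAT.sub(r"\1at", query)
    return [stem] if stem != query and stem in headwords else []


headwords = {"lakzmIvat", "Ayuzmat"}
print(mat_vat_candidates("lakzmIvAn", headwords))
print(mat_vat_candidates("AyuzmAn", headwords))
```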