Open funderburkjim opened 7 years ago
The user is thinking of the /tamil/recherche display. That display is based solely on the HK transliteration.
If we contemplate implementing it in the newer displays (for MW or other dictionaries), even if we restrict to the case where the user has chosen HK for the input method, then the problem is harder. The reason is that the underlying spelling of sanskrit words in the newer dictionaries is in the SLP1 transliteration.
Consider the example of 'dhiSNya' (in HK). Now when this is lower-cased in HK, we get 'dhisnya'.
Now the user hopes to retrieve 'dhiSNya' when he enters 'dhisnya'.
But we are searching SLP1 spellings, so we want to find the SLP1 spelling 'DizRya'.
How do we know that from 'dhisnya' ? Maybe we say that the 's' when converted to SLP1 can be either an 's' (dental sibilant) or a 'z' (cerebral sibilant). And maybe that 'n' can be either a normal dental 'n' or the cerebral nasal 'R' (in slp1).
So we would be looking for 4 SLP1 possibilities: 'Disnya', 'DisRya', 'Diznya', 'DizRya'.
So, I guess if we searched for all four of these SLP1 spellings in our SLP1-based dictionary, that would be the same as doing a case-insensitive search in an HK-based dictionary for 'dhisnya'.
Of course, we'd have to do similar kinds of conversions for all the other upper/lower case HK spellings.
This looks like it is theoretically possible, but would be quite awkward.
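The expansion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `slp1_candidates` and both tables are hypothetical, and the tables cover just the two ambiguities discussed here (s and n) plus the `dh` digraph. A real version would need every upper/lower-case HK pair, including long vowels.

```python
from itertools import product

# Assumed, deliberately partial tables for *lower-cased* HK input.
# Each ambiguous letter maps to every SLP1 letter it could stand for.
AMBIGUOUS = {
    "s": ["s", "z"],  # dental s, or cerebral sibilant (HK 'S' = SLP1 'z')
    "n": ["n", "R"],  # dental n, or cerebral nasal (HK 'N' = SLP1 'R')
}
DIGRAPHS = {"dh": "D"}  # HK aspirate digraph -> single SLP1 letter


def slp1_candidates(hk_lower):
    """Return every SLP1 spelling a lower-cased HK string could stand for."""
    choices = []
    i = 0
    while i < len(hk_lower):
        pair = hk_lower[i:i + 2]
        if pair in DIGRAPHS:
            # A digraph is unambiguous here and consumes two characters.
            choices.append([DIGRAPHS[pair]])
            i += 2
            continue
        ch = hk_lower[i]
        # Unlisted letters are assumed to map to themselves.
        choices.append(AMBIGUOUS.get(ch, [ch]))
        i += 1
    return ["".join(parts) for parts in product(*choices)]


print(slp1_candidates("dhisnya"))
```

The number of candidates doubles with every ambiguous letter, which is one concrete way to see why this approach is awkward for longer words.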
Here are two other comments related to this suggestion:
If the dictionary displays were based on a search engine, this would be conceptually simpler. It would be a matter of adding a field (say, 'gram') where we could store the grammatical categories of the entries. Note that for MW, these are already available (at least the noun, verb, indeclinable categories are known).
Another interesting category would be 'ls'; so we could search for records with a reference, say, to the Hitopadesha; and this could be done in conjunction with conditions on the spelling of the word. Again, this 'ls' information is identifiable for MW (and at least almost-identifiable for PW, PWG).
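To make the 'gram' and 'ls' idea concrete, here is a minimal sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the table names `entry` and `ref`, the `search` function, and above all the rows, which are invented placeholders and not real MW data.

```python
import sqlite3


def build_demo_db():
    """Create a toy database; all rows are invented placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (key TEXT PRIMARY KEY, gram TEXT)")
    conn.execute("CREATE TABLE ref (key TEXT, ls TEXT)")
    conn.executemany("INSERT INTO entry VALUES (?, ?)", [
        ("graha", "noun"), ("grah", "verb"), ("DizRya", "noun"),
    ])
    conn.executemany("INSERT INTO ref VALUES (?, ?)", [
        ("graha", "Hitop."), ("graha", "MBh."), ("DizRya", "MBh."),
    ])
    return conn


def search(conn, prefix="", gram=None, ls=None):
    """Combine a spelling condition with optional 'gram' and 'ls' filters."""
    # substr() compares case-sensitively, which SLP1 needs ('D' != 'd').
    sql = ("SELECT DISTINCT e.key FROM entry e "
           "LEFT JOIN ref r ON r.key = e.key "
           "WHERE substr(e.key, 1, ?) = ?")
    params = [len(prefix), prefix]
    if gram is not None:
        sql += " AND e.gram = ?"
        params.append(gram)
    if ls is not None:
        sql += " AND r.ls = ?"
        params.append(ls)
    sql += " ORDER BY e.key"
    return [row[0] for row in conn.execute(sql, params)]


conn = build_demo_db()
print(search(conn, prefix="gra", gram="noun", ls="Hitop."))
```

The prefix test uses `substr` rather than `LIKE` because SQLite's `LIKE` is case-insensitive for ASCII by default, which would conflate distinct SLP1 letters.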
I regret however to say that I am not happy with the fact that it is capital-letter sensitive: dhisnya does not appear in the search but dhiSNya with the two retroflexes does. The same applies to long vowels. I feel very uncomfortable about this.
Yeah, a pseudo-HK would make sense, the 'simpleHK', as indeed sometimes you do not know exactly how the word is spelled. None of the sites has it at the level actually needed.
http://www.perseus.tufts.edu/hopper/search
This book is still in copyright. However, it appears to be out of print, as only available through an antiquarian book seller in Germany.
I have received written approval from Mayrhofer that I can use KEWA and EWA online. So that should not be an issue.
Another interesting category would be 'ls'; so we could search for records with a reference, say, to the Hitopadesha; and this could be done in conjunction with conditions on the spelling of the word.
Just like http://kjc-sv013.kjc.uni-heidelberg.de/dcs/index.php?contents=texte has.
Mayrhofer, that I can use KEWA and EWA online
Could you send me a link so I can see what these are?
@gasyoun Thanks for the dcs link. Looks like there is a lot of good work there.
In the sentence analysis, I noticed that there is no 'analyzed sandhi' section. Do you know if that is available but just not printed, or is it currently unavailable? Do you know how the analysis was done?
Here are some further comments by this user (in response to email correspondence). Incidentally, I've asked him if he wants to join this Github project:
In fact, I think the criterion for case-insensitive search which already exists in the old version is the convenient one: http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html
I feel that as far as the search is concerned, Sanskrit ṣ can be regarded as the same category as s, i.e. s=ṣ (despite clearly having a different pronunciation), since they very frequently have the same etymology. As you know, the ṣ present in Ai. dhiṣṇya- is the s we find in Lat. fēriae (older fēsiae), festa and has been lost in fānum with compensatory vowel lengthening. The case is similar to the one we find (with a different sound allomorph) in English sun /s/, German Sonne /z/ and Dutch zon /z/. All three words have the same etymology. Regarding zon, Dutch speakers decided to use a different character to reflect the same sound as the one which is written in German the same way as the /s/ of Maus. Following this reasoning I find it convenient to consider Sanskrit s and ṣ as the same category in the search machine. However, aṣṭā́(u) - ‘8’ is a case where ṣ does not come from s, but from ḱ.
Sanskrit ṭ ḍ ṇ ṃ sometimes come from pure t d n m sounds (e.g. pṛṣṭhá-m - ‘back’), sometimes come from consonant clusters *r̥C, *l̥C > C (e.g. paṭa < *pl̥ta-). In any case, treating cerebrals with non cerebrals together makes the search easier.
I would say that Sanskrit z/ś is a sound completely different from s, since the former reproduces proto-indoeuropean *ḱ, as is the case in śatám. I think it is better to treat z and s separately at the search machine, but anyway this is a matter of personal choices.
Regarding the distinction between adjectives, nouns and verbs, the issue is by no means easy. Some adjectives can also become nouns and vice-versa. In some cases we have a contradiction, as in hvAnIya, where we find an infinitive (and thus a noun) classified as an adjective. Perhaps it is a problem of the original text rather than from the computer edition.
With a view to a middle-term expansion of the database with etymological entries from Mayrhofer, I just know a bit of Java. Perhaps this can help but I would need to have an electronic version of the Mayrhofer dictionary, which for the time being does not exist.
Following this reasoning I find it convenient to consider Sanskrit s and ṣ as the same category in the search machine.
Sure.
In any case, treating cerebrals with non cerebrals together makes the search easier.
Yes!
I would say that Sanskrit z/ś is a sound completely different from s, since the former reproduces proto-indoeuropean *ḱ, as is the case in śatám. I think it is better to treat z and s separately at the search machine, but anyway this is a matter of personal choices.
For the search engine I would have ś ṣ s all equal (as an option). It's not about etymology; what you are trying to do is smarter than needed.
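One way to get this behaviour without enumerating spellings is to fold both the query and every headword to a simplified key and compare the keys. The sketch below assumes SLP1 on both sides; the names `simple_key`, `build_index` and `lookup` are hypothetical, and merging ś with s is left as an option because the thread disagrees on it.

```python
from collections import defaultdict

# SLP1 letters folded for a "simple" search key.
FOLD = str.maketrans({
    "A": "a", "I": "i", "U": "u", "F": "f", "X": "x",  # long vowels -> short
    "w": "t", "W": "T", "q": "d", "Q": "D", "R": "n",  # cerebrals -> dentals
    "z": "s",                                          # cerebral sibilant -> s
})
FOLD_PALATAL = str.maketrans({"S": "s"})  # optional: palatal sibilant -> s


def simple_key(slp1, fold_palatal=False):
    """Fold an SLP1 string to its simplified search key."""
    key = slp1.translate(FOLD)
    if fold_palatal:
        key = key.translate(FOLD_PALATAL)
    return key


def build_index(headwords, fold_palatal=False):
    """Map each simplified key to the headwords that share it."""
    index = defaultdict(list)
    for word in headwords:
        index[simple_key(word, fold_palatal)].append(word)
    return index


def lookup(index, query_slp1, fold_palatal=False):
    """Return all headwords whose key matches the folded query."""
    return index.get(simple_key(query_slp1, fold_palatal), [])


index = build_index(["DizRya", "hvAnIya"])
print(lookup(index, "Disnya"))  # query typed without retroflexes
```

The index must be built with the same `fold_palatal` setting that is used at lookup time, otherwise keys will not line up.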
Some adjectives can also become nouns and vice-versa
So I would have 1 category for all adjectives and nouns in 1 bucket. The question is about verbs and non-verbs.
Perhaps this can help but I would need to have an electronic version of the Mayrhofer dictionary, which for the time being does not exist.
It exists, as part of https://www.universiteitleiden.nl/en/research/research-projects/humanities/indo-european-etymological-dictionary - Lubotsky told me in 2006.
Let's move the Greek letters a bit, so we can easily see who is above whom.
http://www.sanskrit-lexicon.uni-koeln.de/scans/PWScan/2014/web/webtc/indexcaller.php graha
Good suggestion.
I also think we should change the pwg.txt digitization section markup:
²b) {%das Ergriffene%} u. s. w.: ¹a) {%Beute%} ¯{¤MBH. 3, 11461.¤} {#Syeno grahAluYcane#} ¯{¤MR2K4K4H. 50, 15.¤} -- ¹b) {%haustus, das was mit dem
¹a) -> ¹α), ¹b) -> ¹β) etc.
I'm not sure about whether to leave the superscripts 1 and 2, and there's also a superscript 3 for number subheads (³1) ³2) etc.). An alternative might be to introduce some xml-type tags; this might be less obscure than the superscripts.
I would also like to change the digitization so that lines are not so long. The pwg digitization does not represent the printed text line breaks, as has been mentioned.
But some more rational system of lines within the pwg.txt file would make it easier to work with (some so-called lines in pwg.txt may be many thousands of characters in length).
The pwg digitization does not represent the printed text line breaks, as has been mentioned.
Oh, missed that.
But some more rational system of lines within the pwg.txt file would make it easier to work with
But there is no easy way to deal with it. Or just add mechanical breaks?
The best way would be to manually insert a marker where the text line breaks occur; but this is too time-consuming a task to undertake.
Thus, some programmatically feasible approach would be taken. Some desiderata of the end result might be:
This file is a display of the distribution of line lengths in the current pwg.txt.
27.9% of non-empty lines have length < 100 characters. 8.24% have lengths in range 90-99 characters
Is quite actionable.
have each subsection begin on a new line
Right and if we count how many characters per line, then we can add pseudo-line breaks.
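A mechanical version of this could look roughly like the following. It is a sketch, not the method actually used: the function name `rebreak` is hypothetical, the subsection pattern is guessed from the pwg.txt sample quoted earlier in this thread (superscript digit, label, closing parenthesis), and the width of 100 is an arbitrary choice.

```python
import re
import textwrap

# Assumed subsection marker: superscript digit + label + ")",
# e.g. "¹a)", "²b)", "³1)". The lookahead splits *before* each marker.
SUBSECTION = re.compile(r"(?=[¹²³]\w+\))")


def rebreak(line, width=100):
    """Start each subsection on a new line, then wrap long pieces."""
    pieces = [p.strip() for p in SUBSECTION.split(line) if p.strip()]
    out = []
    for piece in pieces:
        out.extend(textwrap.wrap(piece, width=width,
                                 break_long_words=False,
                                 break_on_hyphens=False))
    return out


sample = ("²b) {%das Ergriffene%} u. s. w.: ¹a) {%Beute%} "
          "¯{¤MBH. 3, 11461.¤} {#Syeno grahAluYcane#} "
          "¯{¤MR2K4K4H. 50, 15.¤} -- ¹b) {%haustus, das was mit dem")
for row in rebreak(sample):
    print(row)
```

Note that the "--" separator stays at the end of the preceding piece, and whitespace at the pseudo-breaks is dropped, so rejoining the output does not necessarily reproduce the input byte for byte.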
The display for PW has now been adjusted:
@gasyoun Is this what you had in mind?
PWG is similarly altered.
The only thing I see as suboptimal in the above is as with '--Beta) Planet ...'
where the long text doesn't also indent. I've only indented the start of the subsection (using
).
Probably there is some way to indent the whole subsection with css.
I'm not sure it's worthwhile to take the time to understand how to make this further adjustment.
Is this what you had in mind?
Thanks, yes.
Probably there is some way to indent the whole subsection with css.
Sure, let me check if I do not forget.
graha
vgl. (Greek) β. Die Planeten
Why is (Greek) still there, if nothing is missing? Remove it? Hope we can retain the markup, but have it shown in a different way, otherwise I start to see what to correct and there is nothing to be corrected.
The (Greek) is removed.
but have it shown in a different way,
The xml markup remains (in pwg.xml, as <lang n="Greek">β</lang>).
After the change to the display, there is no visual distinction in the html for Greek.
Do you need the Greek (and Arabic and Russian and OldHebrew) to be visually distinctive in the html displays?
Do you need the Greek (and Arabic and Russian and OldHebrew) to be visually distinctive in the html displays?
No. Enough that they are marked in the code. Too bad that only they are.
@funderburkjim, let's remove the (OldHebrew) בני אלים, (OldHebrew) בני אלהים and similar.
@funderburkjim issue still there, language names need to be removed in display.
the 'who is above who' suggestion.
To accomplish this, we'll need to enhance the PWG markup, similarly to the way the <div> markup was added to ap.xml, as discussed in #113.
(Greek) ... (OldHebrew)
Modified Basic and other displays to avoid showing the language name.
This is part of disp.php that was modified:
} else if ($el == "lang"){
    $n = $attribs['n'];
    if ($n == 'Russian') {
        // nothing to do
    }else if ($n == 'Arabic'){
        //$row .= "<span style='background-color:yellow'>"; // Temporary April 8, 2015
        $row .= "<span>";
    }else {
        //$row .= "<span class='lang'>($n) ";
        // 04-19-2017. Removed showing language name <<<<<<<<<<<<<<<<
        $row .= "<span class='lang'>";
    }
To accomplish this, we'll need to enhance the PWG markup, similarly to the way the <div> markup was added to ap.xml
Oh, that's not a small task, got it.
Modified Basic and other displays to avoid showing the language name.
Hurray!
A user recently made several suggestions regarding the MW displays.
These seem like interesting ideas, so that's why I'm mentioning them here.
The first one is easy to implement, so I did it.
However, the second and third would likely be quite tricky to implement, and cannot be addressed now.
1. suffix search in the (very old) display.
Can you please enlarge the capacities of the search tool of the Sanskrit dictionary http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html in order to have the possibility of looking for words ending in X, e.g. words ending in -van (jítvan, sútvan, jájvan), etc.?
I am always very enthusiastic with your brilliant Sanskrit-lexicon tool, which is a powerful means to look for words in this beautiful language.
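For reference, a suffix search of this kind can be sketched as follows. This is not the code behind the display; the names `build_suffix_index` and `ending_in` are hypothetical, and the headword list is a tiny stand-in that uses unaccented spellings of the words mentioned in this thread.

```python
import bisect


def build_suffix_index(headwords):
    """Sort (reversed headword, headword) pairs so a suffix becomes a prefix."""
    return sorted((w[::-1], w) for w in headwords)


def ending_in(index, suffix):
    """Return all headwords ending in `suffix`, in alphabetical order."""
    rev = suffix[::-1]
    # (rev,) sorts before every (rev + ..., word) pair, so this finds the
    # first entry whose reversed spelling could start with `rev`.
    start = bisect.bisect_left(index, (rev,))
    out = []
    for rkey, word in index[start:]:
        if not rkey.startswith(rev):
            break  # matching entries are contiguous in the sorted list
        out.append(word)
    return sorted(out)


index = build_suffix_index(["jitvan", "sutvan", "jajvan", "graha", "hvAnIya"])
print(ending_in(index, "van"))
```

For a small list, `[w for w in headwords if w.endswith(suffix)]` gives the same result; the reversed, sorted index only pays off when the headword list is large and queried repeatedly.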
http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/2014/web/webtc2/index.php
I regret however to say that I am not happy with the fact that it is capital-letter sensitive: dhisnya does not appear in the search but dhiSNya with the two retroflexes does. The same applies to long vowels. I feel very uncomfortable about this.
Perhaps it is also possible to separate the search between verbs, nouns, adjectives and particles.
Please note that this possibility exists in the Perseus Greek dictionary: http://www.perseus.tufts.edu/hopper/resolveform?redirect=true&display=Greek
4. Other comment
This book is still in copyright. However, it appears to be out of print, as only available through an antiquarian book seller in Germany.