suttacentral / legacy-suttacentral

Source code and related files (CSS, images, etc.) for SuttaCentral
http://suttacentral.net/

Improve dictionary results pages #180

Closed sujato closed 8 years ago

sujato commented 8 years ago

There are a few things we can possibly do to enhance the dictionary pages without too much work.

  1. Linkify terms. Significant terms in definitions are marked with <b>. Can we transform these into links to the appropriate terms? Even if we get some of them it would be useful. For some reason this happens on random words already, not sure what's up with this: https://suttacentral.net/define/pakkhaka
  2. Linkify references. We've discussed this before, but the vol/page references should be linked to the appropriate SC IDs. While we're at it, is there a regression here? The refs were nicely marked up, and this seems to have vanished. Actually, I think this is just the DPPN entries, which are still marked up. So obviously we can only linkify these, but at least that's something.
  3. Improve page layout. When the window is narrowed, there is no padding on the right. Also the padding at the bottom is insufficient. This should behave the same way as the text pages (with allowances for the second column, of course).
  4. There's a weird thing where you click on a link and it says the page doesn't exist, e.g. https://suttacentral.net/define/parihaarika#pts_ped (I think it's converting wrongly between ā and aa or something like that). Another case is https://suttacentral.net/define/ekaxmsa#pts_ped for ekaṁsa.
  5. Long ago I added <span class="smallcaps"> to the roman numerals. But this is a mistake; in fact it makes them less readable in this context. Best just strip all such tags from the dicts.
  6. Certain improvements to typography might be made with judicious regexes. But this is a deep, dark path …
  7. In some cases our text regresses from that at uchicago. See the entry for amsa, for example. It lists references as Vin iii.127; DhA iii.214, where we have Vin iii.127 DhA iii.214. I.e., we omit the semicolon.
  8. It doesn't always handle ṃ/ṁ correctly. E.g. https://suttacentral.net/define/a%E1%B9%81sa and https://suttacentral.net/define/a%E1%B9%83sa are separate entries.
  9. There exists a pdf version that gets the Greek Unicode right. Not sure if anything can be done here.

Pali-English_Dictionary,1921-25,v1.pdf

ghost commented 8 years ago

  3. Done
  4. Done
  5. Done
  6. …to dungeons deep and caverns old… Only for those brave of heart who are prepared to brave the wrath of the dragon if they get it wrong. (For the sake of peace, the dragon will remain unnamed).
  7. See 6.
  8. This is because they come from 2 different dictionaries. I think the easiest way to remedy this is to do a regex and make all the ṃ/ṁ consistent, i.e. only use ṃ.
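The ṃ/ṁ clean-up proposed for point 8 could look roughly like the sketch below. It uses a translation table rather than a regex, and it first applies NFC normalization so that a decomposed "m + combining dot" is handled too. The function name `normalize_niggahita` is made up for illustration and is not taken from the SuttaCentral code base.

```python
import unicodedata

# Map the dot-above forms (ṁ, Ṁ) to the dot-below forms (ṃ, Ṃ).
_DOT_ABOVE_TO_DOT_BELOW = str.maketrans({"\u1e41": "\u1e43", "\u1e40": "\u1e42"})


def normalize_niggahita(text: str) -> str:
    """Return text in which every ṁ/Ṁ has been replaced by ṃ/Ṃ."""
    # NFC first, so "m" followed by a combining dot becomes a single code point.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_DOT_ABOVE_TO_DOT_BELOW)
```

Running the same function over headwords and over incoming lookups would make both spellings resolve to one entry, assuming the dictionaries are otherwise spelled alike.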

1, 2 and 9 I will have a look at later. But with regards to the links it would be good to also make the references consistent throughout the various dictionaries. For instance, in the Dhammika, references are marked as (M.I,335), while in the PTS dictionary as M i.335
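As a rough illustration of what "consistent references" could mean, the sketch below parses the two styles quoted above, (M.I,335) and M i.335, into one common tuple. The patterns only cover those two examples and are an assumption; real entries have many more variants.

```python
import re

# Dhammika style, e.g. "M.I,335": book, dot, upper-case roman volume, comma, page.
DHAMMIKA_REF = re.compile(r"\b([A-Z][A-Za-z]*)\.([IVX]+),\s*(\d+)")
# PTS style, e.g. "M i.335": book, space, lower-case roman volume, dot, page.
PTS_REF = re.compile(r"\b([A-Z][A-Za-z]*) ([ivx]+)\.(\d+)")


def parse_ref(text):
    """Return (book, volume, page) for the first reference found, else None."""
    for pattern in (DHAMMIKA_REF, PTS_REF):
        match = pattern.search(text)
        if match:
            book, volume, page = match.groups()
            return (book, volume.lower(), int(page))
    return None
```

Once both styles reduce to the same tuple, whichever display form is chosen later can be generated from it.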

sujato commented 8 years ago

Thanks. Some comments:

3: Is better, but only works down to a certain screen width. See screenshot. It should be tested on a mobile.

screenshot from 2016-08-29 07-39-23

4: That seems fine. Of course, there's the deeper problem of why these aliases are being used at all, but anyway, at least we don't break the site so often. This page, too, could use some padding love; see screenshot also.

8: Please go ahead.

Regarding making refs consistent, yes, this should be done.

ghost commented 8 years ago

With regards to 3, I did not do the results page you have in the screenshot, but the definition pages themselves (e.g. https://suttacentral.net/define/ul%C5%ABka): I only changed the padding on the right though ... I did not look at the bottom padding ... will get back to that, or am I looking at the wrong page?

sujato commented 8 years ago

No, that's looking good, I am as confused as usual.

I'm just wondering though, should we not have a page template that does these things automatically?

ghost commented 8 years ago

We have a media-responsive settings file, but in theory all such things need to be set for different screen widths for all elements on all pages with a mobile-first approach. I guess this was not done for the search pages. I will get back and look at the results page too, and check things on mobiles and tablets, ipad, kindle and whatever else I can find floating around here.

sujato commented 8 years ago

I mean, since the search results and the "no definitions found" pages are simple one-column pages, should there not be a universal container that applies to all such pages? I apologize, I wrote the CSS for this, but it was so long ago I forget how it works. The responsive behavior of these pages should be identical with that of a regular sutta text page, so we should be simply applying one class to cover all these, and not having to worry about adjusting each one. (The dictionary results are a little different, as it is two columns.)

ghost commented 8 years ago

OK. I'm confused. For the sutta pages, the padding actually disappears with smaller screen sizes so it takes up the whole width of the page. It is actually the element "sutta" that changes, not the main page wrapper. I suppose there is a reason why the individual elements were used. So yes, instead of starting to set individual items, maybe we have to look more carefully at what the desired behavior is on various pages and whether this can all be changed to just the padding of the "main".

sujato commented 8 years ago

The padding shrinks on sutta pages, but doesn't disappear. I think it's like 5% or something.

sutta_vs_dictionary

sujato commented 8 years ago

On a not very relevant note, one of the design principles I keep coming back to: make everything vertical! Seriously, as soon as you try to stick things side by side, it makes responsive design a million times harder. Sure, you can do it, and it's not very hard. But you have to keep adjusting and tweaking, fixing things, checking them. When everything lines up nice and vertical, it all just works on any screen size, you don't have to think about it.

ghost commented 8 years ago

@media screen and (max-width: 480px) {
    .sutta {
        padding: 0 !important;
    }
}

Padding certainly disappears.

But what you have on those pages is the margin on the article left, which has become smaller, but is still there:

@media screen and (max-width: 480px) {
    #text article {
        margin: 2em 1em;
    }
}

The search pages don't have an article. But I can make something similar there.

ghost commented 8 years ago

Point 8 is done, so aṃsa is now grouped correctly.

Point 4: I have now simply changed the links to include the diacritical marks. My reasoning was that if there were an automatic way to change those links, there should be a js or py dictionary that tells you that aa = ā, etc. But I have searched all over and cannot find such a dictionary, so I judged it safe to change the links. It can always be turned back fairly easily.

Point 3: I changed the layout of the search-results page, the dict-definition page and the "not found" page. If this is OK now, I will test it on all the devices I can find here.
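For reference, the kind of "aa = ā" dictionary discussed under point 4 might look like the sketch below. Only the `aa` and `xm` pairs are suggested by the URLs in this thread (parihaarika, ekaxmsa); the `ii` and `uu` pairs are assumed by analogy, and the whole table is hypothetical rather than something found in the code base.

```python
# Hypothetical ASCII-to-diacritic table. "aa" and "xm" come from the URLs
# quoted in this thread; "ii" and "uu" are assumed by analogy.
ASCII_TO_DIACRITIC = {
    "aa": "ā",
    "ii": "ī",
    "uu": "ū",
    "xm": "ṁ",
}


def restore_diacritics(slug: str) -> str:
    """Replace each ASCII digraph in a link slug with its diacritic form."""
    for ascii_form, diacritic in ASCII_TO_DIACRITIC.items():
        slug = slug.replace(ascii_form, diacritic)
    return slug
```

A blind replacement like this would also rewrite any genuine double vowel, so it is only safe on slugs known to use the ASCII scheme.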

sujato commented 8 years ago

Great, thanks, this is much better.

Can I ask for one more detail? See above re "responsive design is hard when you have horizontal elements". At mobile sizes for the search results page, can we put ul.results li .type, .advanced-search-link, label, #page-main-search>button:last-of-type{display:none}? If you're going to be testing mobiles, you can check this out, see if it works okay.

ghost commented 8 years ago

I must admit that that looks cool! You've got a much better idea of aesthetics than me.

Tested on Kindle Silk browser and all works fine there, horizontally and vertically (but not small enough to test the disappearing items). Also tested on my old Galaxy Mini. Search function does not work but I got to the page by just typing in the URL. There it worked beautifully, both horizontally and vertically.

The only problem I noticed was in the dictionary pages: when the Galaxy is in the vertical position, the 2 columns start to overlap, similar to the attached screenshot. I guess this happens on all small screens. Maybe we have to add another breakpoint for that and make the first column disappear, or have it disappear for all mobiles?

screenshot from 2016-08-30 12-42-44

Will do some more tests on different devices when I get hold of them.

sujato commented 8 years ago

I thought this had been fixed, obviously not.

At still smaller sizes, the "Adjacent terms" in the left column shifts to below the entries.

sc_dict_screen_mobile

So this is the correct behavior, except that when this happens, the entry itself needs to be aligned left with the same margin/padding as the "Adjacent terms".

So I would suggest doing two things:

  1. make this happen sooner.
  2. adjust left margin

ghost commented 8 years ago

Interesting. I can't simulate that behavior in my browser, whether Chrome, Chromium or Firefox ... so it becomes a bit difficult to test out.

sujato commented 8 years ago

Umm, maybe zoom the screen size?

Anyway, we can try something like this, applied to the mobile screen size:

.related-terms {
    margin-right: 2em;
}
/* this makes the related terms float down, expand if needed */
.define .entries {
    float: left;
    margin-left: 1rem;
}
.define h1,
.define h2 {
    float: left;
    margin-left: 1rem;
}

.entries {
    width: 100%;
}

ghost commented 8 years ago

Of course I tried the zoom. But it needs a bit more work than that because .entries is already defined elsewhere. I'll have a look at it tomorrow. I'm off now. Tested also on iPad ... works fine. I think we can leave the tablets aside and focus on the smaller screens.

sujato commented 8 years ago

Actually I just checked firefox, and it doesn't float down there for me either.

Maybe it's best to simply whack the "related terms" with a "display:none" at mobile sizes, and adjust the entries accordingly.

ghost commented 8 years ago

Ok. Check now. I did it a bit differently, and more simply, than what you suggested. I did not touch the related terms at all because they are automatically pressed down once the entries expand to the whole page width. Also changed other settings a little to accommodate this, because it has to be the last setting to load or it is overwritten by others. I checked this on my Galaxy Mini and it works fine there as well as on all the browsers.

sujato commented 8 years ago

Looks great, thanks so much.

ghost commented 8 years ago

I had a test on an iPhone on iOS 9.3.4 and although the CSS works fine, the search function did not work (similar to my Galaxy Mini). I.e. click on the magnifying glass and nothing happens, while it should expand to a search box at the top.

ghost commented 8 years ago

The other points:

9 is donkey-work and this donkey is working on it. I have done "a" so far. See for instance https://suttacentral.net/define/a

6 and 7 I will not do. Somebody really has to go over it with a fine-tooth comb and I think regex can potentially create a big mess in this (but you are welcome to try).

2: I can probably do a regex to mark up the other dicts the same as DPPN. But that does not mean they are links. I think this is something to look at with the jsonification of the site and ties in with what we discussed here: https://discourse.suttacentral.net/t/display-strategies-for-new-data-tables/3005/41?u=vimala. This list could possibly be used to extract the correct reference but as Blake is working on this, it might be something to keep in mind.

1 is not as simple as it sounds. For instance, if you look at https://suttacentral.net/define/akkh%C4%81ti, there are several terms in bold (verb conjugations). Some of these exist as a separate entry in another dictionary, but most do not. So marking up the <b> is not a very good strategy.

What I will do is to at least make the references in all dictionaries the same layout and mark them up the same.

sujato commented 8 years ago

For number 9, this is great, thanks, if you want to do it.

For 6 and 7 I agree, this is a large and thankless task. I have done some of it as a trial, but didn't get very far.

For 2, marking up the references in the PTS dictionary is no easy matter. I'd recommend not trying it at this stage. When we're ready, we can linkify the refs in DPPN, that should be achievable.

As for 1, could we not extract a list of bolded terms, filter it against terms found as headwords in the dictionary, and linkify only headwords?
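The filtering idea for point 1 could be sketched as follows: find each <b> term, and wrap it in a link only when it is a known headword. The `/define/` URL form follows the links quoted in this thread; the function name and the `headwords` set are placeholders, not the site's actual data structures.

```python
import re
from urllib.parse import quote

# Matches a bold element with plain text content, e.g. <b>akkhāti</b>.
BOLD = re.compile(r"<b>([^<]+)</b>")


def linkify_bold_terms(html, headwords):
    """Wrap a <b> term in a /define/ link only if it is a known headword."""

    def replace(match):
        term = match.group(1).strip().lower()
        if term in headwords:
            return f'<a href="/define/{quote(term)}">{match.group(0)}</a>'
        return match.group(0)  # not a headword: leave the markup untouched

    return BOLD.sub(replace, html)
```

Because the whole content of the <b> element is compared with the headword set, partial words are never linked by this approach.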

ghost commented 8 years ago

For 2: It is fairly easy to filter the refs with regex and put a span around them as in DPPN. It might miss a few but will get most of them.

For 1: hmmm ... I suppose that could work. I will give it a try. After the donkey work of 9 is done.

sujato commented 8 years ago

2 isn't as easy as that. You constantly have cases like D ii 44, 56, which need to be parsed as D ii 44, D ii 56. These kinds of abbreviations happen very frequently and in many different ways. This is, of course, in addition to the many other kinds of inconsistencies and errors. There are tens of thousands of references. I worked on DPPN for several months, and, dealing with similar issues, the IB Horner Vinaya translation, also for several months. If I was going to do this in PTS Dict, I would allow, for this task alone, 2-3 months full time work.
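To make the abbreviation problem concrete, here is a minimal sketch that expands only the simplest case, a full reference followed by bare page numbers. The pattern is an assumption fitted to the D ii 44, 56 example; it would wrongly expand any unrelated number that follows a comma, which is part of why the task is hard.

```python
import re

# A full reference such as "D ii 44", followed by one or more bare page numbers.
ABBREVIATED = re.compile(r"\b([A-Z][A-Za-z]*) ([ivx]+) (\d+)((?:, \d+)+)")


def expand_refs(text):
    """Repeat the book and volume in front of every bare page number."""

    def replace(match):
        book, volume, first_page, rest = match.groups()
        pages = [first_page] + re.findall(r"\d+", rest)
        return ", ".join(f"{book} {volume} {page}" for page in pages)

    return ABBREVIATED.sub(replace, text)
```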

ghost commented 8 years ago

2-3 months ... maybe stay on your island a little bit longer :-)

sujato commented 8 years ago

don't tempt me …

ghost commented 8 years ago

And why not? Or we swap: I'll do it if I can stay on your island and you can build kutis in Belgium. :-)

I've done the Greek so you can cross out 9. And to think I did everything to get out of Latin and Greek in school ... but interesting to see the etymological similarities between the various languages.

With regards to 2, I've started doing this for the Dhammika. Not entirely finished yet. I've done most with regex now and still need to fine-tune by hand. Have a look at e.g. https://suttacentral.net/define/agaru

sujato commented 8 years ago

Wow, the Greek is awesome, thanks so much, I would have thought it would take much longer.

The Dhammika also looks good.

Are you working on the PTS right now? Because I think I can make a few regexes that will improve things, without going too deep. Actually, following my previous comment, I am working on the PTS dict now, so please hold off any changes to that. Dear God, this dictionary is such a steaming pile of shit it's an embarrassment to us all. A list of changes:

After claiming that it would take months to fix the refs, I then proceeded to prove myself spectacularly wrong by doing much of it in an afternoon. I've created 114,873 <ref> tags, with corrected vol/page style. Many remain; I used regexes to fix "subsequent abbreviations" such as "MN i.23, 56" to "MN i.23, MN i.56". But I struggled with references that inserted random things between one ref and the next, such as "MN i.23 (id.), 56 …". Perhaps you'll have more success; in any case, I would guess over 95% of references are done, the remainder is the long tail.

The secret to regex success, I found, was using a (fairly!) complete list of reference IDs. Here it is, in case it's of any use.

MN-a|DN-a|SN-a|AN-a|Dhp-a|Snp-a|Ja-a|Vv-a|Pv-a|Mil-a|Ud-a|Iti-a|Kp-a|Pts-a|Thag-a|Thig-a|Cp-a|Bv-a|Vin-a|Vb-a|Dhs-a|Pp-a|Nd-a|Vism-a|Ne-a|Bv-a|Cp-a|MN|DN|SN|AN|Dhp|Snp|Ja|Vv|Pv|Mil|Ud|Iti|Kp|Pts|Thag|Thig|Cp|Bv|Vin|Vb|Dhs|Pp|Nd|Sdhp|Dhtp|Lal|Divy|Mvu|Pgdp|Vism|Ne|Avs|Mpt|Bv|Dhtm|Dāvs|Abhp|Cp|Dpvs|Mhvs|Jtm
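A minimal sketch of how such an ID list can drive the tagging: the alternation is dropped into a pattern that matches "ID volume.page" and wraps the match in <ref>. Only a short subset of the IDs is used here, and the pattern is an assumption, not the regex actually used on the PTS file.

```python
import re

# A short subset of the ID list above, kept in the same order
# (commentary IDs such as "MN-a" before the plain "MN").
REF_IDS = "MN-a|DN-a|Vin-a|MN|DN|SN|AN|Vin"
REF = re.compile(rf"\b({REF_IDS}) ([ivx]+)\.(\d+)")


def tag_refs(text):
    """Wrap every 'ID volume.page' reference in <ref>...</ref>."""
    return REF.sub(r"<ref>\1 \2.\3</ref>", text)
```

Restricting matches to known IDs is what keeps ordinary capitalized words followed by roman numerals from being tagged by mistake.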

And here is the updated file. I won't do any more work on it; I allowed a day to fix the worst bits, and that's all. Before uploading a couple of xml-style tags need fixing: <term> should become <i class="term"> and <ref> to <span class="ref">.
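The tag conversion described above is a plain string substitution; a small sketch, with a made-up function name, might be:

```python
def to_html_tags(markup: str) -> str:
    """Convert the working <term>/<ref> tags to the HTML used on the site."""
    replacements = [
        ("<term>", '<i class="term">'),
        ("</term>", "</i>"),
        ("<ref>", '<span class="ref">'),
        ("</ref>", "</span>"),
    ]
    for old, new in replacements:
        markup = markup.replace(old, new)
    return markup
```

This assumes the tags carry no attributes, which is how they appear in this thread.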

pts.html.zip

ghost commented 8 years ago

Oops .. did not see your post because I only looked at my email and that did not have your alterations in it. I just did 1. i.e. linkified around 5000 of the bold items and uploaded that to the server.

Just had a look at your file and am a bit unsure what to do now. Some of the former bold terms that are now italic can be linkified. Do you still want this? The old version of the PTS dict with linkified bold terms is now active on the server. I will replace this with your version but please let me know about the linkification.

But anyway: Good job!!

Dhammika is done marking up refs. Basically did the same as you with a similar list (to be found at the bottom of the Dhammika) but then went over the whole thing by hand to check because the Dhammika is not so big.

ghost commented 8 years ago

Ok .. I thought "What the heck .. I'll add the links and upload it". But then I discovered that there are errors in the new pts file you sent and it refuses to load. I've spent hours on it so need a break. Will get back on it tomorrow.

sujato commented 8 years ago

Synergy! I should have waited, sorry about that.

Anyway, the words marked <term> should be almost identical with the previous <b>, except for a few corrections. So yes, please linkify where possible.

I have checked a number of entries in the current linkified dictionary, and they mostly work well. A couple of points I've noticed:

  1. Sometimes an arbitrary part of a word is linkified, it should be whole words only: https://suttacentral.net/define/tu%E1%B9%AD%E1%B9%ADha
  2. Sometimes words that should be linked (i.e. they are dictionary headwords) are not linked, eg. https://suttacentral.net/define/p%C5%ABti
  3. Quite a number of words marked <term> are hyphenated. I would suggest that in such cases, check first if by removing the hyphen we can match with a headword; then see if the elements divided by hyphen match.
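The hyphen rule in the last point can be sketched as a two-step lookup: try the term with hyphens removed, and only if that fails try the individual parts. Apart from pūti, which is linked above, the headwords in the usage note are invented for illustration.

```python
def match_headwords(term, headwords):
    """Return the headwords a (possibly hyphenated) term should link to."""
    joined = term.replace("-", "")
    if joined in headwords:
        # The whole compound is itself a headword: link it as one word.
        return [joined]
    # Otherwise fall back to whichever hyphen-separated parts are headwords.
    return [part for part in term.split("-") if part in headwords]
```

With headwords {"pūti", "kāya", "pūtimutta"}, the term "pūti-mutta" resolves to the compound, while "pūti-kāya" falls back to its two parts.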

I'll be interested to see what errors you find: I did run Tidy over it, so I'm not aware of any problems on my end.

ghost commented 8 years ago

The issues you saw with the linkified terms I also saw and they were corrected in the next version (but that did not want to load yesterday).

I found the fault in what you sent:

<meta name="source” content = “pts_ped">
<meta name="priority” content = “10">
<meta name="root_lang” content = “pi">

Spot the quotemarks ...
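A fault like this is easy to miss by eye, so a small check could be run before uploading. The sketch below lists every tag that contains a typographic quote; it is a simple text scan under the assumption that tags contain no ">" inside attribute values, not a real HTML validator.

```python
import re

# Typographic double and single quotes that should never appear inside a tag.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"
TAG = re.compile(r"<[^>]*>")


def tags_with_curly_quotes(html):
    """Return every tag that contains a curly quote character."""
    return [tag for tag in TAG.findall(html) if any(q in tag for q in CURLY_QUOTES)]
```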

So now it is loading. Just a few questions with regards to the consistency of the markup:

  1. Your version reads: ref{background:#ddd;padding:0.1em 0.3em; color:#555;} while the Dhammika and DPPN read: ref {color:white; font-size:0.6em;background:#333;margin:0 0.5em;font-weight:bold;padding:0 0.5em;vertical-align: super} So they are slightly different.
  2. Where references follow each other you retain the , or ; (this in itself is not consistent) between the various references and the Dhammika and (part of) DPPN do not have this - they only have a space.
  3. The Dhammika and DPPN have a markup like Vin.i.238, while you have Vin i.238 (without a . in between Vin and i)

So please let me know for each of these points what you prefer or if it is all the same to you as long as it is consistent.

sujato commented 8 years ago

  1. Ignore the CSS in the file, it is only for convenience. We can check how it works on the site itself. (In fact I have some ideas for improving the typography of dictionary entries, but it's only a few tweaks.)
  2. This is a tricky point. The various ; and . are in fact meaningful. They structure the lists of references. In a previous version I eliminated them, but later realized this was a mistake. Having said which, there are many errors. But still, I'm not sure of any other way of structuring the references in a comparable way, so I left them in. Let's leave them for now and see how it goes.
  3. The important thing is that all the dicts are the same. The only reason I used the dotted Vin.i.238 form was to reduce errors (spaces are tricky!). Whether it makes any difference, I don't know, but that was the reason. Typographically, the best form would be Vin i 238. But maybe make everything dotted for now, we can tweak later when we're happy with it?

ghost commented 8 years ago

Updated the last point. Would it be an idea to close this item now and create a new one with just the bits that are left to do?