sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

Hugo static website generator and github #337

Open drdhaval2785 opened 3 years ago

drdhaval2785 commented 3 years ago

I recently digitized a dictionary अभिधानरत्नमाला of हलायुध।

As part of the process, I generated XML, HTML, JSON, Markdown and StarDict outputs.

One important advantage of .md files is that they are amenable to static site generators.

Because the dictionaries are static content, I tried generating a dictionary with the static site generator Hugo. The output is available at https://sanskrit-kosha.github.io/

For a specific entry, one can use https://sanskrit-kosha.github.io/{dictcode}/{headword} e.g. https://sanskrit-kosha.github.io/armh/ह्लादिन

Kind request to check the frontend and let me know the feedback.

@funderburkjim , @gasyoun and everyone involved here, your feedback would be very useful.

Note on themes

The theme used is https://themes.gohugo.io/beyondnothing/ . It works fine for our purpose. We can tweak some templates further if required. We can use any of the themes available at themes.gohugo.io/ . Because the hugo version on ubuntu apt is 0.40.1, I could not get many of the fancier themes to work. But it should be possible to compile hugo from source, get hugo 0.80.0, and try some of the fancier themes.

gasyoun commented 3 years ago

Kind request to check the frontend and let me know the feedback.

It works and I like it. https://sanskrit-kosha.github.io/armh/%E0%A4%85%E0%A4%99%E0%A5%8D%E0%A4%95/

Can we try it for Cologne as well, @drdhaval2785 ?

funderburkjim commented 3 years ago

I like the idea of static sites, since they can be hosted anywhere -- even locally WITHOUT xampp, as well as at Github.

To 'mimic' the current Cologne displays, one obstacle involves transcoding.

That is, we want to allow input/output options for sanskrit words: slp1,iast,hk,itrans,devanagari

A crude approach would be to code N*N separate versions of each page (e.g. N=5 choices of input, and 5 choices of output).

But static sites can also use Javascript.

And it is possible for Javascript to do the transcoding -- one approach to this is actually implemented in the current List display (webtc1). And I know Dhaval has proposed a Javascript transcoder (forgotten where), that might also work (although it may not handle accents, not sure).

Thus, we could generate a single page with the html parts to transcode being marked with a tag () which the static page Javascript would transcode when generating the final form that the user sees.
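To make the transcoding idea concrete, here is a minimal sketch of one direction, SLP1 to IAST, as a plain character map. It is written in Python only to match make_md.py; the page-side version would be the same table in Javascript. The function name is hypothetical, the table covers unaccented text only, and it is not the webtc1 implementation.

```python
# Illustrative SLP1 -> IAST table (unaccented text only).
# SLP1 uses exactly one ASCII character per sound, so a per-character
# lookup is enough for this direction.
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text: str) -> str:
    """Transcode SLP1 to IAST; characters not in the table pass through."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


print(slp1_to_iast("kamalA"))
print(slp1_to_iast("Darma"))
```

With one such table per output scheme, a single stored page (say in SLP1) could serve every output choice, instead of N*N pre-rendered copies.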

funderburkjim commented 3 years ago

Re Hugo, markdown:

Hugo has a good rep among static site generators.

It seems to be easy to install on local machines.

Not yet able to tell whether it is a good choice for Cologne sites.

Would like to see discussion of how https://github.com/sanskrit-kosha/sanskrit-kosha.github.io was generated.

gasyoun commented 3 years ago

And I know Dhaval has proposed a Javascript transcoder (forgotten where), that might also work (although it may not handle accents, not sure).

http://aksharamukha.appspot.com/converter is lovely. It works on https://sanskritdocuments.org/sanskrit/purana/

A crude approach would be to code N*N separate versions of each page (e.g. N=5 choices of input, and 5 choices of output).

I would want Dhaval to start with the crude approach, because it has its own benefits.

drdhaval2785 commented 3 years ago

Would like to see discussion of how https://github.com/sanskrit-kosha/sanskrit-kosha.github.io was generated.

https://gohugo.io/hosting-and-deployment/hosting-on-github/#step-by-step-instructions

Followed these step by step instructions.

drdhaval2785 commented 3 years ago

I did not want this experiment to clutter sanskrit-lexicon.github.io for now. So I started testing on my sanskrit-kosha.github.io platform.

  1. Generated MD files from XML for SNP. It seemed to be the smallest dictionary to experiment with.
  2. After generating the MD files, placed them inside the content folder https://github.com/sanskrit-kosha/kosha-hugo2/tree/main/content
  3. Ran ./deploy.sh "commit message". This script generates the hugo pages and pushes the public repository (where the autogenerated HTML is stored) to https://github.com/sanskrit-kosha/sanskrit-kosha.github.io .
  4. After 1-2 minutes, the pages are live on https://sanskrit-kosha.github.io/ .
  5. Please check https://sanskrit-kosha.github.io/snp/हरीतकी/ for example.
  6. I selected Devanagari, because the site generator makes all the capital letters small. That would be a problem in our case. In SLP1, 'kamala' and 'kamalA' need to render as different pages, but after lowercasing they would collide on the same path. No such problem in Devanagari / IAST. IAST is difficult to type on a keyboard. Devanagari seems relatively better.
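
The lowercasing problem in step 6 can be checked ahead of time. The sketch below (function name hypothetical) lists SLP1 headwords that would land on the same path once lowercased. Hugo documents a `disablePathToLower` site setting that may avoid the lowercasing itself; whether that is enough here, for instance on case-insensitive filesystems, has not been tested in this thread.

```python
from collections import defaultdict


def find_case_collisions(headwords):
    """Group headwords that become identical once lowercased.

    Returns only the groups with more than one member, i.e. the
    headwords that would collide on a lowercased URL path.
    """
    groups = defaultdict(list)
    for hw in headwords:
        groups[hw.lower()].append(hw)
    return {path: hws for path, hws in groups.items() if len(hws) > 1}


# 'kamala' and 'kamalA' are distinct in SLP1 but share a lowercased path.
print(find_case_collisions(["kamala", "kamalA", "rAma", "deva"]))
```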

gasyoun commented 3 years ago

site generator makes all the capital letters small. That would be a problem in our case.

It sure would.

IAST is difficult to type on a keyboard. Devanagari seems relatively better.

Disagree. See https://www.yesvedanta.com/keyswap/

drdhaval2785 commented 3 years ago

A quick-and-dirty script to generate markdown files from a Cologne xml file is available at https://github.com/sanskrit-lexicon/COLOGNE/blob/master/makemd/make_md.py
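
For readers who do not want to open the script, the general shape of such a conversion can be sketched as follows. This is not the code in make_md.py. It assumes one record per line with the headword in `<key1>` and the entry text in `<body>`, and the front matter is the bare minimum Hugo needs; all function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed record layout: <H1><h><key1>..</key1>..</h><body>..</body>..</H1>
KEY1 = re.compile(r"<key1>(.*?)</key1>")
BODY = re.compile(r"<body>(.*?)</body>")
TAG = re.compile(r"<[^>]+>")


def parse_entry(xml_line):
    """Return (headword, plain_text) for one record line, or None."""
    key, body = KEY1.search(xml_line), BODY.search(xml_line)
    if not (key and body):
        return None
    # Strip inline markup; a real converter would map tags to markdown.
    return key.group(1), TAG.sub("", body.group(1)).strip()


def render_markdown(headword, texts):
    """Build one markdown page with minimal front matter."""
    front_matter = f'---\ntitle: "{headword}"\n---\n\n'
    return front_matter + "\n\n".join(texts) + "\n"


def write_markdown(xml_path, out_dir):
    """Write one .md file per headword; homonyms share a page."""
    entries = defaultdict(list)
    with open(xml_path, encoding="utf-8") as f:
        for line in f:
            parsed = parse_entry(line)
            if parsed:
                entries[parsed[0]].append(parsed[1])
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for headword, texts in entries.items():
        page = render_markdown(headword, texts)
        (out / f"{headword}.md").write_text(page, encoding="utf-8")
    return len(entries)
```

The output directory would then be copied into the Hugo content folder, one subdirectory per dictionary. File names here use the headword exactly as stored, so the capital-letter issue discussed earlier applies unless the key is first converted to Devanagari.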

funderburkjim commented 3 years ago

minor link problem

https://github.com/sanskrit-kosha/kosha-hugo2/tree/main/content

In 'snp' subdirectory, e.g. in अक्ष.md, the Cologne link does not work. I think all that is needed is to add &input=deva in the getword link.

You probably also want to put '&output=deva' in the link.

Another aesthetic suggestion is to put the dictionary title somewhere on each page.

ARMH2 looks quite nice in md:

https://github.com/sanskrit-kosha/kosha-hugo2/blob/main/content/ARMH2/अंशु.md

slightly different in html:

https://sanskrit-kosha.github.io/armh2/अंशु/

Incidentally, what are the TWO 'headwords' आदित्य and रोचिस् under अंशु:? Are they the names of two wordlists in which अंशु: occurs? Please provide a short introduction to this kind of 'thesaurus'.

gasyoun commented 3 years ago

Another aesthetic suggestion is to put the dictionary title somewhere on each page.

Agree.

https://github.com/sanskrit-kosha/kosha-hugo2/blob/main/content/snp/%E0%A4%B6%E0%A4%BE%E0%A4%B2%E0%A4%AA%E0%A4%B0%E0%A5%8D%E0%A4%A3%E0%A5%80.md

The cologne link: शालपर्णी to https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/getword.php?dict=snp&key=%E0%A4%B6%E0%A4%BE%E0%A4%B2%E0%A4%AA%E0%A4%B0%E0%A5%8D%E0%A4%A3%E0%A5%80 does not work. It reports: not found: 'शालपर्णी' (slp1 = शालपर्णी)

drdhaval2785 commented 3 years ago

minor link problem

Corrected. @gasyoun's suggestion at https://github.com/sanskrit-lexicon/COLOGNE/issues/337#issuecomment-764921565 was about the same problem. It is functioning now.

drdhaval2785 commented 3 years ago

I gave hugo a try. It almost hangs for a dictionary of the size of WIL i.e. around 50000+ entries. So abandoning hugo experiment for Cologne dictionaries. Sanskrit-Sanskrit koshas are relatively small, so I will continue using hugo for my own projects.

gasyoun commented 3 years ago

So abandoning hugo experiment for Cologne dictionaries.

Maybe I should redo on my PC?

It almost hangs for a dictionary of the size of WIL i.e. around 50000+ entries.

Hmm, it did look fun.

drdhaval2785 commented 3 years ago

Problem

  1. Hugo regenerates all static pages. For dictionaries like MW with 200k+ headword entries, this would take an eternity.
  2. Git has a difficult time computing status and adding files in such bulk. When I tried to add the WIL files, git ran for quite some time without any output.

Proposed solution

  1. Find a way to run Hugo on a specific set of files.
  2. For the initial setup of a new dictionary, we can generate static HTML pages in batches of 1000 or so. Hugo generates 1000 pages in around 1 second.
  3. Git should not have much trouble adding, committing and pushing such lots of 1000 pages.
  4. For maintenance, find a way for hugo to regenerate HTML only for the entries that were modified. If one entry in a specific dictionary is updated, we only need to regenerate HTML for that entry. The changed entries should be trackable by git. In that case, hugo would have to handle only 5-10 changes per day.
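
The bookkeeping side of this proposal can be sketched without touching Hugo at all: splitting files into lots of 1000, and asking git which markdown files changed between two commits. The function names and the `content` directory default are assumptions for illustration. How Hugo could be made to rebuild only those pages remains the open question.

```python
import subprocess
from itertools import islice


def batches(items, size=1000):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_changed(raw, suffix=".md"):
    """Split NUL-separated `git diff -z` output, keeping markdown paths."""
    return [path for path in raw.split("\0") if path.endswith(suffix)]


def changed_markdown(repo_dir, old="HEAD~1", new="HEAD", subdir="content"):
    """List markdown files under `subdir` changed between two commits.

    -z keeps non-ASCII (e.g. Devanagari) file names unquoted.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", "-z", old, new, "--", subdir],
        check=True, capture_output=True, encoding="utf-8",
    )
    return parse_changed(result.stdout)


# Example: 2500 files fall into lots of 1000, 1000 and 500.
print([len(lot) for lot in batches(range(2500))])
```

Each lot could then be added, committed and pushed separately, which keeps every git operation small.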

Once we figure out such an arrangement, hugo should be fun for Cologne. Need to take some hugo tutorials and read CLI usage.

Reopening.

funderburkjim commented 3 years ago

https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/getword.php?dict=snp&key=%E0%A4%B6%E0%A4%BE%E0%A4%B2%E0%A4%AA%E0%A4%B0%E0%A5%8D%E0%A4%A3%E0%A5%80&input=deva

This is found. getword needs to be told that the spelling is Devanagari (input=deva)
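
A generator script can build such links instead of concatenating strings by hand, so that `input` (and, as suggested earlier, `output`) is never forgotten. The sketch below uses only the parameter names visible in the URLs in this thread; the function name and the defaults are hypothetical.

```python
from urllib.parse import urlencode

GETWORD = "https://sanskrit-lexicon.uni-koeln.de/scans/csl-apidev/getword.php"


def getword_url(dictcode, key, input_scheme="deva", output_scheme="deva"):
    """Build a getword link that states how `key` is spelled."""
    params = {
        "dict": dictcode,
        "key": key,               # percent-encoded by urlencode
        "input": input_scheme,    # spelling of the key, e.g. deva or slp1
        "output": output_scheme,  # script used for the displayed result
    }
    return GETWORD + "?" + urlencode(params)


print(getword_url("snp", "शालपर्णी"))
```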

gasyoun commented 3 years ago

Once we figure out such an arrangement, hugo should be fun for Cologne. Need to take some hugo tutorials and read CLI usage.

Agree.