sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

The 'early' digitizations #385

Open funderburkjim opened 2 years ago

funderburkjim commented 2 years ago

The many digitizations of Sanskrit dictionaries by @thomasincambodia can be divided into three periods: before the 'DFG-NEH Project 2010-2013', during that project, and after it.

There is a folder on the Cologne server which contains the original early digitizations as provided by Thomas. Currently this folder is located at update/orig.

For these dictionaries, the current digital form on the https://www.sanskrit-lexicon.uni-koeln.de/ website derives ultimately from these original forms, though the derivation is often circuitous.

14 of the dictionaries are represented here.

funderburkjim commented 2 years ago

Summary

Code      Size  Date         File
AP    10820608  28 Nov 2006  apte.all
BOR    3634722  21 Jul 2006  Barooah01-783-c
BOR    3637602  27 Nov 2006  boroo.all
BUR    3586043  17 Jan 2008  burnouf.all
BUR    3398143  16 Nov 2006  burnouf-c
CAE    3481328   7 Jan 2008  CAPPELLE.ALL
CCS    2304849   3 Jan 2008  ccs.all
GRA    4467060  27 Dec 2007  gras.all
MD     4483335  27 Dec 2006  mcd.all
MW    27730409   9 Jun 2010  MONIER.ALL  (internal file date 30.11.04)
PD    20766014  27 Nov 2006  pd1-6
PD     5935771  19 Jan 2007  pd1-6c.zip
PW    18105108  29 Jan 2007  pw.all
PWG   42227087   1 Dec 2006  pwg.all
SCH    3024153  30 Jan 2008  SCHMIDT.ALL
STC    3802283  25 Nov 2006  stchoupak-c
WIL    7054822  28 Dec 2006  Wilson.all
WIL    2998352  26 Nov 2005  wilson-a.txt
WIL    6947682  22 Jun 2006  Wilson-c

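
A listing like this (size, file-system date, file name) can be regenerated for a local copy of the folder with a few lines of Python. The following is a minimal sketch, not a tool from the project; the function name list_files is hypothetical. Two caveats: the %b month abbreviation follows the current locale, and a downloaded copy keeps the server's modification times only if the client preserves them (for curl, the -R / --remote-time option).

```python
import os
import sys
import time


def list_files(folder):
    """Return (size_in_bytes, date_string, filename) for each regular file.

    The date is the file-system modification time, formatted like '28 Nov 2006'
    (month abbreviation depends on the current locale).
    """
    rows = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        st = os.stat(path)
        date = time.strftime("%d %b %Y", time.localtime(st.st_mtime))
        rows.append((st.st_size, date, name))
    return rows


if __name__ == "__main__":
    # Hypothetical usage: python list_files.py <folder>  (defaults to ".")
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, date, name in list_files(folder):
        print(f"{size:>9}  {date}  {name}")
```

The dictionary codes (AP, BOR, ...) in the summary are not derivable from the file names alone, so they would still have to be added by hand.
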
Andhrabharati commented 2 years ago

> Currently this folder is located at update/orig.

@funderburkjim, is this also on GitHub? And would you please give the full path to this folder?

gasyoun commented 2 years ago

> 14 of the dictionaries are represented here.

Does the date mean "last edited"? So we do not know when the files were created?

funderburkjim commented 2 years ago

The URL can be inferred from the home URL, and the files can be retrieved with curl. I would prefer not to post the full URL here.

The dates are file-system dates. I am not sure of the 'creation' dates. Perhaps @thomasincambodia would know?

funderburkjim commented 2 years ago

Some (all?) of these files are in the cp1252 encoding. They can be converted to UTF-8 by a simple Python program, cp1252_to_utf8.py. For example:

python cp1252_to_utf8.py boroo.all boroo_all_utf8.txt
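
The program itself is not shown in this thread. As an illustration only, a converter with the same command-line shape (input file, output file) might look like the following minimal sketch; it is an assumption, not the actual script. It decodes strictly, so bytes that are undefined in Python's cp1252 codec raise an error rather than being silently altered, and it leaves line endings untouched.

```python
import sys


def cp1252_to_utf8(src, dst):
    """Re-encode text file src from cp1252 to UTF-8, writing the result to dst."""
    # Decode the raw bytes as cp1252. Python's cp1252 codec leaves five byte
    # values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); errors="strict" makes
    # the conversion stop with UnicodeDecodeError if any of them occur.
    with open(src, "rb") as f:
        text = f.read().decode("cp1252", errors="strict")
    # newline="" writes line endings (e.g. CRLF) exactly as they were read.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        cp1252_to_utf8(sys.argv[1], sys.argv[2])
    else:
        print("usage: python cp1252_to_utf8.py INPUT OUTPUT", file=sys.stderr)
```

If a file turns out not to be cp1252 after all (for example, already UTF-8), this conversion would double-encode its non-ASCII characters, so it is worth checking the encoding first.
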
Andhrabharati commented 2 years ago

I remember recently downloading the UTF-8 files from somewhere, after you mentioned that they existed.
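
Whether a given copy (for instance, one downloaded from elsewhere) is already UTF-8 or still in a legacy encoding such as cp1252 can be checked roughly from its bytes. A minimal sketch, with the hypothetical helper name classify_encoding: pure ASCII is valid in both encodings; cp1252 text containing accented letters almost never forms valid UTF-8 by accident, but a "not utf-8" result does not by itself prove the file is cp1252.

```python
def classify_encoding(path):
    """Roughly classify a file's bytes as 'ascii', 'utf-8', or 'not utf-8'."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("ascii")
        return "ascii"  # identical in cp1252 and UTF-8; no conversion needed
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"  # e.g. cp1252 text with non-ASCII characters
```

For example, classify_encoding("boroo.all") would report "not utf-8" for a cp1252 file that contains accented characters.
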

gasyoun commented 2 years ago

> Would prefer not to put full url here.

Email it?

funderburkjim commented 2 years ago

Have emailed.