morrisfranken / glyphreader

A deep learning approach to classifying ancient Egyptian hieroglyphs
MIT License
57 stars, 15 forks

Some Readings of the Unknown Signs #1

Open · D-K-E opened this issue 7 years ago

D-K-E commented 7 years ago

Hey, first of all, very exciting project!
I have also read your thesis, not the ACM paper but the one you wrote for the university, on the recognition of hieroglyphs, I believe.
I am an Egyptologist by training, and I thought I might help with some identifications and, if possible, with some development. I am also working on a feature extractor for Manuel de Codage, so that text-learning algorithms can be applied to ancient Egyptian texts in the future. I think your project complements mine really well. I will keep you posted if you are interested.

Some Identifications:
I don't have time to look at all of them, but here are some identifications I could make right away. 090071 is the heka sign: Wb 3, 170.5-21, meaning the Wörterbuch der Ägyptischen Sprache of Adolf Erman, volume 3, page 170, lines 5-21. The other references below follow the same format.

090101 is aHa, ship: Wb 1, 222.4-8
My first guess for 090103 would be: Hm.t: Wb 3, 76.16-77.19

090123 is either Y1 or Htp: Wb 3, 183.4-7
090128 looks like ab: Wb 1, 173.12-174.1
090166 is a simple H
090168 is probably Ssr: Wb 4, 547
090169 is probably part of tm: Wb 1, 144.5
090194 consists of two signs: n above and i below
090233 is Xn: Wb 3, 384.2-3
090239 consists of two signs, a royal cartouche and wn:
090299 is a determinative as seen in: Wb 1, 225.15-226.5
090308 is part of m, the bird:
090311 is probably part of Hr, the head:
090384 is n: Wb 2, 193.3-194.7
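Since all of these references share the same volume, page.line pattern, a small parser can turn them into structured data, for example to attach dictionary references to the sign images. The sketch below is a hypothetical helper (`parse_wb` and `WbRef` are illustrative names, not part of the repository) that assumes the shape `Wb <volume>, <page>[.<line>][-[<page>.]<line>]` seen in the list above; page-only ranges such as `547-549` are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout: "Wb <volume>, <page>[.<line>][-[<end page>.]<end line>]",
# e.g. "Wb 3, 170.5-21", "Wb 1, 173.12-174.1", "Wb 4, 547".
WB_PATTERN = re.compile(
    r"^Wb\s+(?P<volume>\d+),\s*"
    r"(?P<page>\d+)(?:\.(?P<line>\d+))?"
    r"(?:-(?:(?P<end_page>\d+)\.)?(?P<end_line>\d+))?$"
)


@dataclass(frozen=True)
class WbRef:
    volume: int
    start_page: int
    start_line: Optional[int]  # None when only a page is cited
    end_page: int
    end_line: Optional[int]


def parse_wb(ref: str) -> WbRef:
    """Parse a Wörterbuch reference string (hypothetical helper)."""
    m = WB_PATTERN.match(ref.strip())
    if m is None:
        raise ValueError(f"not a Wb reference: {ref!r}")
    page = int(m["page"])
    line = int(m["line"]) if m["line"] else None
    # A range without an explicit end page stays on the start page.
    end_page = int(m["end_page"]) if m["end_page"] else page
    # Without a range, the reference ends where it starts.
    end_line = int(m["end_line"]) if m["end_line"] else line
    return WbRef(int(m["volume"]), page, line, end_page, end_line)


print(parse_wb("Wb 3, 170.5-21"))
# WbRef(volume=3, start_page=170, start_line=5, end_page=170, end_line=21)
```

For example, `parse_wb("Wb 1, 173.12-174.1")` yields a reference that starts on page 173, line 12 and ends on page 174, line 1.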

morrisfranken commented 7 years ago

Hi D-K-E,

Thanks for your comments! It seems we have similar interests. What are you using to extract features from the hieroglyphs?

Regarding the incorrect labels: these belong to the automatically detected hieroglyphs, and their labels are computed from the greatest overlap with the actual labels in the Manual folder. The images in the Automated folder are generated by a text-detection method that segments the hieroglyphs in an image, and there is still plenty of room for improving this automatic hieroglyph detection. I would like to add this part to the repository in the future, and I'm in contact with the author about making the text-detection algorithm open source as well.
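The overlap-based labelling described above can be sketched roughly as follows. This is not the repository's code: it assumes hypothetical axis-aligned boxes `(x0, y0, x1, y1)` for both detected and manually annotated signs, uses Gardiner codes as labels purely for illustration, and measures "most overlap" as raw intersection area (intersection-over-union would be a reasonable alternative). It also shows how a fragment such as 090308 can inherit a label: a detection only needs to cover more of one manual box than of any other.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two axis-aligned boxes (0 if they do not touch)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def assign_label(detected: Box, manual: Sequence[Tuple[Box, str]]) -> Optional[str]:
    """Return the label of the manual box that overlaps `detected` the most,
    or None if no manual box overlaps it at all."""
    best_label, best_area = None, 0.0
    for box, label in manual:
        area = intersection_area(detected, box)
        if area > best_area:
            best_label, best_area = label, area
    return best_label


manual = [((0, 0, 10, 10), "G17"), ((10, 0, 20, 10), "N35")]
# A detection straddling both signs takes the label it overlaps most.
print(assign_label((6, 0, 16, 10), manual))  # N35
```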

But in the meantime, if you are interested in creating an improved hieroglyph detection system, I might be able to help you with that.

D-K-E commented 7 years ago

Hey,
I started writing something for JSesh-style input: the user gives the program a text encoded for JSesh, and the program outputs a dictionary of features describing the text. The goal is to exploit those features with ML algorithms afterwards. Although I say JSesh-encoded input, it doesn't have to involve Java: as you might know, JSesh works with a special flavour of Manuel de Codage, and an MdC file is just a bunch of special characters plus Gardiner sign codes.
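A minimal sketch of such a feature extractor, assuming only a small subset of MdC: `-` separates quadrats, `:` stacks signs inside a quadrat, `*` places them side by side, and Gardiner codes look like `A1`, `G17` or `Aa1`. Everything else (cartouches, line breaks, shading, modifiers) is ignored, and `mdc_features` and its feature names are illustrative, not an existing API.

```python
import re
from collections import Counter
from typing import Dict, List

# Loose check for Gardiner codes: category (A-Z or Aa), number, optional variant
# letter. A simplification of the full MdC sign list.
GARDINER = re.compile(r"^(?:Aa|[A-Z])\d+[A-Za-z]?$")


def mdc_features(text: str) -> Dict[str, object]:
    """Count simple features of an MdC string (hypothetical sketch)."""
    # '-' separates quadrats; surrounding whitespace is ignored.
    quadrats = [q.strip() for q in text.split("-") if q.strip()]
    signs: List[str] = []
    for quadrat in quadrats:
        # ':' stacks signs vertically, '*' places them side by side.
        signs.extend(s.strip() for s in re.split(r"[:*]", quadrat) if s.strip())
    gardiner = [s for s in signs if GARDINER.match(s)]
    # Category prefix of each Gardiner code, e.g. "G" for G17, "Aa" for Aa1.
    categories = Counter(re.match(r"Aa|[A-Z]", s).group() for s in gardiner)
    return {
        "n_quadrats": len(quadrats),
        "n_signs": len(signs),
        "n_gardiner": len(gardiner),
        "n_other": len(signs) - len(gardiner),  # e.g. phonetic codes like Htp
        "category_counts": dict(categories),
        "sign_counts": dict(Counter(signs)),
    }


print(mdc_features("G17-N35:N35-Htp"))
```

For `G17-N35:N35-Htp` this reports 3 quadrats and 4 signs (3 Gardiner codes plus the phonetic code `Htp`), with category counts `{"G": 1, "N": 2}`.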
I am mainly interested in transforming an image into a JSesh file, which obviously involves hieroglyph detection. If hieroglyph detection and textual feature extraction can be automated, we could even attempt machine translation of texts from there. Quick question: by text detection, what exactly do you mean? What type of texts, and what do we detect in them? Does it involve hieratic/demotic as well, or is it just transliteration?
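Going from detected and classified signs to an MdC line could look roughly like the sketch below. It makes strong simplifying assumptions: a single horizontal line read left to right, each sign already classified to a code, and hypothetical boxes `(x0, y0, x1, y1)` with positive width. Signs whose horizontal extents overlap by more than half the narrower width are treated as stacked in one quadrat (joined with `:`, top first), and quadrats are joined with `-`. Right-to-left lines, columns and complex groups would need considerably more than this.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1), x grows rightwards, y downwards.
Box = Tuple[float, float, float, float]


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of the narrower box's width shared by both boxes."""
    shared = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])  # assumes positive widths
    return max(shared, 0.0) / narrower


def to_mdc(signs: Sequence[Tuple[Box, str]], stack_threshold: float = 0.5) -> str:
    """Assemble a left-to-right MdC line from (box, code) pairs (sketch)."""
    ordered = sorted(signs, key=lambda s: s[0][0])  # sort by left edge
    quadrats: List[List[Tuple[Box, str]]] = []
    for box, code in ordered:
        # Stack onto the current quadrat if it sits in the same horizontal slot.
        if quadrats and horizontal_overlap(quadrats[-1][-1][0], box) > stack_threshold:
            quadrats[-1].append((box, code))
        else:
            quadrats.append([(box, code)])
    # Inside a quadrat, ':' puts signs one above the other, top first.
    return "-".join(
        ":".join(code for _, code in sorted(q, key=lambda s: s[0][1]))
        for q in quadrats
    )


signs = [((0, 0, 10, 10), "G17"), ((12, 0, 22, 4), "N35"),
         ((12, 6, 22, 10), "N35"), ((24, 0, 34, 10), "R4")]
print(to_mdc(signs))  # G17-N35:N35-R4
```

The overlap threshold is a tunable guess; a learned grouping step would likely be more robust on real inscriptions.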