Open: trosel opened this issue 7 years ago
@kyegupov I asked around about this, but it may be worthwhile to host a dictionary file on GitHub in its own repo, both for posterity's sake and in case people want to work on updating it together.
For now, the dictionary is in human-readable format (YAML) in the repository: https://github.com/kyegupov/ido_web_dictionary/tree/master/backend/src/main/resources/dyer_by_letter
-- Konstantin Yegupov, replying to Cale's comment of 16 March 2017 at 23:04
I wouldn't say that it is "human readable". It seems like a mix of YAML and XML here https://github.com/kyegupov/ido_web_dictionary/blob/master/backend/src/main/resources/dyer_by_letter/i/ai.yaml
Actually, it's a mix of YAML and HTML. But it's pretty hard to choose a format that is easy for both humans and machines to read.
-- Konstantin Yegupov, replying to Cale's comment of 30 July 2017 at 23:03
If you choose one or the other, you can run it through pandoc to convert it to virtually anything.
Looking ahead, perhaps JSON would be the easiest format to work with in apps, and to search and display.
Thoughts?
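As a rough illustration of the JSON idea, the sketch below takes one entry whose definition carries HTML markup (as in the current YAML files) and emits a JSON record holding both the original markup and a plain-text version for searching. The sample entry, the field names (`headword`, `definition_html`, `definition_text`) and the helpers `strip_html` and `entry_to_json` are made up for the example; none of them come from the repository.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Drop tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def entry_to_json(headword: str, definition_html: str) -> str:
    """Hypothetical record layout: keep the markup, add a searchable text field."""
    record = {
        "headword": headword,
        "definition_html": definition_html,
        "definition_text": strip_html(definition_html),
    }
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


# Invented sample entry, not copied from the dictionary files.
print(entry_to_json("hundo", "<b>hundo</b>: <i>dog</i> &amp; hound"))
```

Keeping the original markup next to the stripped text means nothing is lost in the conversion, at the cost of storing each definition twice.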
JSON is strictly worse than YAML (and even TOML) for anything human-editable; the main reason is the lack of comments.
The motivation to use HTML here was that the sources are in HTML. If I had semantic markup, then XDXF or TEI would be preferable (although both are XML-based and that's a pain to deal with).
I have no "perfect" answer as of yet. The simple formats are too dumb, and the "extensible" ones are too complex.
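The point about comments is easy to check with Python's standard `json` module, which follows the JSON grammar and rejects comment syntax. A minimal sketch; the sample documents and the helper name `parses_as_json` are invented for the example:

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is accepted by the standard JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Invented sample entry.
plain = '{"headword": "hundo", "definition": "dog"}'

# The same kind of document with an editor's note added as a comment.
commented = '{\n  // TODO: check against the printed edition\n  "headword": "hundo"\n}'

print(parses_as_json(plain))      # True
print(parses_as_json(commented))  # False
```

In YAML or TOML the same note could be written after a `#` and would be ignored by the parser.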
-- Konstantin Yegupov, replying to Cale's comment of 31 July 2017 at 01:24
Do you have a plan for what you want to use this for in the future (other than your current website)?
Not yet.
-- Konstantin Yegupov, replying to Cale's comment of 31 July 2017 at 13:49
I've imported a recent data dump from the Ido Wiktionary into an SQLite database; however, Wikimedia's formatting makes it nearly impossible to write an adequate parser, so much of the data has been dropped or corrupted. I'm writing an API here: https://github.com/linguo-io/api and a basic front-end here: https://github.com/linguo-io/vortaro.
Both are still very unstable, but I think this is a good path for deciding what format to store the dictionary files in, so that they remain maintainable and extensible without too much duplicated effort. Let me know what you think.
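For reference, storing dictionary entries in SQLite can be as simple as the sketch below. The table layout (`entries` with `headword`, `lang`, `definition`), the helper names and the sample rows are assumptions made for illustration; they are not the schema used by the linguo-io API.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical entries table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entries (
            id         INTEGER PRIMARY KEY,
            headword   TEXT NOT NULL,
            lang       TEXT NOT NULL,  -- language of the definition
            definition TEXT NOT NULL
        )
        """
    )
    conn.execute("CREATE INDEX idx_entries_headword ON entries (headword)")
    # Invented sample rows, not taken from any dump.
    conn.executemany(
        "INSERT INTO entries (headword, lang, definition) VALUES (?, ?, ?)",
        [("hundo", "en", "dog"), ("kato", "en", "cat")],
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, prefix: str) -> list:
    """Prefix search on the headword, using a bound parameter."""
    rows = conn.execute(
        "SELECT headword, definition FROM entries "
        "WHERE headword LIKE ? ORDER BY headword",
        (prefix + "%",),
    )
    return rows.fetchall()


conn = build_demo_db()
print(lookup(conn, "hu"))
```

A flat table like this is easy to export back to YAML or JSON, so the database does not have to be the canonical source.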
There's a link in the code. It's Dyer's dictionary from 1924, which has been updated by Brian Drake: http://www.ido.li/dicionarii/IdoAngladicionarii/
I'm actually not sure where to find the most up-to-date official dictionaries; I will ask around. Wiktionary is pretty comprehensive, but I'm not sure how reliable it is.