itkach / aard2-android

Aard2 for Android, a simple dictionary app
GNU General Public License v3.0

Add pronunciations #49

Open rominf opened 8 years ago

rominf commented 8 years ago

I propose to add support for pronunciations into Aard to make it easy to learn new words by hearing them.

I see four ways to achieve this:

itkach commented 8 years ago

I propose to add support for pronunciations into Aard to make it easy to learn new words by hearing them.

I see four ways to achieve this:

Include audio in the slob format. Implementation: the format needs to be changed.

It doesn't. Slob can contain data of any type.

Audio can be quite heavy. Pronunciations are not really connected to dicts themselves. Non-orthogonal solution.

It kind of is. Context of where audio appears is important, and actually correlating some abstract set of audio data for pronunciations back to "words" is rather non-trivial. Which pronunciation variant is it? Which language, which meaning, are there spelling variants?

Use a separate audio dictionary. Implementation: archive of audio files, multi-track audio format (like mogg), brand new format?

It's possible to package media (or any content type) into a separate .slob and treat it as if it were part of some other dictionary (well, almost) by specifying a uri tag in the .slob that matches the other dictionary's uri.
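A minimal model of what a shared uri tag could mean for lookups, assuming a viewer that searches every opened .slob carrying the same uri tag in turn. `SlobFile`, `lookup`, the file names and the uri value are all hypothetical; this is an illustration of the idea, not aard2's actual code, and it ignores the "(well, almost)" caveats.

```python
from dataclasses import dataclass, field


@dataclass
class SlobFile:
    """Hypothetical in-memory stand-in for an opened .slob file."""
    name: str
    tags: dict
    blobs: dict = field(default_factory=dict)  # key -> (content_type, bytes)


def lookup(slobs, uri, key):
    """Return (content_type, bytes) for `key` from the first slob whose
    'uri' tag equals `uri` and that contains the key, else None."""
    for s in slobs:
        if s.tags.get("uri") == uri and key in s.blobs:
            return s.blobs[key]
    return None


# Two files sharing one (hypothetical) uri act like a single dictionary.
text = SlobFile("enwiktionary.slob", {"uri": "https://en.wiktionary.org"},
                {"test": ("text/html", b"<p>test</p>")})
audio = SlobFile("enwiktionary-audio.slob", {"uri": "https://en.wiktionary.org"},
                 {"~/audio/test.ogg": ("audio/ogg", b"OggS...")})

print(lookup([text, audio], "https://en.wiktionary.org", "~/audio/test.ogg")[0])
```

The audio file can then be distributed and updated on its own, while articles in the text dictionary still reach its keys.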

Audio can be quite heavy.

Use online services. Implementation: use API or crawl sites.

"Crawl sites" means downloading audio resources and including them in the dictionary. All the "API" that's needed, though, is an HTTP link to an audio file.

Use an external program. Implementation: easy. UX: depends on the program, additional click.

Which program for example?

Audio is really not much different from images, just another resource that is identified by a URL, dictionary-local or otherwise. Perhaps a good start would be to compile a Wiktionary without filtering out audio-related markup and see if we can get audio to play using Wiktionary's online hosted audio-files.

rominf commented 8 years ago

It doesn't. Slob can contain data of any type.

Cool! Didn't know that. I had wrong assumptions about what the Slob format is. Now, after reading https://github.com/itkach/slob, I have a better understanding.

Which program for example?

https://play.google.com/store/apps/details?id=ru.o2genum.howtosay

It's possible to package media (or any content type) into a separate .slob and treat it as if it were part of some other dictionary (well, almost) by specifying a uri tag in the .slob that matches the other dictionary's uri.

OK, I like the idea of using a separate Slob with audio for dictionaries. I didn't know that .slob files can be connected via uris.

Perhaps a good start would be to compile a Wiktionary without filtering out audio-related markup and see if we can get audio to play using Wiktionary's online hosted audio-files.

That would be cool. Another thing that could be done with Wiktionary is to crawl it like this (pseudocode):

```python
# one audio-only .slob per Wiktionary language edition
audio_dicts = {}
for lang in wiktionary.langs:
    audio_dicts[lang] = Slob()
    for word in wiktionary[lang].articles:
        # the article for each word holds only its pronunciation audio
        audio_dicts[lang][str(word)] = generate_article(extract_audio_files(word))
for lang, audio_dict in audio_dicts.items():
    audio_dict.save('wiktionary_{}.slob'.format(lang))
```
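A runnable stand-in for the pseudocode, using only the standard library. It assumes the en.wiktionary-style `{{audio|<lang>|<file>|<label>}}` template and, following the suggestion of playing Wiktionary's online hosted audio, points each player at Wikimedia Commons' `Special:FilePath` URL instead of bundling the audio. The `(lang, word, wikitext)` input, the helper names and the sample page are hypothetical, and actually writing the .slob files is left out.

```python
import html
import re
from urllib.parse import quote

# Assumption: en.wiktionary-style {{audio|<lang>|<file>|<label>}} templates.
AUDIO = re.compile(r"\{\{audio\|[^|}]*\|([^|}]+)(?:\|([^|}]*))?[^}]*\}\}")
# Special:FilePath redirects to the file hosted on Wikimedia Commons.
FILE_PATH = "https://commons.wikimedia.org/wiki/Special:FilePath/"


def extract_audio_files(wikitext):
    """Return (file name, label) pairs for every audio template found."""
    return [(m.group(1).strip(), (m.group(2) or "").strip())
            for m in AUDIO.finditer(wikitext)]


def generate_article(audio_files):
    """Render one <audio> element per file, linking to the online copy."""
    parts = []
    for name, label in audio_files:
        src = FILE_PATH + quote(name.replace(" ", "_"))
        parts.append('<p>{}<audio controls src="{}"></audio></p>'.format(
            html.escape(label), html.escape(src)))
    return "\n".join(parts)


def build_audio_dicts(pages):
    """pages: iterable of (lang, word, wikitext).
    Returns {lang: {word: html}}, skipping words without audio."""
    audio_dicts = {}
    for lang, word, wikitext in pages:
        files = extract_audio_files(wikitext)
        if files:
            audio_dicts.setdefault(lang, {})[word] = generate_article(files)
    return audio_dicts


pages = [("en", "test", "===Pronunciation===\n* {{audio|en|En-us-test.ogg|Audio (US)}}"),
         ("en", "foo", "no audio here")]
for lang, articles in build_audio_dicts(pages).items():
    # File naming follows the pseudocode; serialization to .slob is omitted.
    print("wiktionary_{}.slob".format(lang), sorted(articles))
```

This only covers one template shape; other language editions mark up pronunciation differently, so each would need its own extractor.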

The biggest problem I see is that pronunciation sections are structured inconsistently across different language editions (see https://en.wiktionary.org/wiki/test and https://ru.wiktionary.org/wiki/%D0%BF%D1%80%D0%BE%D0%B2%D0%B5%D1%80%D0%BA%D0%B0 for example). Probably the easiest way around this is to wait until Wikimedia moves Wiktionary's words into Wikidata.

qnga commented 4 years ago

Is it possible to include audio in slob files so that it is accessible over HTTP but not under any key? I can see in xdxf2slob that CSS and JS resources are actually included with a key starting with "~/". Is this the only way? How does this work? I have difficulty understanding this from the code bases.

itkach commented 4 years ago

Is it possible to include audio into slob files making them accessible over HTTP but not by any key?

Not sure what you're asking.

slob is a simple key-value store: text for keys and arbitrary bytes for values, along with a content type specifying how to interpret the bytes. In aard2 (Android and web), content is served by a custom embedded web server that interprets request URLs and translates them into keys to look up. "~/" is just a convention; there's nothing special about it. You can type ~/css in lookup and see a list of all resources starting with that key across all dictionaries.
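A toy model of the three pieces described here: text keys mapped to (content type, bytes), a request path translated into a key, and prefix listing such as typing ~/css. The `Store` class, the `/slob/` mount point and the sample keys are hypothetical, and plain Python string ordering stands in for whatever key collation slob really uses.

```python
import bisect
from urllib.parse import unquote


class Store:
    """Toy slob-like store: text key -> (content_type, bytes)."""

    def __init__(self):
        self.items = {}
        self.sorted_keys = []  # kept sorted so prefix scans are cheap

    def add(self, key, content_type, data):
        if key not in self.items:
            bisect.insort(self.sorted_keys, key)
        self.items[key] = (content_type, data)

    def prefix(self, p):
        """All keys starting with p; they form one contiguous sorted run."""
        i = bisect.bisect_left(self.sorted_keys, p)
        out = []
        while i < len(self.sorted_keys) and self.sorted_keys[i].startswith(p):
            out.append(self.sorted_keys[i])
            i += 1
        return out


def handle(store, path, mount="/slob/"):
    """Hypothetical server rule: the percent-decoded path after `mount` is the key."""
    if not path.startswith(mount):
        return 404, "text/plain", b"not found"
    key = unquote(path[len(mount):])
    if key not in store.items:
        return 404, "text/plain", b"not found"
    content_type, data = store.items[key]
    return 200, content_type, data


store = Store()
store.add("test", "text/html; charset=utf-8", b"<p>test</p>")
store.add("~/css/default.css", "text/css", b"body{}")
store.add("~/css/night.css", "text/css", b"body{background:#000}")

print(store.prefix("~/css"))
print(handle(store, "/slob/%7E/css/night.css")[1])
```

Nothing in `handle` treats "~/" specially: a resource key is looked up exactly like an article key.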

qnga commented 4 years ago

Sorry, I was too elliptical. I would like to use locally stored HTML resources in a dictionary, i.e. images and audio files that are not meant to be accessed as separate articles. The most natural way I can think of is adding them to the slob without any key and linking to them by blob id rather than by key. Is this allowed by both the lib and Aard's web server? I don't think so. Otherwise, I can use keys prefixed by "~/" and rely on them for linking. Is this the preferred way?

itkach commented 4 years ago

without any key and linking to them by blob id rather than by key. Is this allowed by both the lib and Aard's web server? I don't think so.

Right, there has to be a key.

keys prefixed by "~/" and rely on them for linking. Is this the preferred way?

Yes.

It's quite similar to regular websites: images, CSS and JavaScript all have URLs and are downloaded the same way as the main HTML content; it's just that users typically don't type those URLs into the browser address bar or look at those resources directly. But they can.
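A sketch of the web-site analogy: an article links to its audio with a relative URL, the browser resolves it against the article's URL, and the server turns the resolved path back into a "~/" key. The mount URL, port and sample keys are hypothetical, not necessarily what aard2 uses.

```python
from urllib.parse import quote, unquote, urljoin, urlsplit

# Hypothetical mount point of the embedded server.
MOUNT = "http://localhost:8013/slob/"


def url_for_key(key):
    """URL at which an entry with this key would be served."""
    return MOUNT + quote(key)


def key_for_url(url):
    """Inverse mapping: the percent-decoded path below the mount point."""
    path = urlsplit(url).path
    prefix = urlsplit(MOUNT).path
    return unquote(path[len(prefix):]) if path.startswith(prefix) else None


# The article's HTML would contain e.g. <audio src="~/audio/test.ogg">.
article_key = "test"
audio_src = "~/audio/test.ogg"

resolved = urljoin(url_for_key(article_key), audio_src)
print(resolved)
print(key_for_url(resolved))
```

Relative resolution drops everything after the article URL's last slash, so this neat mapping assumes the article key itself contains no "/".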

qnga commented 4 years ago

Okay, I had understood keys as entry points rather than URLs. Still, a list of all contents is never shown in Aard, and a user is unlikely to search for words beginning with ~/.

Thank you for this clarification.