Closed ghost closed 2 years ago
This makes a lot of sense, at least in a Wiktionary context. I'm not super convinced that adding yet another flavour to the existing three (mini/nopic/maxi) would do us any good. The question at this stage would therefore be: could we tweak mwoffliner so that it produces Wiktionary files that include the .ogg files in maxi?
Some Wiktionaries (e.g. French) might have >10 pronunciations per word. For example the word "maison" has 13 audio files.
As far as I know, only the French Wiktionary has so many audio files, because it has a separate Pronunciation section and adds audio in bulk using @lingua-libre [ www.lingualibre.org ]. Suggestions to reduce the size of .zim files with audio:
dafuq is wrong with these people. I'm francophone and can maybe pick three differences within these twelve. We need to parse and pick up one sound, otherwise we'll end up with tons of junk like the above.
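A minimal sketch of the "pick one sound" idea, assuming the scraper already has the list of audio file names found on an entry page. The function name `pick_audio` and its `limit` parameter are hypothetical and not part of mwoffliner, and keeping the first file in page order is only one possible selection rule.

```python
def pick_audio(files, limit=1):
    """Return at most `limit` audio files, keeping page order and dropping duplicates.

    Hypothetical helper: `files` is assumed to be the list of audio file names
    already extracted from one Wiktionary entry.
    """
    if limit <= 0:
        return []
    seen = set()
    kept = []
    for name in files:
        if name in seen:
            continue  # the same recording can be linked more than once
        seen.add(name)
        kept.append(name)
        if len(kept) == limit:
            break
    return kept
```

With `limit=1` an entry like "maison" would keep a single recording instead of all 13; a smarter rule (for example, preferring a given accent) would need metadata that this sketch does not model.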
Some automatic tools can be used to evaluate the pronunciation.
We can do so, mwoffliner can do it... but this is a bit subtle. We want the .ogg files but not the video files, for example. We should look in detail at how to do it.
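One way to picture the "audio yes, video no" filter is a simple check on the file extension. This is only a sketch: the extension list and the names `AUDIO_EXTENSIONS`, `is_audio`, and `keep_audio` are assumptions, not mwoffliner code. An .ogg container can in principle hold video too, so a real implementation would probably check the reported media type rather than the extension.

```python
from pathlib import PurePosixPath

# Assumption: these extensions are treated as audio. Extension alone is a
# heuristic; checking the file's media type would be more reliable.
AUDIO_EXTENSIONS = {".ogg", ".oga", ".opus", ".wav", ".mp3", ".flac"}


def is_audio(filename):
    """Return True if the file name has one of the assumed audio extensions."""
    return PurePosixPath(filename).suffix.lower() in AUDIO_EXTENSIONS


def keep_audio(filenames):
    """Filter a list of media file names down to the ones classified as audio."""
    return [name for name in filenames if is_audio(name)]
```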
@kelson42 Some Wiktionaries contain a huge number of pronunciations in .ogg format by native speakers. As an example, the German Wiktionary contains around 700,000 pronunciations.
The current .zim files of Wiktionaries usually contain only plain text and do not include the audio files.
Language learners in developing countries with no internet connection cannot access online websites to listen to pronunciations. Wiktionaries include IPA transcriptions, but these are not enough. The actual pronunciation of a native speaker is very helpful.
Would it be possible to include pronunciations from Wiktionaries in the .zim files offered by Kiwix?
EDIT: A method to reduce the file size of the .ogg audio files would be to convert them to .opus format. That conversion can reduce the size of the audio files by 60-70%.
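The conversion could be sketched roughly as follows, assuming `ffmpeg` built with the `libopus` encoder is installed. The 24 kbit/s bitrate and the function names are assumptions chosen for illustration; the actual saving depends on the bitrate of the source files, and the 60-70% figure above is not verified here.

```python
import subprocess
from pathlib import Path


def opus_command(src, bitrate="24k"):
    """Build an ffmpeg command that re-encodes `src` to Opus next to the original.

    Assumption: 24k is a plausible bitrate for short speech clips, not a tested value.
    """
    dst = Path(src).with_suffix(".opus")
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(src),    # input file
        "-vn",             # drop any video stream
        "-c:a", "libopus", # encode audio with libopus
        "-b:a", bitrate,   # target audio bitrate
        str(dst),
    ]


def convert_to_opus(src, bitrate="24k"):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(opus_command(src, bitrate), check=True)
```

Comparing the total size of a sample of files before and after conversion would show whether the saving is realistic for a given Wiktionary.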
Here is a screenshot of a German Wiktionary .zim file used in GoldenDict:
PS. 1) I use the German Wiktionary in .zim format with GoldenDict. I live in a remote village in South America. Thank you very much, really! The Kiwix Project has saved me because I almost never have an internet connection.
2) The German Wiktionary currently weighs 1.4 GiB. If pronunciations in .opus format are added, it would weigh around 5-6 GiB.
3) The Wiktionaries with the most audio files are English, French, and German. If the audio files are compressed, the tradeoff would be reasonable. At most 3-4 GiB would be added to the Wiktionaries in the main languages. Other languages such as Spanish and Italian have fewer pronunciations, and less than 1 GiB would be added to their .zim files.
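The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from this post (1.4 GiB of text, around 700,000 German pronunciations, a 5-6 GiB target) to work out the average size per .opus file that the estimate implies. The function names are hypothetical, and the result is a rough sanity check, not a measurement.

```python
KIB_PER_GIB = 1024 * 1024


def implied_avg_kib(total_gib, text_only_gib, n_files):
    """Average audio file size (KiB) implied by a projected total .zim size."""
    return (total_gib - text_only_gib) * KIB_PER_GIB / n_files


def projected_total_gib(text_only_gib, n_files, avg_kib):
    """Projected .zim size (GiB) for a given average audio file size (KiB)."""
    return text_only_gib + n_files * avg_kib / KIB_PER_GIB


if __name__ == "__main__":
    # Numbers taken from the post: 1.4 GiB text-only, ~700,000 pronunciations.
    for total in (5.0, 6.0):
        avg = implied_avg_kib(total, 1.4, 700_000)
        print(f"{total} GiB total implies about {avg:.1f} KiB per audio file")
```

Under these numbers the estimate works out to roughly 5-7 KiB per pronunciation, so its plausibility depends on how short the clips are and which bitrate is used.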