basil closed 1 year ago
I think something like this would need to be streamlined. For example, you feed a program a PDF; it converts the pages to images and uses OCR to recognize the neumes and the text. It would then remove the "extra" melismatic vowels, detect the half-words that are sometimes repeated in papadic and sticheraric hymns, and use something like Morphologia Graeca to determine which syllables are stressed.
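To make the "remove the extra melismatic vowels" step a bit more concrete, here is a rough Python sketch. It assumes the OCR stage has already produced a list of text fragments, one per neume group, and it treats a fragment as a melismatic continuation when it is vowel-only and repeats the end of the last kept syllable. The function name, the vowel set, and the input format are all my own assumptions, not anything an existing tool provides. It will also wrongly merge real words with adjacent identical vowels, and it does not handle repeated half-words at all.

```python
import unicodedata

# Assumed vowel inventory (Latin transliteration plus Greek); adjust as needed.
VOWELS = set("aeiouy" + "αεηιουω")


def _base(text):
    """Lowercase and strip diacritics so accented vowels compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_melismatic_vowels(fragments):
    """Drop vowel-only fragments that merely prolong the previous syllable.

    `fragments` is a hypothetical OCR output: the lyric text split up as it
    appears under the neumes, e.g. ["Κυ", "υ", "υ", "ρι", "ι", "ε", "ε"].
    """
    kept = []
    for frag in fragments:
        base = _base(frag)
        vowel_only = bool(base) and all(ch in VOWELS for ch in base)
        if kept and vowel_only and _base(kept[-1]).endswith(base):
            continue  # melismatic continuation of the previous syllable
        kept.append(frag)
    return kept
```

For example, `strip_melismatic_vowels(["Κυ", "υ", "υ", "ρι", "ι", "ε", "ε"])` gives `["Κυ", "ρι", "ε"]`, which could then be handed to whatever determines the stressed syllables.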
You might check Psaltologion, especially the Greek section, because I think something like this is already being undertaken there (not the AI part, but the catalog of neumes and syllables), although I'm not sure whether it's publicly available.
For OCR, there is Gamera, but I've never used it. I think the author of Melodos made his own OCR algorithms.
Let's keep AI out of this...
I'm trying out GitHub Discussions. I'm going to move this Issue over to the Discussions section and keep the Issues section full of concrete, actionable work, with the Discussions section serving more as a place for developing ideas that can eventually become actionable issues.
The most obvious way to integrate AI with this project would be to train a model on the classical repertoire and use it to autocomplete formulæ based on the first few neumes during score entry. The autocompletion could take into account information like the current mode/scale, but it would not be useful for composition because it has no awareness of the underlying text.
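As a rough illustration of what autocompletion during score entry might look like, here is a small Python sketch. To be clear, it is not a trained model: it is a plain frequency-ranked prefix lookup standing in for one, keyed by mode. The class name, the mode labels, and the idea of storing formulæ as lists of neume names are all assumptions for the sake of the example.

```python
from collections import Counter, defaultdict


class FormulaAutocompleter:
    """Frequency-ranked prefix lookup over known formulae (a stand-in for a model)."""

    def __init__(self):
        # mode -> Counter mapping each formula (tuple of neume names) to its count
        self._counts = defaultdict(Counter)

    def add(self, mode, formula):
        """Record one occurrence of a formula seen in the given mode."""
        self._counts[mode][tuple(formula)] += 1

    def suggest(self, mode, prefix, limit=3):
        """Return up to `limit` continuations of `prefix`, most frequent first."""
        prefix = tuple(prefix)
        n = len(prefix)
        matches = [
            (formula, count)
            for formula, count in self._counts.get(mode, {}).items()
            if len(formula) > n and formula[:n] == prefix
        ]
        # Sort by descending frequency, then alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [list(formula[n:]) for formula, _ in matches[:limit]]


# Toy data, made up for illustration only.
completer = FormulaAutocompleter()
completer.add("first", ["ison", "oligon", "apostrophos"])
completer.add("first", ["ison", "oligon", "apostrophos"])
completer.add("first", ["ison", "oligon", "petasti", "apostrophos"])
completer.add("third", ["ison", "elaphron"])

suggestions = completer.suggest("first", ["ison", "oligon"])
```

With the toy data above, `suggestions` is `[["apostrophos"], ["petasti", "apostrophos"]]`. A real model would presumably replace the lookup in `suggest` while keeping the same "mode plus prefix in, ranked continuations out" shape.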
A more challenging application would be to integrate AI for automated composition. If large language models can generate plausible paragraphs of text, it seems reasonable that the same technology could be applied to generate plausible compositions. Consider the case of a model trained on the classical repertoire attempting to generate a composition for a Greek liturgical text. For the result to be plausible, the model would likely need to include information not only about the music but also about the accentuation patterns of the lyrics. Similarly, the input text may need to be tokenized from words into accented/unaccented syllables. Papa Ephraim's binary representation of 0 (unaccented) and 1 (accented) may not be sufficient, either, because much of the musical rhetoric in a composition depends on the rhetoric of the text. For example, imperfect musical cadences often correlate with the end of a clause within a sentence, while perfect musical cadences often correlate with the end of a sentence. In other words, the model would likely need to account for both musical and textual semantics.
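To sketch what a token richer than a bare 0/1 might look like, here is a small Python example. It assumes the text has already been syllabified by hand with hyphens (automatic Greek syllabification is a separate problem), detects accents from the Unicode diacritics, and tags the last syllable before a comma or full stop with a clause or sentence boundary. The type names, the input convention, and the punctuation sets are all my own assumptions.

```python
import unicodedata
from dataclasses import dataclass
from enum import Enum


class Boundary(Enum):
    NONE = "none"
    CLAUSE = "clause"      # candidate spot for an imperfect cadence
    SENTENCE = "sentence"  # candidate spot for a perfect cadence


# Combining grave, acute, and Greek perispomeni; breathings are not accents.
ACCENT_MARKS = {"\u0300", "\u0301", "\u0342"}
CLAUSE_MARKS = ",\u00b7\u0387"  # comma and Greek ano teleia (assumed set)
SENTENCE_MARKS = "."            # assumed set; extend as needed


@dataclass(frozen=True)
class SyllableToken:
    text: str
    accented: bool
    boundary: Boundary


def tokenize(text):
    """Turn hyphen-syllabified text into accent- and boundary-aware tokens."""
    tokens = []
    for word in text.split():
        stripped = word.rstrip(CLAUSE_MARKS + SENTENCE_MARKS)
        trailing = word[len(stripped):]
        if any(ch in SENTENCE_MARKS for ch in trailing):
            boundary = Boundary.SENTENCE
        elif trailing:
            boundary = Boundary.CLAUSE
        else:
            boundary = Boundary.NONE
        syllables = [s for s in stripped.split("-") if s]
        for i, syllable in enumerate(syllables):
            decomposed = unicodedata.normalize("NFD", syllable)
            accented = any(ch in ACCENT_MARKS for ch in decomposed)
            is_last = i == len(syllables) - 1
            tokens.append(
                SyllableToken(syllable, accented, boundary if is_last else Boundary.NONE)
            )
    return tokens


tokens = tokenize("Κύ-ρι-ε, ἐ-λέ-η-σον.")
```

Here `tokens` has seven entries: "Κύ" and "λέ" are accented, "ε" carries a clause boundary, and "σον" carries a sentence boundary. This only captures punctuation-level structure; anything closer to real textual semantics would need more than this.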
The most challenging application would be to generate plausible compositions in a novel target language, like Romanian or English. All of the above challenges apply as well as a new one: mapping the textual semantics of the original language to the textual semantics of the target language in the model. ChatGPT can do automated translation, so perhaps there is already a solution for this.
As an aside, AI support could be a possible way to monetize this software—retain the open-source version for basic usage and develop a proprietary SaaS-based AI extension for revenue generation. (I am not sure if the GPLv3 allows this, but the project could always be relicensed to a different open-source license if needed.) I am not sure how big the market for this is, but it is certainly something I would be willing to pay for.