piotrbajdek / lngcnv

linguistics: display pronunciation, translate between dialects, convert between orthographies; support for multiple languages: English, Latin, Polish, Quechua, Spanish, Tikuna
https://crates.io/crates/lngcnv
MIT License

Will separate data and code be a good idea? #3

Closed leavelet closed 2 years ago

leavelet commented 2 years ago

Hi! I am an undergraduate student looking for good linguistic tools, and thank you for all the work! I have some suggestions: when looking into the codebase, I found that all the code is in one file and the strings are hard-coded in the main file. Would separating data and code be a good idea? It would make the code much easier to maintain. Using "for" statements would also make the code easier to read. I am writing a linguistic tool called phoneme_from_word, which is in early development; you can check it out on my page.
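
As an illustration of the suggestion above, here is a minimal sketch of keeping replacement data in a table and applying it with a `for` loop. This is not code from lngcnv: the names `RULES` and `convert` and the spelling/sound pairs are hypothetical and chosen only for the example.

```rust
/// Hypothetical rule table: (spelling, replacement) pairs kept apart from the logic.
/// The pairs are illustrative only and are not real lngcnv data.
const RULES: &[(&str, &str)] = &[
    ("ch", "tʃ"),
    ("sh", "ʃ"),
    ("ll", "ʎ"),
];

/// Apply every rule to the input, in table order.
fn convert(input: &str, rules: &[(&str, &str)]) -> String {
    let mut out = input.to_string();
    for &(from, to) in rules {
        // Each rule sees the output of the previous one.
        out = out.replace(from, to);
    }
    out
}
```

Calling `convert("chilla", RULES)` walks the table once and rewrites `ch` and `ll`. Because the rules run one after another, the order of the table matters: a later rule can match text produced by an earlier one.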

piotrbajdek commented 2 years ago

Hi, thank you for your comment! Any thoughts or feedback are highly appreciated.

A short answer is: Yes, you're probably right! But I feel that making this good change may require deeper planning than it appears at first glance. At present, I need to focus on the purely linguistic aspects of the program, and once lngcnv does useful linguistic work, I'll certainly see how to optimise the source code and make it look better, hopefully by the end of 2022.

A long answer is: You're actually the 4th person commenting on my program and the 4th one saying the exact same thing! Your suggestion sounds reasonable, and thank you! I've already exceeded ten thousand lines of code, all hard-coded in the main file, and this will keep growing. One person kindly forked an earlier version of my program (named lng-cv, which I first wrote in Pascal and then rewrote in Rust as lngcnv) in Python to show me how I can use .json files: https://github.com/tomek-siuda/lng-cv-python

However, lngcnv does not have a definite structure; instead, it's a long-term project which I'll keep developing for the rest of my life, literally. At the very beginning, the algorithms did simple replacements of characters, so all the data could easily be organised just as shown in lng-cv-python, for example. But as I keep developing it (and also as I keep learning to code in Rust), things get more complex. For example, for certain languages, I need to concatenate strings to mark the word beginning or ending, and once I develop more intelligent ways to detect word stress (or tone) and subdivide words into syllables, etc., I'll likely need to introduce more complex operations and see how to organise all this. My program offers orthographic and, in particular, phonetic modes of action for multiple dialects of six different languages belonging to three language families [languages I speak or am learning to speak]. I can certainly subdivide lngcnv into more files, but since each natural language is structurally unique, in future it will be technically impossible to employ in lngcnv a single transparent format for encoding everything, so each file would be completely different.
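
To make the word-boundary point concrete, here is a rough sketch of one way to concatenate markers onto a word so that rules can refer to its beginning or end. It is an assumption for illustration, not how lngcnv does it: the `#` marker, the names `BOUNDARY_RULES`, `convert_word` and `convert_text`, and the two rules are all hypothetical.

```rust
/// Hypothetical context-sensitive rules; '#' stands for a word boundary.
/// Illustrative only, not real lngcnv data.
const BOUNDARY_RULES: &[(&str, &str)] = &[
    ("#h", "#"),  // drop "h" only at the beginning of a word
    ("d#", "t#"), // devoice "d" only at the end of a word
];

/// Convert a single word using boundary-aware rules.
fn convert_word(word: &str) -> String {
    // Concatenate boundary markers so the rules can "see" the word edges.
    let mut marked = format!("#{}#", word);
    for &(from, to) in BOUNDARY_RULES {
        marked = marked.replace(from, to);
    }
    // Strip the markers again before returning the result.
    marked.trim_matches('#').to_string()
}

/// Convert whitespace-separated text word by word.
fn convert_text(text: &str) -> String {
    text.split_whitespace()
        .map(convert_word)
        .collect::<Vec<_>>()
        .join(" ")
}
```

With these made-up rules, `convert_word("hand")` yields `ant`, while an `h` or `d` in the middle of a word is left alone. This only works if `#` never occurs in the input. Stress, tone and syllable division would need more than plain string replacement.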

Once v1.6.0 is released (already in alpha), I'll see how to organise all this in a better way, perhaps in some kind of modules or separate files. [First, I need to release coprosize v1.0.0.] But I also want to avoid complexity, keep the code as simple and stupid as possible (even if it looks very ugly to programmers), and see how it all develops.

Thanks for the info on phoneme_from_word--I'll take a look at it and see if I can give you some feedback.

leavelet commented 2 years ago

Thank you very much for your reply, and I think you are right. Sorry for my recklessness. I am more of a programmer, thinking in a programmer's way. But I now realize that making it work is more important than beautifying the code. As for my project, it's just a toy to help me with homework. I don't think you need to waste your precious time looking at it. In any case, thank you for your contribution!

piotrbajdek commented 2 years ago

No need to be sorry! Your viewpoint is perfectly valid--I've just been more focused on linguistics than programming, which was essential during early development. I've been thinking about the problem, though, and will soon make some changes:

The file main.rs will essentially manage the control flow and some other core logic of the program (which will soon get more complex than it is at present, as I want to introduce certain enhancements), whereas all the 'linguistic logic' will be moved to lib.rs. This will be a single large library, exceeding ten thousand lines of code, but it will still allow me to partially avoid code duplication and, briefly speaking, will offer some other benefits without limiting my options for string manipulation, future improvements, and exciting new features. :)
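
The split described above might look roughly like the sketch below. It is only an assumption about the general shape, not the actual lngcnv layout: the inline module `linguistic` stands in for lib.rs, the function `run` stands in for the control flow in main.rs, and the dialect codes and rules are made up.

```rust
// Stand-in for lib.rs: the "linguistic logic".
// An inline module is used here only to keep the sketch in one file.
mod linguistic {
    /// Hypothetical shared helper, reused by every dialect to avoid duplication.
    fn replace_all(input: &str, rules: &[(&str, &str)]) -> String {
        let mut out = input.to_string();
        for &(from, to) in rules {
            out = out.replace(from, to);
        }
        out
    }

    /// Hypothetical dialect entry point (the rule is illustrative only).
    pub fn dialect_a(input: &str) -> String {
        replace_all(input, &[("ll", "ʎ")])
    }

    /// A second hypothetical dialect that differs only in its data.
    pub fn dialect_b(input: &str) -> String {
        replace_all(input, &[("ll", "ʝ")])
    }
}

// Stand-in for main.rs: control flow only, no linguistic rules.
pub fn run(dialect: &str, text: &str) -> Result<String, String> {
    match dialect {
        "a" => Ok(linguistic::dialect_a(text)),
        "b" => Ok(linguistic::dialect_b(text)),
        other => Err(format!("unknown dialect code: {}", other)),
    }
}
```

Here `run("a", "calle")` dispatches to one dialect function and `run("b", "calle")` to the other, while both share the private helper. Each dialect function remains free to use arbitrary string manipulation instead of a fixed rule format.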

I'm not sure if this is what you originally meant but it makes sense to me for now, so I'll close this issue once the changes are implemented (possibly in lngcnv v1.6.0-alpha.3 in mid/late-July).

piotrbajdek commented 2 years ago

Done in v1.6.0-alpha.3

piotrbajdek commented 2 years ago

In v1.6.0-beta.4 the source code is subdivided into 9 files and largely rewritten. Also, code duplication is avoided as much as possible.