translation_support - Githubissues

sharkski commented 2 years ago

By using the googletrans package, we can translate the input text and then look at the word frequencies.

joeyagreco commented 2 years ago

Thank you for opening an issue to discuss this potential feature.

So the use case for this (I assume) would be:

User has a file that is in a language (say French).
User would like to translate this file (let's say to English) and get the word frequencies for that file.

I see this as 2 separate things.

Translate file
Get word frequencies for file.

If a user would like to translate text into another language, it seems the googletrans package could do that without the help of this library.

I don't think that translating the text within the given file fits into the scope for this library.

If a user would like the above stated functionality for a project, they could easily translate the text before feeding it into this library.
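A minimal sketch of that two-step approach, kept independent of any particular API. Here `translate` is a hypothetical callable supplied by the caller (it could wrap googletrans or any other service), and `count_words` is a stand-in built on `collections.Counter`, not this library's actual function.

```python
import re
from collections import Counter
from typing import Callable


def count_words(text: str) -> Counter:
    """Stand-in for the extraction step: lowercase the text and count words."""
    return Counter(re.findall(r"\w+", text.lower()))


def translate_then_count(text: str, translate: Callable[[str], str]) -> Counter:
    # Step 1: translation, done outside the word-frequency code.
    translated = translate(text)
    # Step 2: word-frequency extraction on the already-translated text.
    return count_words(translated)


def toy_translate(text: str) -> str:
    """Hypothetical translator for illustration: a tiny French-to-English lookup."""
    lookup = {"le": "the", "chat": "cat", "dort": "sleeps"}
    return " ".join(lookup.get(word, word) for word in text.lower().split())


frequencies = translate_then_count("Le chat dort le chat", toy_translate)
print(frequencies.most_common())
```

Because the translator is passed in as a plain function, a real translation service could be swapped in without touching the counting step.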

If there is something that I misunderstood please correct me.

sharkski commented 2 years ago

Thank you for the quick response.

I can see your line of thinking, but I disagree.

You say a user can "easily translate" the document, which first of all is not a safe assumption to make of a user's tech literacy level, given that you probably want people other than developers to use this library.

But even further, the idea of this library, as I understand it, is to allow for some higher-level analysis with the generated word frequencies, and I believe that analysis across languages is key. I think the standardization of input should happen when the data comes into the app, if the data is to be used for analysis. Say a user wants to compare the Greek, Ukrainian, and Russian-language Bibles. What's the best way to compare them? They will have to be translated at some point for analysis. And if a user has to redo the translation for each file any time it changes, then it's almost not worth the frustration to use the app, even for Wordfreak.

I used the translation service to work out which version of an old Ukrainian poem had the most accurate translation. Running Wordfreak to compare translations of the poem, I could see that in some of them 'the' appeared less or more frequently, a common problem in Slavic-to-English translation.
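As a rough illustration of that kind of comparison, the sketch below computes the share of words that are 'the' in each of several translations. The sample texts are made up for the example, and `relative_frequency` is a hypothetical helper written with the standard library, not a Wordfreak function.

```python
import re
from collections import Counter


def relative_frequency(text: str, word: str) -> float:
    """Share of all words in `text` that are `word` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[word.lower()] / len(words)


# Made-up sample translations, for illustration only.
translations = {
    "translation_a": "The wind bends the reeds by the river",
    "translation_b": "Wind bends reeds by the river",
}

article_share = {
    name: relative_frequency(text, "the") for name, text in translations.items()
}
for name, share in sorted(article_share.items()):
    print(f"{name}: {share:.2%}")
```

Using a share rather than a raw count keeps translations of different lengths comparable.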

joeyagreco commented 2 years ago

You say a user can "easily translate" the document, which first of all is not a safe assumption to make of a user's tech literacy level

You're correct in saying that the literacy level of any user should not be assumed. However, I believe it is a safe assumption that anyone capable of using this library, which requires a pip install, is also capable of using a pip install to get the googletrans library.

As for how easy the googletrans library is to use, it seems to be well documented, with examples.

Making this library more accessible to those not technically inclined could be made possible through developing a front end web app for it, which would allow any user with a browser to extract word frequencies without having to code at all.

They will have to be translated at some point for analysis.

This point could be used for any external feature, no? You could say that text should be spelling/grammar checked before extracting word frequencies... perhaps you would be correct in saying that.

Does that make spelling/grammar checks the responsibility of this library?

I think that not including extra steps in the extraction is important to maintaining the purpose of the library, which is to provide a way to extract word frequencies from text as it is given.

If a user would like to include a spelling/grammar check (or any other formatting/logic) before extracting word frequencies, that should be done in a more focused, specialized script which utilizes this library for the extraction portion.
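One way such a script might be organized is sketched below: optional preparation steps are applied in order, and only the final step does the extraction. Everything here is an assumption made for illustration. `extract_frequencies` is a stand-in based on `collections.Counter`, not this library's function, and `fix_known_typos` is a toy substitute for a real spelling checker.

```python
import re
from collections import Counter
from functools import reduce
from typing import Callable, Sequence

# A preparation step takes text and returns modified text.
Step = Callable[[str], str]


def extract_frequencies(text: str) -> Counter:
    """Stand-in for the extraction step: counts words in the text as given."""
    return Counter(re.findall(r"\w+", text.lower()))


def fix_known_typos(text: str) -> str:
    """Toy spelling step (hypothetical): corrects one known typo."""
    return re.sub(r"\bteh\b", "the", text)


def prepare_then_extract(text: str, steps: Sequence[Step]) -> Counter:
    # Apply each preparation step in order, then hand the result to extraction.
    prepared = reduce(lambda current, step: step(current), steps, text)
    return extract_frequencies(prepared)


result = prepare_then_extract("Teh cat saw teh dog", [str.lower, fix_known_typos])
print(result.most_common())
```

The extraction function never needs to know which preparation steps ran, so each step can change or be removed on its own.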

At a high level, I believe this plays into the Separation of Concerns design principle, which focuses on ensuring each portion of a program addresses a separate concern. I think this principle is critical to maintaining a modular design in any project regardless of the size.

You give some great examples of ways that this library can be leveraged. Perhaps creating a library geared specifically for finding the most accurate translation, translating the text, and comparing the texts would be a great way to leverage this library while adding in more specific and very useful features.

joeyagreco / wordfreak

translation_support #1