Open davidak opened 1 year ago
A possible solution could be to leverage language detection from something like chardet.
They seem to have language auto-detect implemented for translating posts without specified language, so it should be easy to use it in this case as well. https://github.com/mastodon/mastodon/pull/19244
There could be a way to "flag" the language on the message somewhere (see in the timeline or in the message details what is the posting language), and to manually change this language by using the new Edit feature.
I have just noticed that I can change the language using the Edit button. But I still can't visually tell the difference.
As a multilingual person, I would like this solved using language detection, because selecting it manually is a real nuisance.
It also doesn't help that default Mastodon doesn't display the post language anywhere.
Post editing helps when you notice it yourself after posting. When creating a post, the client should check whether the set language matches the detected one. If not, show it, e.g. by making the button red. If one often changes languages, the default language setting could offer a "detect language" option.
I agree. I believe a lot of people actually forget to set the correct language when posting things. On a big instance such as mastodon.social, I limited the languages I see in the timelines to English and German and still see a firehose of toots in other languages.
I believe some language detection in the frontend that nudges the user when the detected language does not match the selected (or ignored) language would go a long way toward making the timelines more enjoyable for everyone.
Not a solution for the web clients but given how many use apps this could simply be solved for a majority (?) if native clients just derived language from the onscreen keyboard used to write the post.
I see that Mastodon is using LibreTranslate, which has a feature to detect the language of a given text. This could be a good starting point for a fix.
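As a rough sketch of what using that feature could look like: LibreTranslate exposes a `/detect` endpoint that takes the text in a `q` field and returns a JSON list of language guesses with a confidence value. The base URL and the helper names below are assumptions for illustration, an instance that requires an API key would also need an `api_key` field, and this is not how Mastodon itself calls the service.

```python
import json
import urllib.parse
import urllib.request


def build_detect_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for a LibreTranslate /detect endpoint."""
    # The text is sent as a form-encoded "q" field.
    data = urllib.parse.urlencode({"q": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/detect",
        data=data,
        method="POST",
    )


def best_language(response_body: str):
    """Return the language code with the highest confidence, or None."""
    candidates = json.loads(response_body)
    if not candidates:
        return None
    # Confidence is only compared, so its scale does not matter here.
    return max(candidates, key=lambda c: c["confidence"])["language"]


def detect_language(base_url: str, text: str):
    """Send the text to the server and return its top language guess."""
    request = build_detect_request(base_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return best_language(response.read().decode("utf-8"))
```

Calling `detect_language("http://localhost:5000", "Guten Morgen zusammen")` would need a running LibreTranslate instance at that (assumed) address; the two helper functions can be tried without one.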
I'm seeing so many posts marked with the wrong language, the language filter feature is almost useless right now. Most of the time English is involved: either the post is in English marked as something else, or the other way around.
I point it out to people sometimes, and the answer is always that they had no idea it was a feature or that they don't know how to change it. It doesn't help that some apps (like Tusky) don't even provide an option to change it per post.
Some people told me Mastodon used to have language detection, but that it didn't work properly so the current system was put in place. Maybe we need a hybrid system. Language detection works a lot better when you limit the number of possible languages.
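To illustrate the hybrid idea, here is a minimal sketch that only accepts a detected language if it is one the user has declared they post in, and otherwise keeps the user's default. The per-language `scores` dictionary stands in for the output of whatever detector is used, and `pick_language` is a hypothetical helper, not an existing Mastodon function.

```python
def pick_language(scores: dict, allowed: set, fallback: str) -> str:
    """Choose the best-scoring language among those the user actually uses.

    scores   -- detector output, e.g. {"nl": 0.40, "de": 0.35, "en": 0.25}
    allowed  -- languages the user has said they post in
    fallback -- the user's default posting language
    """
    # Drop every candidate the user never writes in.
    allowed_scores = {lang: s for lang, s in scores.items() if lang in allowed}
    if not allowed_scores:
        # Nothing plausible was detected: keep the manual default.
        return fallback
    return max(allowed_scores, key=allowed_scores.get)
```

With the scores from the docstring and `allowed={"de", "en"}`, the Dutch guess is discarded and German wins, which is the kind of mistake between closely related languages that an unrestricted detector tends to make on short posts.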
Language detection was removed in https://github.com/mastodon/mastodon/pull/17478
In #21631 I propose a different approach to remedy the problem of forgetting to set the post's language correctly.
I've formed a habit out of this, but I agree with everything above: the filtering currently doesn't work because so many forget to set the language.
Some clients don't even allow setting the language, so some people give up on tagging them.
I have experimented with a Python (sic) library called Lingua for language detection on the Federated feeds. The results looked quite encouraging - especially if the detection is seeded with the user's language spectrum on long posts, the language spectrum that they tag with, and possibly their client-side language selection.
I do not understand why earlier experiments at language detection have yielded bad results. It must have been a simplistic implementation that does not seed the detector and does not consider language clustering (people don't speak all languages at the same time, some languages like English tend to co-occur, some languages are very easy to detect, replies should consider the original post's language).
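The seeding described above could be sketched roughly as follows: build a prior from the languages the user has tagged before, multiply the detector's raw scores by it, and give extra weight to the language of the post being replied to. This is not the Lingua API; the function names, the `floor` weight for unseen languages, and the `reply_boost` factor are all illustrative assumptions.

```python
from collections import Counter


def language_prior(tag_history: list) -> dict:
    """Turn the languages a user tagged in the past into relative frequencies."""
    counts = Counter(tag_history)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def seeded_scores(scores: dict, prior: dict, reply_to_lang=None,
                  reply_boost: float = 2.0, floor: float = 0.01) -> dict:
    """Re-weight raw detector scores with the user's prior and reply context."""
    weighted = {}
    for lang, score in scores.items():
        # Languages the user never used get a small, non-zero weight.
        weight = prior.get(lang, floor)
        if lang == reply_to_lang:
            # Replies tend to be written in the language of the original post.
            weight *= reply_boost
        weighted[lang] = score * weight
    total = sum(weighted.values())
    if total == 0:
        return {}
    # Normalise so the result can be read as relative confidence.
    return {lang: w / total for lang, w in weighted.items()}
```

For a user whose history is mostly German, a short post that the detector narrowly scores as Dutch would be pulled back to German, while a genuinely new language still wins if its raw score is high enough to overcome the `floor` weight.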
In fact it seems more likely to me that detection will work better than translate, and it's a dependent problem.
Steps to reproduce the problem
Expected behaviour
A design that prevents this issue
Actual behaviour
I just post in my set default language
Detailed description
Post editing helps when you notice it yourself after posting. When creating a post, the client should check whether the set language matches the detected one. If not, show it, e.g. by making the button red. If one often changes languages, the default language setting could offer a "detect language" option. If detection proves about 99% reliable, we could make that the default.
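A minimal sketch of the check described above, assuming some detector function that returns a language code and a confidence between 0 and 1. The function name, the thresholds, and the three result strings are illustrative assumptions; the compose form would map "mismatch" to the red language button.

```python
from typing import Callable, Optional, Tuple

# A detector takes the post text and returns (language code or None, confidence).
Detector = Callable[[str], Tuple[Optional[str], float]]


def check_post_language(text: str, selected: str, detect: Detector,
                        min_confidence: float = 0.9,
                        min_length: int = 20) -> str:
    """Compare the selected post language with the detected one.

    Returns "ok", "mismatch", or "unknown" (too short or too uncertain).
    """
    # Very short posts are not reliable enough to warn about.
    if len(text.strip()) < min_length:
        return "unknown"
    detected, confidence = detect(text)
    if detected is None or confidence < min_confidence:
        return "unknown"
    return "ok" if detected == selected else "mismatch"
```

Returning "unknown" instead of guessing keeps the warning quiet for short or ambiguous posts, so the button only turns red when the detector is fairly sure the selected language is wrong.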
Specifications
Mastodon v3.5.3 WebUI