ykdojo / editdojo

(I'm no longer working on this - currently working on https://github.com/ykdojo/defaang)
https://www.csdojo.io/edit
MIT License
332 stars 98 forks

Automatically detect if the given text is Japanese or English with Python #23

Open ykdojo opened 5 years ago

ykdojo commented 5 years ago

I think I'm going to release the Twitter-based version of this product for Japanese and English first. So, we should be able to detect with Python whether a given tweet is written in Japanese or English. This way, we can show Japanese tweets written by Japanese learners only to native speakers of Japanese. The same goes for English.

yuruyuri16 commented 5 years ago

Nice. @ykdojo

ghost commented 5 years ago

@ykdojo Does this mean that whenever someone posts a Japanese tweet, only people who are familiar with Japanese will be able to see it, or will all members of the community see it? If we notify only people who know Japanese, then when they use this Twitter app they must be registered as knowing Japanese and learning English, right? Is your thought process similar to this? That is what I have understood. By the way, I am very interested in contributing to this app idea, since I can gain more knowledge from it. We could do this for other languages as well, here in India :)

Small doubt :(

ykdojo commented 5 years ago

Hmm here's an example to clarify.

Suppose User A is learning Japanese, and her native language is English.

She starts using one of her Twitter accounts, say, @user_a_jp, to tweet stuff in Japanese.

Then, Japanese native speakers should start seeing these tweets so they can fix them.

My only concern is this: what if @user_a_jp starts tweeting in both Japanese and English? We should probably ignore all of the English tweets in that case.
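
As a rough illustration of the filtering described here, the sketch below uses only the Python standard library and checks Unicode ranges for hiragana, katakana, and kanji. This is a simplified heuristic, not something the project settled on: the function names are hypothetical, kanji-only text could also be Chinese, and anything without Japanese characters is simply assumed to be English.

```python
def contains_japanese(text):
    """Return True if the text has any hiragana, katakana, or kanji character."""
    for ch in text:
        code = ord(ch)
        if (0x3040 <= code <= 0x309F          # Hiragana block
                or 0x30A0 <= code <= 0x30FF   # Katakana block
                or 0x4E00 <= code <= 0x9FFF):  # CJK Unified Ideographs (kanji)
            return True
    return False


def guess_language(text):
    """Assumption for this sketch: only 'ja' and 'en' are possible."""
    return "ja" if contains_japanese(text) else "en"


def tweets_for_native_speakers(tweets, language):
    """Keep only the tweets whose guessed language matches `language`."""
    return [t for t in tweets if guess_language(t) == language]


if __name__ == "__main__":
    sample = ["今日は天気がいいですね", "Nice weather today"]
    print(tweets_for_native_speakers(sample, "ja"))
```

With a check like this, English tweets from an account such as @user_a_jp would be dropped before anything is shown to Japanese native speakers. A tweet that mixes both languages would count as Japanese, which may or may not be the desired behavior.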

emills11 commented 5 years ago

For something like this, we could look into the langdetect library? If, following along with the above example, @user_a_jp writes a tweet that returns 'en', we would ignore the tweet.
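
A minimal sketch of that idea, assuming the `langdetect` package is installed (`pip install langdetect`). The helper names `detect_with_langdetect` and `should_ignore` are hypothetical and are not taken from the project or the linked PR. The detector is passed in as a parameter so the filtering rule can be tried out without the package.

```python
def detect_with_langdetect(text):
    """Return a language code such as 'en' or 'ja', or None if detection fails."""
    # Imported lazily so the rest of this sketch loads without the package.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic on short text unless a seed is set.
    DetectorFactory.seed = 0
    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features (e.g. only digits).
        return None


def should_ignore(tweet_text, learning_lang, detector=detect_with_langdetect):
    """Ignore a tweet unless it is detected as the language being learned."""
    return detector(tweet_text) != learning_lang


if __name__ == "__main__":
    # @user_a_jp is learning Japanese, so an English tweet should be ignored.
    print(should_ignore("I had ramen for lunch today.", "ja"))
```

Tweets are short, so detection may be unreliable for very brief or mixed-language text. Treating a failed detection as "ignore" is one conservative choice among several.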

ykdojo commented 5 years ago

Oh yeah, the langdetect library looks good!

emills11 commented 5 years ago

Would you like me to go ahead and create a few functions that make use of the library? @ykdojo

ykdojo commented 5 years ago

Yeah that would be awesome! Thank you.

On Thu, Nov 8, 2018 at 9:37 AM ratdog45 wrote:

Would you like me to go ahead and create a few functions that make use of the library? @ykdojo

ykdojo commented 5 years ago

NOTE: there's already a PR for this. https://github.com/ykdojo/editdojo/pull/29

Will come back to this when it's more immediately useful.

tushar-punjabi commented 5 years ago

Would it be easier to use Google Translate's automatic language detection feature, or is that something extra and unnecessary? @ykdojo

ykdojo commented 5 years ago

Yeah, actually I think that will be ideal.