hunkim / translation_coverage

Automatically check the rates between alpha VS other (unicode)

Error on example (should be 0, but not) #22

Closed hunkim closed 8 years ago

hunkim commented 8 years ago

@mingrammer Could you check our examples?

https://github.com/tensorflowkorea/tensorflow-kr/blob/master/progress.md

For example, word2vec should be 0, but it isn't.

mingrammer commented 8 years ago

There is some source code in it. I think it is counted as non-English characters.

mingrammer commented 8 years ago

Do you mean that files without any Korean (or other translated language) text should have 0?

hunkim commented 8 years ago

I think we should show the progress of translation. It should be the ratio between English and non-English words. I think we should exclude source code and other markup tags when we compute the ratio.

hunkim commented 8 years ago

"should have 0?" Yes, so that people clearly know what they should work on. :-)

hunkim commented 8 years ago

    def test_trans_coverage_file_source_code(self):
        e_count, n_count = main.trans_coverage_file("tests/sample_source_code.md")
        print("sample_source_code: ", e_count, n_count)
        self.assertEqual(e_count, 0)
        self.assertNotEqual(n_count, 0)

I guess self.assertNotEqual(n_count, 0) should be self.assertEqual(n_count, 0).

mingrammer commented 8 years ago

Yes. Hmm, so source code (or anything else that is not normal text) should be ignored completely (not counted at all)?

mingrammer commented 8 years ago

If a file is entirely source code, we may mark it as 0 (%), right?

hunkim commented 8 years ago

Sure.

Sung

On Mon, Oct 17, 2016 at 3:09 PM, ming notifications@github.com wrote:

If a file is entirely source code, we may mark it as 0 (%), right?


mingrammer commented 8 years ago

So, how about this?

  1. If a file is entirely source code, we mark it as 'This is source code' and set its percentage to 100 (%).
  2. If a file is partially source code, we mark it as 0 (%) when nothing has been translated. But if the translation of that file is almost complete, we could also count the source code at that point. (In this case, we would need to choose an appropriate value.)

Is that too complex? Any other ideas?

hunkim commented 8 years ago

Let's keep it simple. Just exclude source code in the equation.

mingrammer commented 8 years ago

OK, I'll just exclude source code and not count its length.
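
For illustration, here is a minimal sketch of the "exclude source code from the equation" idea agreed on above. It is not the project's actual implementation. The names `strip_code`, `count_chars`, and `coverage` are hypothetical, and several things are assumptions: code means Markdown fenced blocks and inline backtick spans, "English" means ASCII letters, "non-English" means non-ASCII letters, and the ratio is taken over characters.

```python
import re

# Assumption: source code appears as Markdown fenced blocks (``` or ~~~)
# or as inline `code` spans. Indented code blocks are not handled here.
FENCED_BLOCK = re.compile(r"^(`{3,}|~{3,}).*?^\1[ \t]*$", re.DOTALL | re.MULTILINE)
INLINE_CODE = re.compile(r"`[^`\n]+`")


def strip_code(markdown_text):
    """Return the text with fenced blocks and inline code spans removed."""
    text = FENCED_BLOCK.sub("", markdown_text)
    return INLINE_CODE.sub("", text)


def count_chars(markdown_text):
    """Return (e_count, n_count): ASCII letters vs. non-ASCII letters,
    counted only in the text that remains after code is stripped."""
    e_count = n_count = 0
    for ch in strip_code(markdown_text):
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        if ord(ch) < 128:
            e_count += 1
        else:
            n_count += 1
    return e_count, n_count


def coverage(markdown_text):
    """Percentage of non-English letters; 0.0 when nothing is countable,
    so a file that is entirely source code reports 0 (%)."""
    e_count, n_count = count_chars(markdown_text)
    total = e_count + n_count
    return 100.0 * n_count / total if total else 0.0


sample = "Hello `x = 1` 세계\n```python\nprint('안녕')\n```\n"
print(count_chars(sample))  # (5, 2): only "Hello" and "세계" are counted
```

With this approach a file containing only source code yields counts of (0, 0), which matches the suggestion above that the test assert `n_count` equals 0 for the source-code sample.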