red-data-tools / red-datasets

A RubyGem that provides common datasets
MIT License

Add support for Japanese-English Bilingual Corpus of Wikipedia's Kyoto Articles #135

tmatsuura1 closed 2 years ago

tmatsuura1 commented 2 years ago

https://github.com/red-data-tools/red-datasets/issues/57 @kou It is now working, but I am not sure whether this is really the right way to implement it. Could you please give me any comments or advice? At the moment, I am assuming that users will use it as in the example.

kou commented 2 years ago

I've pushed a commit that improves the API.

TODO:

tmatsuura1 commented 2 years ago

@kou Thank you so much. I want to try the TODOs next weekend. It's difficult to decide which is better, but I think the implementation will be easier to understand with idea b, so b is better. My feeling is also that the structure of the data is different, so it seems better to me to separate the classes as well.

kou commented 2 years ago

I want to try TODOs next weekend.

Great!

I reconsidered the API. How about the following?

tmatsuura1 commented 2 years ago

Thank you. I think this API is good.

tmatsuura1 commented 2 years ago

@kou Could you review these commits again?

kou commented 2 years ago

We can group tests with sub_test_case, as in https://github.com/red-data-tools/red-datasets/blob/master/test/test-aozora-bunko.rb#L71 . Can I organize the tests?
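
For readers unfamiliar with test-unit, grouping with sub_test_case might look roughly like the sketch below. It is only an illustration: the class name ExampleCorpusTest, the group names, and the sample data are hypothetical and are not taken from this pull request or from the linked file.

```ruby
require "test/unit"

# Hypothetical test class; every name and value here is illustrative only.
class ExampleCorpusTest < Test::Unit::TestCase
  # Each sub_test_case block defines a nested group of related tests.
  sub_test_case("articles") do
    test("first title") do
      titles = ["Kyoto", "Nara"]
      assert_equal("Kyoto", titles.first)
    end
  end

  sub_test_case("lexicon") do
    test("entry count") do
      entries = {"kyoto" => "Kyoto"}
      assert_equal(1, entries.size)
    end
  end
end
```

Grouping this way keeps tests for differently structured data apart, and the group name is included in each test's reported name.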

tmatsuura1 commented 2 years ago

Please organize tests.

kou commented 2 years ago

Done. Merged.

Thanks!