red-data-tools / red-datasets

A RubyGem that provides common datasets
MIT License
30 stars 25 forks source link

Add support for bill of the House of Representatives of Japan #184

Closed tikkss closed 6 months ago

tikkss commented 6 months ago

Fixes #142.

Notes for reviewers

Wareki can't parse to Date object (e.g., "平成10年 3月 4日"). I think the following options.

  1. Introduce wareki gem
  2. Parse independently method (Not implemented)
  3. Keep it as a String (Not parsed)
kou commented 6 months ago

The wareki gem will be convenient but it's overkill for us. Because it changes Date.

Parse independently method (Not implemented)

Can we try this? Do we need to support only 昭和, 平成 and 令和?

kou commented 6 months ago

Parse independently method (Not implemented)

Can we try this? Do we need to support only 昭和, 平成 and 令和?

We can defer this to a follow-up task.

tikkss commented 6 months ago

Thanks for your review!

The wareki gem will be convenient but it's overkill for us. Because it changes Date.

I agree.

I would like to implement independently method only support 昭和, 平成, 令和.

tikkss commented 6 months ago

I've rebased on master and update expected, so the CI has successfully.

I implement independently method only support 平成, 令和 (not 昭和, 平成, 令和). Because the fields capable of accepting dates only contained Heisei and Reiwa. This fact was confirmed with the following script on 24e251d.

require "datasets"

house_of_representative = Datasets::HouseOfRepresentative.new
japanese_date_strings = house_of_representative.collect { |record|
  [
    record.house_of_representatives_of_accepted_bill_on_preliminary_consideration,
    record.house_of_representatives_of_preliminary_refer_on,
    record.house_of_representatives_of_accepted_bill_on,
    record.house_of_representatives_of_refer_on,
    record.house_of_representatives_of_finished_consideration_on,
    record.house_of_representatives_of_finished_deliberation_on,
    record.house_of_councillors_of_accepted_bill_on_preliminary_consideration,
    record.house_of_councillors_of_preliminary_refer_on,
    record.house_of_councillors_of_accepted_bill_on,
    record.house_of_councillors_of_refer_on,
    record.house_of_councillors_of_finished_consideration_on,
    record.house_of_councillors_of_finished_deliberation_on,
    record.promulgated_on,
  ].reject { |s| s.nil? || s == "" }
}.flatten
japanese_date_strings.collect { |s| s[0..1] }
                     .uniq
# => ["平成", "令和"]

The private method parse_japanese_date was asserted correctly with the following script.

require "date"
require "test-unit"

JAPANESE_DATE_MATCHER = /\A(平成|令和)\s*(\d{1,2}|元)年\s*\d{1,2}月\s*\d{1,2}日\z/.freeze

def parse_japanese_date(field)
  return field unless field.to_s.match?(JAPANESE_DATE_MATCHER)
  Date.parse(normalize_jisx0301_format(field))
end

def normalize_jisx0301_format(japanese_date_string)
  japanese_date_string.gsub(/平成/, "H")
                      .gsub(/令和/, "R")
                      .gsub(/元/, "01")
                      .gsub(/\s/, "0")
                      .gsub(/[年月]/, ".")
                      .gsub(/[日]/, "")
end

class ParseJapaneseDateTest < Test::Unit::TestCase
  data("month and day with leading a space in Heisei", ["H10.01.01", "平成10年 1月 1日"])
  data("month         with leading a space in Heisei", ["H10.01.10", "平成10年 1月10日"])
  data("          day with leading a space in Heisei", ["H10.10.01", "平成10年10月 1日"])
  data("           without leading a space in Heisei", ["H10.10.10", "平成10年10月10日"])
  data("year, month and day with leading a space in Reiwa", ["R02.01.01", "令和 2年 1月 1日"])
  data("year, month         with leading a space in Reiwa", ["R02.01.10", "令和 2年 1月10日"])
  data("year,           day with leading a space in Reiwa", ["R02.10.01", "令和 2年10月 1日"])
  data("year,            without leading a space in Reiwa", ["R02.10.10", "令和 2年10月10日"])
  data("boundary within Heisei", ["H31.04.30", "平成31年 4月30日"])
  data("boundary within Reiwa", ["R01.05.01", "令和元年 5月 1日"])
  def test_parse_japanse_date(data)
    expected_jisx0301, japanese_date_string = data
    assert_equal(expected_jisx0301, parse_japanese_date(japanese_date_string).jisx0301)
  end
end

Test::Unit::AutoRunner.run
# Loaded suite irb
# Started
# Finished in 0.002313 seconds.
# --------------------------------------------------------------------------------------------
# 10 tests, 10 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
# 100% passed
# --------------------------------------------------------------------------------------------
# 4323.39 tests/s, 4323.39 assertions/s
# => true

Could you review it again?

kou commented 6 months ago

How about the Japanese date parse method as a Dataset::JapaneseDateParser like other utility class such as ZipExtractor and Downloader? Then we can put the tests to test/test-japanese-date-parser.rb.

tikkss commented 6 months ago

How about the Japanese date parse method as a Dataset::JapaneseDateParser like other utility class such as ZipExtractor and Downloader? Then we can put the tests to test/test-japanese-date-parser.rb.

I see. It is testable! I'll try to extract Datasets::JapaneseDateParser.

tikkss commented 6 months ago

Thanks for your review and suggest some clear code. I dealt with all of them you pointed so far.

tikkss commented 6 months ago

Thank you so much. I'm excited to encounter some clear code I had not even thought of.

I dealt with all of them you pointed out so far. Could you review it again?

tikkss commented 6 months ago

Thanks for your review! I've improved all.