josw123 / dart-fss

A library for crawling the DART system operated by Korea's Financial Supervisory Service
https://github.com/josw123/dart-fss
MIT License
321 stars 110 forks

Non-date strings inside the report body are matched as datetime #83

Closed josw123 closed 3 years ago

josw123 commented 3 years ago

The regular expression used for datetime search matches the string below as a datetime string, which causes an error.

ex) 2010년: 법인세율(주민세포함 24.2%)*(1-유효감면률 13.51%)) = 20.93%

Traceback (most recent call last):
  File "execute.py", line 91, in execute_one
    fs = corp.extract_fs(bgn_de='20100101', separate=True, report_tp=['quarter'])
  File "/usr/local/lib/python3.7/site-packages/dart_fss/corp/corp.py", line 233, in extract_fs
    return extract(self.corp_code, bgn_de, end_de, fs_tp, separate, report_tp, lang, separator, dataset)
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 1340, in extract
    raise e
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 1306, in extract
    dataset=dataset)
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 1175, in analyze_report
    fs_df = analyze_html(report, fs_tp=fs_tp, separate=separate, lang=lang)
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 583, in analyze_html
    extract_results = extract_fs_table(fs_table=fs_table, fs_tp=fs_tp, separate=separate, lang=lang)
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 486, in extract_fs_table
    columns = convert_thead_into_columns(fs_tp=tp, fs_table=table, separate=separate, lang=lang)
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 160, in convert_thead_into_columns
    date_info = extract_date_from_header(fs_table['header'])
  File "/usr/local/lib/python3.7/site-packages/dart_fss/fs/extract.py", line 107, in extract_date_from_header
    date.append(datetime(year, month, day))
ValueError: ('month must be in 1..12', "An error occurred while fetching or analyzing {'rcp_no': '20101112000144', 'corp_code': '00568188', 'corp_name': '마이크로컨텍솔', 'stock_code': '098120', 'corp_cls': 'K', 'report_nm': '분기보고서 (2010.09)', 'flr_nm': '마이크로컨텍솔', 'rcept_dt': '20101112', 'rm': ''}.")
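
For illustration only, here is a minimal sketch of how a loose date pattern could turn the example string into an invalid month, and one possible way to guard against it. The patterns `LOOSE` and `STRICT` and the helper `safe_dates` are hypothetical names made up for this sketch; they are not the actual regular expression or code in dart-fss, whose implementation is not shown in this issue.

```python
import re
from datetime import datetime

# Hypothetical loose pattern (NOT the real dart-fss regex): any non-digit
# characters are allowed between year, month and day.
LOOSE = re.compile(r'(\d{4})\D*(\d{1,2})\D*(\d{1,2})')

# Hypothetical stricter pattern: year, month and day must be joined by an
# explicit date separator ('.', '-', '/', or the Korean suffixes 년 / 월).
STRICT = re.compile(r'(\d{4})\s*[.\-/년]\s*(\d{1,2})\s*[.\-/월]\s*(\d{1,2})')


def safe_dates(text, pattern=STRICT):
    """Return the datetimes found in text, skipping impossible dates."""
    dates = []
    for year, month, day in pattern.findall(text):
        try:
            dates.append(datetime(int(year), int(month), int(day)))
        except ValueError:
            # e.g. month 24: the match was not a date, so ignore it
            continue
    return dates


text = '2010년: 법인세율(주민세포함 24.2%)*(1-유효감면률 13.51%)) = 20.93%'

# The loose pattern reads "2010 ... 24.2" as year 2010, month 24, day 2.
print(LOOSE.search(text).groups())
# Both the stricter pattern and the ValueError guard reject the tax-rate line.
print(safe_dates(text))
print(safe_dates(text, pattern=LOOSE))
# Real header dates are still parsed.
print(safe_dates('2010.09.30'))
```

Under these assumptions, passing month 24 to `datetime` is what raises `month must be in 1..12`. Tightening the separators and catching `ValueError` are two independent safeguards, and either one alone would avoid the crash on this input.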