sharebook-kr / pykrx

KRX stock information scraping

ValueError: invalid literal for int() with base 10: '40,002' #1

Closed sketchout closed 5 years ago

sketchout commented 5 years ago

Traceback (most recent call last):
  File "a.py", line 12, in <module>
    print(krx.get_shorting_status_by_date("20181210", "20181212", "005930"))
  File "D:\0.prj_python\pykrx-master\pykrx\krx.py", line 53, in get_shorting_status_by_date
    return SRT02010100.scraping(isin, fromdate, todate)
  File "D:\0.prj_python\pykrx-master\pykrx\shorting.py", line 48, in scraping
    df = df.replace({',': ''}, regex=True).astype(np.int64)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\generic.py", line 5681, in astype
    **kwargs)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\internals\managers.py", line 531, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\internals\managers.py", line 395, in apply
    applied = getattr(b, f)(**kwargs)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\internals\blocks.py", line 534, in astype
    **kwargs)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\internals\blocks.py", line 633, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "D:\python372\lib\site-packages\pandas-0.24.0rc1-py3.7-win32.egg\pandas\core\dtypes\cast.py", line 685, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas\_libs\lib.pyx", line 530, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: '40,002'

sketchout commented 5 years ago

Changing line 48 of shorting.py to the code below fixes it:

df = df.replace(',', '', regex=True).astype(np.int64)

The original line was:

df = df.replace({',': ''}, regex=True).astype(np.int64)
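A minimal, self-contained sketch of the modified conversion, assuming the scraped table arrives as a DataFrame whose cells are strings with comma thousands separators. The column names and values here are made up for illustration and are not pykrx's actual output.

```python
import numpy as np
import pandas as pd

# Hypothetical scraped data: every cell is a string with comma thousands separators.
df = pd.DataFrame({
    "volume": ["40,002", "1,234,567"],
    "amount": ["12,000", "7"],
})

# Pass ',' as to_replace and '' as value (no dict); regex=True makes pandas
# remove the comma wherever it appears inside each string, not only whole-cell matches.
df = df.replace(",", "", regex=True).astype(np.int64)

print(df.dtypes.tolist())
print(df)
```

The cast to np.int64 is the step that failed in the traceback above; it only succeeds once every comma has actually been stripped.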

sharebook-kr commented 5 years ago

@sketchout Sorry for the slow reply. Could you find out which version of Pandas you're using?

import pandas
print(pandas.__version__)

The astype(np.int64) call is needed so that arithmetic can be performed on the API results, for example:

>> df = krx.get_shorting_status_by_date("20181210", "20181212", "005930")
>> sum(df['공매도'])
2922537
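To illustrate why the integer cast matters, here is a small sketch with made-up values (the Series below is hypothetical, standing in for a column such as '공매도'). While the cells are still strings, pandas' Series.sum() concatenates them and Python's built-in sum() raises a TypeError; only after the cast do both give a numeric total.

```python
import numpy as np
import pandas as pd

# Hypothetical column: commas already stripped, but values are still strings.
s = pd.Series(["40002", "1500"], dtype=object)

# Series.sum() on an object column of strings concatenates them.
concatenated = s.sum()

# The built-in sum() starts from 0, so adding a str to it raises TypeError.
try:
    sum(s)
    builtin_failed = False
except TypeError:
    builtin_failed = True

# After casting to int64, the same built-in sum() adds the numbers.
total = sum(s.astype(np.int64))

print(concatenated, builtin_failed, total)
```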

mr-yoo commented 5 years ago

In the latest version (0.24.1), this replace call no longer removes the comma separators, but in 0.23.4 it does.

import pandas as pd

df = pd.DataFrame({'one': ["1,000", "2,000"], 'two': ["3,000", "4,000"]})
df = df.replace({',': ''}, regex=True)
print(df)

In 0.23.4, it works!

    one   two
0  1000  3000
1  2000  4000

In 0.24.1, it does NOT work!

     one    two
0  1,000  3,000
1  2,000  4,000

To avoid the version dependency, I passed the replacement string through the separate "value" parameter instead of giving "to_replace" a dictionary:

df = df.replace(',', '', regex=True)
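Another way to sidestep DataFrame.replace altogether (a swapped-in technique, not what was used in this thread) is to strip the separators column by column with Series.str.replace using a literal pattern, then cast. A sketch, assuming every column holds comma-formatted strings like the example above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": ["1,000", "2,000"], "two": ["3,000", "4,000"]})

# Remove the literal "," in each column; regex=False treats the pattern as plain text.
cleaned = df.apply(lambda col: col.str.replace(",", "", regex=False))

# Cast the cleaned strings to 64-bit integers so arithmetic works.
numbers = cleaned.astype(np.int64)

print(numbers)
```

Because the .str accessor operates on each column's string values directly, this does not rely on how DataFrame.replace interprets a dictionary combined with regex=True.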