qri-io / starlib

qri's standard library for starlark
MIT License

feature request: detect if a string column is actually numeric #157

Open dustmop opened 2 years ago

dustmop commented 2 years ago

If a column of numerical data consists entirely of string values, for example because it was parsed out of a webpage by a scraper, then DataFrame will treat the column as a string type. Instead, it could automatically convert the values to numbers. This would cause feature drift from the Python pandas version, but may be worth doing anyway.
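To make the proposal concrete, here is a minimal sketch in plain Python of what the detection could look like. It is not starlib's DataFrame code; the helper names `parse_number` and `coerce_numeric_column` are hypothetical, and the rule "convert only if every value parses" is an assumption about the desired behavior.

```python
def parse_number(text):
    """Return an int or float parsed from text, or None if it is not numeric."""
    s = text.strip()
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None


def coerce_numeric_column(values):
    """Convert a list of strings to numbers only if every entry parses.

    Hypothetical helper: an empty or mixed column is returned unchanged,
    so it keeps its string type.
    """
    parsed = [parse_number(v) for v in values]
    if not values or any(p is None for p in parsed):
        return values
    return parsed


numeric = coerce_numeric_column(["1", "2.5", " 3 "])
mixed = coerce_numeric_column(["1", "abc"])
```

Converting all-or-nothing keeps a column from ending up with mixed types. Note that Python's `float` also accepts strings such as `nan` and `inf`, which a real implementation might want to reject.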

In addition, if a string contains commas, we could suggest that they be automatically stripped before converting to a number. This feature could also take the locale into account, and strip periods instead for those locales that use a period as the digit-grouping separator and a comma as the decimal separator. See https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html for a list of different representations.
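The separator handling could be sketched as follows, again in plain Python rather than starlib's own code. The function names and the `group_sep` / `decimal_sep` parameters are hypothetical; the separators are passed in explicitly here instead of being looked up from a locale database, which is a simplifying assumption.

```python
def normalize_number_string(text, group_sep=",", decimal_sep="."):
    """Strip grouping separators and rewrite the decimal separator as '.'."""
    s = text.strip().replace(group_sep, "")
    if decimal_sep != ".":
        s = s.replace(decimal_sep, ".")
    return s


def parse_localized(text, group_sep=",", decimal_sep="."):
    """Parse a number written with the given separators; None if it fails."""
    try:
        return float(normalize_number_string(text, group_sep, decimal_sep))
    except ValueError:
        return None


us_style = parse_localized("1,234,567.89")
de_style = parse_localized("1.234.567,89", group_sep=".", decimal_sep=",")
```

Both calls are meant to yield the same value. One caveat with this approach: without knowing the locale, a string like `1,234` is ambiguous (one thousand two hundred thirty-four, or one point two three four), which is why suggesting the conversion to the user may be safer than applying it silently.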