ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat(memtable): handle castable inputs from date-like strings to date #9245

Open NickCrews opened 4 months ago

NickCrews commented 4 months ago

What happened?

import ibis

# errors
ibis.memtable({"d": ["1970-01-01"]}, schema={"d": "date"}).execute()
# works
# ibis.memtable({"d": ["1970-01-01"]}).cast({"d": "date"}).execute()

What version of ibis are you using?

main

What backend(s) are you using, if any?

duckdb

Relevant log output

No response

gforsyth commented 4 months ago

OK, so pyarrow won't handle casts from date-like strings to date types, but DuckDB will.

There's pyarrow.compute.strptime, but we'd have to decide how many date format strings to try before bailing out.
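To make the "how many formats before bailing out" question concrete, here is a minimal sketch of a try-each-format fallback. It uses the standard library's datetime.strptime as a stand-in for pyarrow.compute.strptime, and both the CANDIDATE_FORMATS list and the parse_date helper are hypothetical names chosen for illustration, not anything in ibis or pyarrow.

```python
from datetime import date, datetime

# Hypothetical, deliberately short list of formats, tried in order.
# Every extra entry adds ambiguity, which is the trade-off under discussion.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def parse_date(text: str, formats: tuple = CANDIDATE_FORMATS) -> date:
    """Return the first successful parse of `text`, or raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This format did not match; fall through to the next one.
            continue
    # Bail out once every candidate format has been tried.
    raise ValueError(f"could not parse {text!r} as a date using {formats}")
```

Restricting the tuple to just "%Y-%m-%d" reduces this to a single strict parse with no guessing.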

NickCrews commented 4 months ago

I'd be fine with supporting literally just the "yyyy-mm-dd" format. I am only running into this in testing, when constructing literals. If someone else is working with real data, I think it wouldn't be too much to expect them to either format it that way, or convert it to a numpy datetime, a python datetime, or something else less ambiguous.

Related to this, and I think worth considering: ideally I want to be able to create all ibis datatypes without needing to import anything, e.g. using the same logic as ibis.date(str), etc.:

import ibis

ibis.memtable(
    {
        "uuid": ["9ff13914-a718-48b3-a746-5114cff95d56"],
        # "uuid": [1234],  # would be awesome to specify as an int too
        # "date": ["1970-01-01"],  # currently errors
        # "time": ["12:34:56"],  # currently errors
        # "timestamp": ["2023-01-02T03:04:05"],  # currently errors
    },
    schema={
        "uuid": "uuid",
        # "date": "date",
        # "time": "time",
        # "timestamp": "timestamp",
    },
).cache().execute()

This would make testing much easier.
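Until something like this exists, one possible workaround in tests is to coerce the columns before building the memtable. The sketch below uses only the standard library; PARSERS, _from_str, _parse_uuid, and coerce_columns are hypothetical names for illustration and are not part of ibis. It assumes ISO-formatted strings only.

```python
import uuid
from datetime import date, datetime, time


def _from_str(parse):
    """Wrap a parser so that only str inputs are converted."""
    return lambda value: parse(value) if isinstance(value, str) else value


def _parse_uuid(value):
    """Accept a UUID given as a string or as an int; pass anything else through."""
    if isinstance(value, int):
        return uuid.UUID(int=value)
    if isinstance(value, str):
        return uuid.UUID(value)
    return value


# Hypothetical mapping from type-name strings to stdlib parsers (ISO formats only).
PARSERS = {
    "date": _from_str(date.fromisoformat),
    "time": _from_str(time.fromisoformat),
    "timestamp": _from_str(datetime.fromisoformat),
    "uuid": _parse_uuid,
}


def coerce_columns(data: dict, schema: dict) -> dict:
    """Return a copy of `data` with each column parsed according to `schema`."""
    coerced = {}
    for name, values in data.items():
        parser = PARSERS.get(schema.get(name))
        # Columns with no registered parser are copied unchanged.
        coerced[name] = [parser(v) if parser else v for v in values]
    return coerced
```

The coerced dict is meant to be passed on as ibis.memtable(coerce_columns(data, schema), schema=schema), assuming the backend accepts these Python objects for the given types.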

cpcloud commented 4 months ago

Removing the bug label. This isn't broken functionality, it's a feature request for specific new behavior.