michaeleisel opened 7 months ago
Interestingly, polars seems to handle timestamps that aren't losslessly convertible to microseconds. Here is a dataframe that I made in Parquet with a timestamp value of 1 nanosecond:
```
>>> pl.read_parquet('a.parquet')
shape: (1, 1)
┌───────────────────────────────┐
│ datetime_ns                   │
│ ---                           │
│ datetime[ns]                  │
╞═══════════════════════════════╡
│ 1970-01-01 00:00:00.000000001 │
└───────────────────────────────┘
```
So, I wonder if there's an inaccuracy in https://docs.pola.rs/user-guide/concepts/data-types/overview/ when it describes Datetime as "internally represented as microseconds".
Description
Polars has great support for lots of different formats, and it seems like it has picked some reasonable ways of turning those formats into dataframes. It would be great to document these choices, and the principles behind them. One principle I've heard from others is that polars always losslessly converts various data types into their internal formats. This is a great principle that can answer many questions, but it still leaves some areas of murkiness that would be good to document. I think it would also be good, for the sake of explicitness, to document even trivial conversions, just so the user is clear (e.g., a JSON string being turned into a polars string).

But here are some examples of questions that maybe have less obvious answers:
- How does polars handle JSON of the form `{"column1": [1, 2], "column2": ["a", "b"]}`? What about the form `[{"column1": 1, "column2": "a"}, {"column1": 2, "column2": "b"}]`?
- How does polars handle `undefined` in JSON? Does it treat those values the same way as it treats `null`?
- How does polars handle Parquet timestamps where `isAdjustedToUTC` is true vs. false?

What I would love to see, personally, is a table listing each data type of each supported format and how it gets mapped to a polars data type. This is not at all a criticism of how polars converts from input data types to polars data types; I just think it would be great to add more docs explaining it to newcomers like myself.
Link
No response