pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Julian #12942

Open dridk opened 11 months ago

dridk commented 11 months ago

Description

Could you add a Julian date converter like the one pandas has? https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.to_julian_date.html

MarcoGorelli commented 11 months ago

thanks for the request - I'm tempted to say this is too specialised and would be better suited to a polars plugin, but curious what others think

mkleinbort commented 11 months ago

Is this equivalent to a fixed integer offset? If so, yes, I'd say this is a little beyond what should be available as core functionality.
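For reference, a minimal sketch of the naive conversion: the Julian date is the number of days since the Unix epoch plus a fixed offset of 2440587.5, so the offset is constant but not an integer. The function name `to_julian_date_naive` is hypothetical, and treating naive datetimes as UTC is an assumption made here, not something the thread specifies.

```python
from datetime import datetime, timezone

# Julian date of the Unix epoch, 1970-01-01T00:00:00 UTC.
UNIX_EPOCH_JD = 2440587.5
SECONDS_PER_DAY = 86_400


def to_julian_date_naive(dt: datetime) -> float:
    """Days since the Unix epoch plus a fixed offset (hypothetical helper)."""
    # Assumption: naive datetimes are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


# The J2000 epoch (2000-01-01 12:00 UTC) is Julian date 2451545.0.
print(to_julian_date_naive(datetime(2000, 1, 1, 12, 0)))  # 2451545.0
```

This single-float form is the "trivial" version; it says nothing about precision.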

alexander-beedie commented 11 months ago

I think this could actually be quite useful for some domains where it is a typical representation (and also perhaps areas like sqlite/excel interop, which both use Julians natively). It's also easy/lightweight to implement; in fact I'm 80% done with it after half a coffee 😅☕

Update: I take it back; it's trivial to implement naïvely, but if you dig in deeper there are issues relating to floating point deviation that can bite you in the 🍑 if you're not careful; at one point I found myself looking through source code used at NASA and figured I'd sit on this one for a bit 🤣
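To make the floating point concern concrete, here is a rough sketch of how coarse a single `f64` Julian date is for present-day values, followed by one common mitigation: keeping the integer day and the day fraction as separate values. This is not necessarily the approach alluded to above, and the helper names `float_resolution_us` and `split_julian_date` are hypothetical. It assumes naive datetimes in UTC and Python 3.9+ for `math.ulp`.

```python
import math
from datetime import datetime

US_PER_DAY = 86_400 * 1_000_000
UNIX_EPOCH = datetime(1970, 1, 1)


def float_resolution_us(jd: float) -> float:
    """Spacing between adjacent f64 values near `jd`, in microseconds."""
    return math.ulp(jd) * US_PER_DAY


def split_julian_date(dt: datetime) -> tuple[int, float]:
    """Return (integer Julian day, fraction of day in [0, 1)).

    Assumption: `dt` is a naive datetime in UTC.
    """
    delta = dt - UNIX_EPOCH
    # Exact integer microseconds since the Unix epoch.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # JD = 2440587.5 + days, so shift by half a day before splitting.
    days, rem_us = divmod(total_us + US_PER_DAY // 2, US_PER_DAY)
    return 2_440_587 + days, rem_us / US_PER_DAY


# Near JD 2460317 adjacent floats are roughly 40 microseconds apart,
# so a microsecond-precision datetime cannot round-trip through one f64.
print(float_resolution_us(2460317.0547569445))
print(split_julian_date(datetime(2024, 1, 7, 13, 18, 51)))
```

Because the fraction lies in [0, 1), its own floating point spacing is far below a microsecond, which is why the two-part form preserves sub-day precision that the single float loses.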

MarcoGorelli commented 10 months ago

I've added this to polars-xdt

In [2]: from datetime import datetime
   ...: import polars as pl
   ...: import polars_xdt  # noqa: F401
   ...: df = pl.DataFrame(
   ...:     {
   ...:         "date_col": [
   ...:             datetime(2013, 1, 1, 0, 30),
   ...:             datetime(2024, 1, 7, 13, 18, 51),
   ...:         ],
   ...:     }
   ...: )
   ...: with pl.Config(float_precision=10) as cfg:
   ...:     print(df.with_columns(
   ...:         julian_date=pl.col("date_col").xdt.to_julian_date()
   ...:     ))
   ...:
shape: (2, 2)
┌─────────────────────┬────────────────────┐
│ date_col            ┆ julian_date        │
│ ---                 ┆ ---                │
│ datetime[μs]        ┆ f64                │
╞═════════════════════╪════════════════════╡
│ 2013-01-01 00:30:00 ┆ 2456293.5208333335 │
│ 2024-01-07 13:18:51 ┆ 2460317.0547569445 │
└─────────────────────┴────────────────────┘

OK to close this and for people to use that, or does it need to be in the core library?

alexander-beedie commented 10 months ago

> OK to close this and for people to use that, or does it need to be in the core library?

@MarcoGorelli: Probably worth having (I was eyeing it for polars-sql functionality just recently, and with SQLite/Excel both using Julians there might be some other benefits). Will have to take a look at what algorithm you used; the industrial-strength version has various tricks to guard against floating point rounding deviations creeping in :))

MarcoGorelli commented 10 months ago

sounds good, thanks! here it is:

https://github.com/pola-rs/polars-xdt/blob/6023324a58e9d7112f14205f86b0e7d958c8f2db/src/to_julian.rs#L9-L31