Open alphaho opened 4 years ago
Great suggestion, which is already on the roadmap.
The trickiest question is which type to use internally in a future DateCol. To keep it simple, I think only a single format should be supported (if possible). https://stackoverflow.com/questions/32437550/whats-the-difference-between-instant-and-localdatetime gives a great overview, but I'm still not sure which one would be generic enough to support all use cases.
To me, Instant seems the most versatile, and it can be mapped to a time zone as detailed in https://mkyong.com/java8/java-convert-instant-to-localdatetime/
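To illustrate the idea of one internal type that is mapped to a time zone on demand, here is a minimal Python sketch using only the standard library. It is an analogy, not krangl or java.time code: an aware UTC datetime stands in for Instant, a naive datetime stands in for LocalDateTime, and the helper name to_local is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Internal representation: one unambiguous point on the UTC timeline
# (standing in for java.time.Instant).
instant = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)


def to_local(moment: datetime, offset_hours: int) -> datetime:
    """Hypothetical helper: render a UTC moment as naive wall-clock time
    at a fixed UTC offset (standing in for LocalDateTime)."""
    zone = timezone(timedelta(hours=offset_hours))
    return moment.astimezone(zone).replace(tzinfo=None)


# The same stored instant yields different wall-clock values per offset.
utc_plus_1 = to_local(instant, 1)
utc_minus_5 = to_local(instant, -5)
```

The stored value never changes; only the view of it depends on the zone, which is the property that makes a single internal type attractive.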
Concerning your use cases: most of them make total sense to me. However, I struggle with multiplication (the last one in your list).
The difference of two DateCols could, by convention, be either a typed representation (such as Period/Duration) or simply an int/long (millisecond difference). I'm not sure which one is more intuitive.
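The trade-off between the two conventions can be sketched in Python with the standard library, where timedelta plays the role of a typed Duration. This is only an illustration of the design question, not a proposal for the krangl API.

```python
from datetime import datetime, timedelta

start = datetime(2021, 3, 1, 9, 0, 0)
end = datetime(2021, 3, 1, 10, 30, 0)

# Option 1: typed difference (comparable to a Duration).
typed_diff = end - start

# Option 2: plain integer difference in milliseconds.
millis_diff = (end - start) // timedelta(milliseconds=1)

# The typed value carries its unit, so adding it back to a date is direct.
shifted = start + typed_diff

# The integer has lost its unit and must be wrapped again by the caller.
shifted_from_millis = start + timedelta(milliseconds=millis_diff)
```

Both routes arrive at the same result, but the integer variant relies on the user remembering that the number means milliseconds.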
What do you think?
Great to know it's already on the roadmap!
I agree that Instant would be the best choice to use internally in a DataCol for representing time in general.
But we may also need a few more typed DataCols to support Duration, LocalTime, etc., so that we can provide a better out-of-the-box experience.
After using krangl for over a month, I've found that I often need to do quite a bit of type conversion myself before manipulating the data. So for the last use case (the multiplication), I think it would be much better to favor a typed representation and require less work from the user to get the job done.
But I do agree that it may not scale well, as we may need to support many types and many different operations on each type.
I guess that as more types are added, the API could use an overhaul to support a more generic column type provider that implements all basic operations. This would, for example, allow users to register their own types for improved convenience. However, I'm not yet sure how to implement such a feature.
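One way such a provider could look is a registry keyed by operator and element types. The following Python sketch is purely hypothetical: register_op, apply_op and the registry layout are invented for illustration and do not correspond to anything in krangl.

```python
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry: (operator, left element type, right element type)
# maps to the function that combines two elements.
_OPS: Dict[Tuple[str, type, type], Callable[[Any, Any], Any]] = {}


def register_op(op: str, left: type, right: type,
                fn: Callable[[Any, Any], Any]) -> None:
    """Register an element-wise operation for a pair of element types."""
    _OPS[(op, left, right)] = fn


def apply_op(op: str, left_col: List[Any], right_col: List[Any]) -> List[Any]:
    """Apply a registered operation element-wise to two equally long columns."""
    if not left_col:
        return []
    key = (op, type(left_col[0]), type(right_col[0]))
    if key not in _OPS:
        raise TypeError(
            f"no '{op}' registered for {key[1].__name__} and {key[2].__name__}"
        )
    fn = _OPS[key]
    return [fn(a, b) for a, b in zip(left_col, right_col)]


# Users (or the library) register the type combinations they need.
register_op("-", datetime, datetime, lambda a, b: a - b)
register_op("+", datetime, timedelta, lambda a, b: a + b)

starts = [datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 13, 15)]
ends = [datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 13, 45)]
durations = apply_op("-", ends, starts)
```

A registry like this keeps the set of supported types open, at the cost of moving type errors from compile time to run time, which is exactly the tension around typing discussed in this thread.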
Regarding the type conversions: I agree it's not as straightforward as what I'm used to from R, for example. On the one hand, columns should be typed enough to provide sensible completion; on the other hand, too much typing requires casting in many situations. Ideas about how to solve this more elegantly are welcome.
I've been using pandas for my data manipulation for quite some time and would very much like to switch to Kotlin with krangl, as it has much better type system support.
When I tried to port one of my pandas scripts to krangl, I found that it lacks support for ZonedDateTime, LocalDateTime and Duration as a DataCol. I need a lot of mapping and casting to work around it, which is not straightforward enough. For example, pandas has support for:
- Converting a Series of strings to a Series of datetime, e.g. pd.to_datetime(df["start_time_in_str"])
- Subtracting one Series of datetime from another to get a Series of timedelta, e.g. df["end_time"] - df["start_time"]
- Adding a Series of timedelta to a Series of datetime to get another Series of datetime, e.g. df["start_time"] + df["duration"]
- Converting a Series of long/double to a Series of different timedelta by multiplying by a timedelta constant, e.g. df["some_doubles"] * datetime.timedelta(hours = 1)
It would be much better if we can have such capabilities included in the library.
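To make the requested semantics concrete, the four operations listed above can be spelled out element by element using only Python's standard library. This is a sketch of what each operation means per row, with plain lists standing in for columns; the sample values are made up, and it is not how pandas or krangl implement them.

```python
from datetime import datetime, timedelta

# Sample "columns" (made-up values), named after the pandas examples.
start_time_in_str = ["2021-03-01 09:00:00", "2021-03-01 13:15:00"]
end_time = [datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 13, 45)]
some_doubles = [1.5, 0.25]

# 1. strings -> datetime
start_time = [datetime.fromisoformat(s) for s in start_time_in_str]

# 2. datetime - datetime -> timedelta
duration = [e - s for e, s in zip(end_time, start_time)]

# 3. datetime + timedelta -> datetime
recomputed_end = [s + d for s, d in zip(start_time, duration)]

# 4. double * timedelta constant -> timedelta
scaled = [x * timedelta(hours=1) for x in some_doubles]
```

Each step maps one typed column to another typed column, which is the behavior a set of typed date and duration columns would need to reproduce.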