scylladb / python-driver

ScyllaDB Python Driver, originally DataStax Python Driver for Apache Cassandra
https://python-driver.docs.scylladb.com
Apache License 2.0
70 stars 42 forks source link

Introduce Datetime type for ranges outside datetime.[MIN|MAX]YEAR #310

Closed sylwiaszunejko closed 3 months ago

sylwiaszunejko commented 5 months ago

In Python datetime.datetime type year has to be in range [MINYEAR, MAXYEAR]. This range is not the same as possible timestamps in scylla. Previously if timestamp was outside this range it made driver raise an Exception. It was not correct behavior. There was a work around implemented in cqlsh.

This PR introduces a Datetime type to accommodate ranges outside datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR), this type falls back to printing milliseconds_from_epoch offset.

Something similar was introduced before for datetime.date type - https://github.com/scylladb/python-driver/commit/4f3c77c970e10774243e7ad7baed2d41c06c4790.

Fixes: #255

sylwiaszunejko commented 5 months ago

I'm not sure about a few things:

sylwiaszunejko commented 5 months ago

@avelanarius @Lorak-mmk I made some extra changes to make CI work, also I removed one test test_row_error_message because this test required driver to raise an Exception when timestamp is outside the range [MIN|MAX]YEAR, but after my change this is not a correct behavior anymore.

nyh commented 5 months ago

Maybe you can add the two examples I had in https://github.com/scylladb/python-driver/issues/255#issuecomment-1963632130 and https://github.com/scylladb/python-driver/issues/255#issuecomment-1963652325 as tests? These two cases used to show broken behavior before you patch, and we'll probably want to know it was fixed by the patch.

sylwiaszunejko commented 5 months ago

@nyh I added the test you asked about

avelanarius commented 5 months ago

You should also update the docs page about this change (https://github.com/scylladb/python-driver/blob/master/docs/dates-and-times.rst) and possibly other pages if they mention the old way the driver parsed datetimes.

avelanarius commented 5 months ago

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

sylwiaszunejko commented 5 months ago

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

What kind of test do you mean? .util.Datetime is tested every time get_sample is called on timestamp and it happens multiple times in test_types.py for example in test_can_insert_primitive_datatypes

fruch commented 5 months ago

Two things,

this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

Lorak-mmk commented 5 months ago

Two things,

this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

sylwiaszunejko commented 5 months ago

Two things, this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken. We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

@Lorak-mmk What do you suggest? Because this issue was discussed in upstream years ago, but they decided to do workaround in cqlsh instead of fixing it https://issues.apache.org/jira/browse/CASSANDRA-10625. @nyh was against this solution and said we should fix it, maybe we should have a discussion about that? I see there are a lot of things to consider. @avelanarius @fruch

roydahan commented 5 months ago

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field. If that’s the case, there shouldn’t be an urgency to merge it for now.

fruch commented 5 months ago

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

If that’s the case, there shouldn’t be an urgency to merge it for now.

nyh commented 5 months ago

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

Lorak-mmk commented 5 months ago

If the problem is not encountered by users and is generally not severe then I don't think it's worth it to break compatibility to fix it - and I suspect upstream will say the same.

fruch commented 5 months ago

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

You are correct it doesn't fully adhere to the CQL spec in that

I think those really need that kind of dates, can override instruct the driver not to convert, or do the same hack done in cqlsh.

Still we can suggest it upstream, and maybe they would be convinced

sylwiaszunejko commented 5 months ago

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream? Maybe you have any idea how to fix the issue without breaking compatibility?

fruch commented 5 months ago

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream

Raise an issue, and open a PR similar to this one.

it might take some time until they would react to those, but that how we were doing those so far.

I guess you can find some cassandra mailing to discuss as well

Lorak-mmk commented 3 months ago

Can this PR be closed now that upstream PR is open?

roydahan commented 3 months ago

We shall assume that the upstream PR won't get noticed for quite a while, so the question is what to do with the issue this PR claims to fix: https://github.com/scylladb/python-driver/issues/255

It's ok to provide just an answer there or reference to relevant docs and close the issue if we believe this is what's need to be done.

sylwiaszunejko commented 3 months ago

We are not merging it to not break backwards compatibility, but if someone needs datetimes outside datetime.[MIN|MAX]YEAR feel free to cherry-pick.