Luttik opened this issue 3 years ago
Whoops, don't mind the draft.
@Luttik Thanks for the report, I was able to reproduce the behavior. Passing in `None` is interpreted as not specifying expiration at all, but the code should distinguish an explicit `None` from omitting the expiration period altogether (in which case the default is used). I'll check why this is the case.
Update: Partition expiration time is sent through the `expirationMs` field in the JSON request body, but the backend seems to treat explicit `null` values the same as if the field is omitted altogether. In both cases the new table inherits the expiration time from the dataset, unfortunately.
Now, the client library should still make the distinction between `None` and "omitted", but the backend will likely need an update as well.
@Luttik As a workaround, you could set the expiration time to an insanely high value. I tried with `sys.maxsize` and the backend accepted it, meaning that the table will expire in ~292 million years. I suppose that's good enough for practical purposes? :laughing:

```python
>>> import sys
>>> sys.maxsize / 1000 / 86400 / 365 / 1e6  # ms -> millions of years
292.471208677536
```
@shollyman How difficult would it be to change this behavior? And how likely is it that the change would break some users' apps?
Hmm... I might have typed too soon. Setting the partition expiration date actually seems to work as expected (even with `None`), but it's the table expiration date that is inherited from the dataset on creation, even if `table.expires` is explicitly set to `None` by the client.

Setting `table.expires` to `datetime.max` should be a decent workaround for now, though.
```python
import datetime

from google.cloud.bigquery import Table, TimePartitioning, TimePartitioningType

table = Table(
    "project.dataset.test_table",
    schema=[...],  # placeholder schema
)
table.expires = datetime.datetime.max  # <--- workaround!
table.time_partitioning = TimePartitioning(
    TimePartitioningType.DAY, field="date", expiration_ms=None,
)
```
There is a backend bug (181299401) tracking the ability to explicitly unset table expiration at creation time. Setting the expiration to the max datetime does look like a decent workaround.
Related public issue: https://issuetracker.google.com/180993856
@tswast The related internal issue b/181299401 was closed as a "Won't Fix" with an identified workaround. Will that workaround work here or should we reopen the internal issue?
@meredithslota I was rather disappointed with the workaround, which was to call `update_table` after calling `create_table`. I think it's worth opening a new internal issue specifically for this feature.

Likely this will require coordination between backend and client teams, as it sounds like the backend does not have the ability to detect an explicit "null" on any API except "patch" (i.e. `update_table`).
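A minimal sketch of that create-then-update workaround, with placeholder names (only the PATCH path, i.e. `update_table`, can express the explicit null today):

```python
from google.cloud import bigquery

client = bigquery.Client()
table = client.create_table(
    bigquery.Table(
        "project.dataset.test_table",
        schema=[bigquery.SchemaField("date", "DATE")],
    )
)

# The new table may have inherited the dataset's default expiration;
# clear it explicitly with a follow-up PATCH.
table.expires = None
table = client.update_table(table, ["expires"])
```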
I re-opened the same internal issue but converted it to a feature request.
Created b/230739872 to track the backend work necessary to enable this new feature.
#### Environment details

- `google-cloud-bigquery` version: 2.13.1

#### Steps to reproduce / Code example
When running the following code, a table is created where the value of the partition expiration is silently set to the default expiration defined on the dataset.
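A minimal reconstruction of the reproduction, assuming the dataset already has a default expiration configured; the project/dataset/table names and the schema are illustrative placeholders:

```python
from google.cloud import bigquery
from google.cloud.bigquery import Table, TimePartitioning, TimePartitioningType

client = bigquery.Client()

table = Table(
    "project.dataset.test_table",
    schema=[bigquery.SchemaField("date", "DATE")],
)
table.time_partitioning = TimePartitioning(
    TimePartitioningType.DAY,
    field="date",
    expiration_ms=None,  # explicit "no expiration"
)
table = client.create_table(table)

# The created table silently inherits the dataset's default expiration anyway.
```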
To start, it is good to note that the way to fully remove an expiration date (not just reset it to the default) from a partition via standard SQL is as follows:
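(a sketch with placeholder names; setting the option to `NULL` removes the expiration)

```sql
ALTER TABLE `project.dataset.test_table`
SET OPTIONS (partition_expiration_days = NULL);
```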
So even though you explicitly seem to set `expiration_ms` to "no expiration date" in the code, the table will still be silently removed after time x (depending on the dataset). The field `expiration_ms` only accepts `None` and positive integers; therefore, when the dataset does have a default expiration, there is no way to create a table with a `TimePartitioning` but without an expiration date at all. Values like `0`, `null`, and `NULL` will all throw an error.

I would advise changing the behavior as follows:
Introduce a sentinel constant named `DEFAULT_PARTITION_EXPIRATION_FROM_DATASET` and use it as the default value of the parameter, so that this behavior is clear from the function definition. Don't make the value of that sentinel `None`, so that `None` remains available to explicitly turn the expiration functionality off (see the sketch below).
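A hypothetical sketch of that sentinel pattern; the constant name comes from the suggestion above, while the helper function and request shape are illustrative, not the library's actual API:

```python
class _DatasetDefault:
    """Sentinel meaning: inherit the expiration from the dataset."""

    def __repr__(self):
        return "DEFAULT_PARTITION_EXPIRATION_FROM_DATASET"


DEFAULT_PARTITION_EXPIRATION_FROM_DATASET = _DatasetDefault()


def partitioning_to_api_repr(expiration_ms=DEFAULT_PARTITION_EXPIRATION_FROM_DATASET):
    """Build a timePartitioning request fragment (hypothetical helper)."""
    body = {"type": "DAY"}
    if expiration_ms is DEFAULT_PARTITION_EXPIRATION_FROM_DATASET:
        return body  # field omitted -> backend applies the dataset default
    body["expirationMs"] = expiration_ms  # explicit value; None = never expire
    return body
```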