Closed saschahofmann closed 2 years ago
I now cache the schema only and construct the table expr from cached schemas. I'd still be curious whether there is a way to achieve the above?
Hey @saschahofmann -- I need to set up a bigquery account so I can test this myself, but does deleting the source attribute get the same thing done?
del tbl.op().source
pickle.dumps(tbl)
If not, I think this can be worked around by defining a custom Pickler subclass and using its reducer_override method to skip over the source attribute.
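A minimal sketch of the reducer_override idea, using only the standard library. FakeConnection, SourcelessPickler, dumps_without_source and the SimpleNamespace table are hypothetical stand-ins, not ibis classes, and whether a real ibis table round-trips this way is an assumption that has not been tested here:

```python
import io
import pickle
from types import SimpleNamespace


class FakeConnection:
    """Hypothetical stand-in for a backend connection that refuses to pickle."""

    def __reduce__(self):
        raise TypeError("connection objects cannot be pickled")


class SourcelessPickler(pickle.Pickler):
    """Pickler that writes None in place of any FakeConnection it meets."""

    def reducer_override(self, obj):
        if isinstance(obj, FakeConnection):
            # type(None)() evaluates to None, so the connection comes back
            # as None when the payload is unpickled.
            return type(None), ()
        # Use the default pickling behaviour for everything else.
        return NotImplemented


def dumps_without_source(obj):
    """Serialize obj with SourcelessPickler and return the bytes."""
    buffer = io.BytesIO()
    SourcelessPickler(buffer).dump(obj)
    return buffer.getvalue()


# Hypothetical table-like object; a real ibis op has more structure.
tbl = SimpleNamespace(
    name="my_table", schema={"a": "int64"}, source=FakeConnection()
)
restored = pickle.loads(dumps_without_source(tbl))
```

After loading, restored.source is None, so a live connection would still have to be attached again by whatever means the library allows.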
Hm seems like pickle.dumps is actually working. The error with pickling happens when I try to cache the table in redis. I am trying to find out what call exactly is causing it.
I am also struggling to reproduce the original error locally, but the error was definitely that the source wasn't serializable.
How would I recover the BigQueryTable with a connection, since I can't set the source on it?
Right now I am creating it like this, with the schema coming from cache:
tbl = TableExpr(
    BigQueryTable(
        name=f"{settings.GCP_PROJECT}.{table.bq_dataset}.{table.bq_table}",
        schema=schema,
        source=conn,
    )
)
I would also assume that it happens for any backend, not only for BigQuery.
There are a couple options to consider:
1. object.__setattr__(op, "source", new_source). This is NOT recommended, but extremely expedient. source is not included in the __hash__ computation for reasons similar to those for why source cannot be pickled, FYI.
2. You can cache UnboundTables and always run execute by calling it as a method on the Backend instance as opposed to on the Expr instance. This may or may not be viable for you. If you can do this, I would recommend it. Example:
con = ibis.bigquery.connect(...)
t = ibis.table(dict(a="int64"))
con.execute(t)
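For the first option, here is a rough illustration of what object.__setattr__ does on an immutable object. FrozenTable is a hypothetical frozen dataclass, not the ibis implementation; it only mimics two properties mentioned above: assignment is blocked, and source is left out of the hash.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenTable:
    """Hypothetical immutable node; not how ibis defines its ops."""

    name: str
    # Excluded from equality and hashing, mirroring the note about source.
    source: object = field(default=None, compare=False)


op = FrozenTable(name="my_table")
hash_before = hash(op)

try:
    op.source = "connection"  # normal assignment is rejected
except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
    print(f"blocked: {exc}")

# Bypass the frozen check by calling the base-class setter directly.
object.__setattr__(op, "source", "connection")
```

Because source takes no part in the hash in this sketch, the object stays valid as a dictionary key after the forced assignment. That is also why the trick is merely discouraged rather than outright broken.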
Thx @cpcloud ! 2. is not really an option and I think I will stick to caching the schema for now since this is the main reason we cache the table anyway. I can then recreate the table as mentioned above!
Just out of curiosity: why is the table object immutable? Maybe my use case is too niche, but it'd be nice to have an easy way to create a new object from an existing one with different kwargs.
Just out of curiosity: why is the table object immutable?
The main reason is to allow operations to be hashable. We use dictionaries whose keys are ops.Node instances in many places. One important way in which we use them is to avoid unnecessary (re)computation.
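As a generic illustration of that point (not ibis code), the sketch below uses a hypothetical frozen Node class as a dictionary key so that a shared sub-expression is evaluated only once. The names Node, evaluate, cache and calls are all invented for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable expression node."""

    name: str
    args: tuple = ()


calls = []  # records which nodes were actually computed
cache = {}  # Node -> result; only works because Node is hashable


def evaluate(node):
    """Render a node as a string, reusing cached results for repeated nodes."""
    if node in cache:
        return cache[node]
    calls.append(node.name)
    rendered_args = ", ".join(evaluate(arg) for arg in node.args)
    result = f"{node.name}({rendered_args})"
    cache[node] = result
    return result


leaf = Node("t")
expr = Node("join", (leaf, leaf))
rendered = evaluate(expr)
```

Here leaf appears twice inside expr but is computed once; the second lookup hits the cache. If nodes were mutable, their hash could change after insertion and lookups like this would become unreliable.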
yeah ok gotcha. Closing
Thanks @saschahofmann, really appreciate your feedback!
We are caching table expressions in redis, but table.op().source (the table connection) isn't pickleable. Before 3.0.2 we were able to set that to None, and vice versa when fetching from cache.
Now the table (inheriting from Annotable) is immutable and I can't set that attribute:
TypeError: Attribute 'source' cannot be assigned to immutable instance of type <class 'ibis_bigquery.client.BigQueryTable'>
Can I maybe duplicate the instance and only change that one prop?