Open minotru opened 3 weeks ago
This code is currently in the process of a rework by @dicentra13. get_tensorstore_spec
in serialization.py
should be either removed or should reuse build_kvstore_tspec
. I think it is only used in a couple places in internal code - just due slower progress in refactoring rather than any inherent need.
Again, async_serialize
is used in one place internally that I'm working on eliminating, although probably will leave that function in place as a wrapper.
Agreed that ts_context should use one source of truth, that could use fixing.
Got it @cpgaffney1 , thank for the reply! Waiting for the refactoring by @dicentra13 :)
Let's close this issue once refactoring is completed and merged
Hi Orbax team,
I was looking at Orbax code at the latest version==0.7.0 and found that pieces of code with quite heavy logic around tensorstore_spec creation seem to contain duplicates.
I'd like to know if this code duplication intended by design or I am welcome to submit a PR.
Here
get_tensorstore_spec
is a part of public API, and I can't find any usage ofget_tensorstore_spec
byorbax-checkpoint
itself https://github.com/google/orbax/blob/8b4e90d573082a5c7caa5f99c51db376f62a6995/checkpoint/orbax/checkpoint/serialization.py#L97C5-L124And here is a very similar piece of code in
build_kvstore_tspec
in_internal
package, andbuild_kvstore_tspec
is used heavily bytype_handlers.py
https://github.com/google/orbax/blob/8b4e90d573082a5c7caa5f99c51db376f62a6995/checkpoint/orbax/checkpoint/_src/serialization/tensorstore_utils.py#L62-L135Would you consider
get_tensorstore_spec
to reusebuild_kvstore_tspec
under the hood?Also, there seems to be a bit of obscurity with default ts_context value.
orbax/checkpoint/serialization.py
, there is TS_CONTEXT in publicserialization.py
that is used as a default value ofcontext
in async_serialize (actually,orbax
does not useasync_serialize
anywhere and recommends usingasync_serialize_shards
) , async_serialize_shards, async_deserialize and by StringHandler.type_handlers.py
, there is get_ts_context() (it references_DEFAULT_OCDBT_TS_CONTEXT
), andget_ts_context
is used by all other handler implementations.So,
TS_CONTEXT
fromserialization.py
seems to be never used by common checkpoint IO code.Should we somehow leave only 1 source of truth for default ts_context values?