Summary: This PR hopefully 🤞 solves >90% of our remaining Ref/Naming issues - meaning users should be able to freely create entities/projects (as long as Gorilla accepts them) and name their objects/ops anything they want (baring a % symbol). Under the hood, I close a few remaining constraints / parsing issues on the server side and mostly modify the UI and Python clients to correctly read/write quoted refs.
High Level Improvements:
Python Client will always produce valid refs (ie. it correctly quotes when needed)
Benefit to user: no more crashed python scripts when bad chars are in the name
Note: the one case of % in the name is still illegal. We sanitize on the python client for this specific case.
Python Client is able to consume quoted refs (ie. it correctly unquotes when needed)
Benefit to user: able to correctly walk ref data edges with bad chars in the name
UI Client will always produce valid refs (ie. it correctly encodes when needed)
Benefit to user: able to link safely when dealing with bad characters
UI Client is able to consume encoded refs (ie. it correctly decodes when needed)
Benefit to user: able to correctly display links with bad characters
Correctly de-quoting encoded refs on project lookup
This PR finishes up the Ref work to ensure we support any ref the user throws at us.
Context:
In a recent PR: https://github.com/wandb/weave/pull/2077, I implemented a more robust parser and checker to ensure Refs and Names conform to expectations. This stopped the "bleeding" where we were storing invalid Refs.
However, our two clients Python and UI had not been updated to reflect the new rules. In fact, the Python client was still creating bad refs, and the UI was overly restrictive with what it thought a ref could be. So double edge sword. As a result, users still hit crashes (either python write time or ui read time).
Moreover, our Ref parsing logic is a mess (especially in the UI) as it is spread across many locations & riddled with legacy junk.
A little terminology note: in PY, URL escaping is called quoting and in TS it is encoding. When used in this PR, they follow the language-specific convention, but logically mean the same concept.
The important part to understand is the ref format:
Calls:
weave:///<entity>/<project>/call/<callid>[extra]
Ops:
weave:///<entity>/<project>/op/<name>:<digest>[extra]
Objects:
weave:///<entity>/<project>/object/<name>:<digest>[extra]
Tables:
weave:///<entity>/<project>/table/<digest>[extra]
Where:
[extra] is an optional set of slash-seperated tuples of edges. for example:
/key/key_name_1/index/0
They always follow the pattern of [edgeName/edgeValue], where
`edgeName` is a shortlist of 4 specific edges names.
Furthermore, entity, project, name and the edgeValues are user-controlled.
The main constraints we need to parse these refs are:
entity cannot contain slashes /
project cannot contain slashes /
name cannot contain slashes / or colons :
edgeValue cannot contain slashes /
Therefore, we need to basically URL encode/decode each of these parts in all the places that they are used
This PR basically updates the parsing and construction logic to properly encode/decode entity, project, nameandedgeValue`.
This adds server-side decoding of entity/project per this more rigorous spec
Summary: This PR hopefully 🤞 solves >90% of our remaining Ref/Naming issues - meaning users should be able to freely create entities/projects (as long as Gorilla accepts them) and name their objects/ops anything they want (baring a
%
symbol). Under the hood, I close a few remaining constraints / parsing issues on the server side and mostly modify the UI and Python clients to correctly read/write quoted refs.High Level Improvements:
%
in the name is still illegal. We sanitize on the python client for this specific case.Smaller Details:
Companion PR: https://github.com/wandb/core/pull/23333
This PR finishes up the Ref work to ensure we support any ref the user throws at us.
Context: In a recent PR: https://github.com/wandb/weave/pull/2077, I implemented a more robust parser and checker to ensure Refs and Names conform to expectations. This stopped the "bleeding" where we were storing invalid Refs.
However, our two clients Python and UI had not been updated to reflect the new rules. In fact, the Python client was still creating bad refs, and the UI was overly restrictive with what it thought a ref could be. So double edge sword. As a result, users still hit crashes (either python write time or ui read time).
Moreover, our Ref parsing logic is a mess (especially in the UI) as it is spread across many locations & riddled with legacy junk.
A little terminology note: in PY, URL escaping is called
quoting
and in TS it isencoding
. When used in this PR, they follow the language-specific convention, but logically mean the same concept.Understanding Refs: See https://www.notion.so/wandbai/WIP-Robust-Refs-550bf06f4a244a1ab4fdd26133952e2c for more details.
The important part to understand is the ref format:
Furthermore,
entity
,project
,name
and theedgeValue
s are user-controlled.The main constraints we need to parse these refs are:
entity
cannot contain slashes/
project
cannot contain slashes/
name
cannot contain slashes/
or colons:
edgeValue
cannot contain slashes/
Therefore, we need to basically URL encode/decode each of these parts in all the places that they are used
entity
,project
, nameand
edgeValue`.Deployment The ideal order would be: