Closed m0n0chr0m3 closed 4 years ago
Great catch! The behavior you are seeing is definitely unexpected.
I see that the tests in your branch start off from EventOriginUri::parse
method, and that is actually not in use anywhere else right now (unless some non-tremor project picked it up lately π ). From tremor onramps, EventOriginUri
s are initialized directly (example) and in that context, the path method will reflect whatever the onramp sets.
We should fix the parse functionality though. I like the option 3 you proposed -- compared to 2, it gives users the ability to distinguish between "scheme://host/"
and "scheme://host"
for path
, in case they need it. 4 would also achieve the same but 3 is more consistent with how path
is being set from onramps and vec![]
sufficiently handles the None
path case without needing to make it an optional type (4 would also open up Some(vec![])
as a potential value for path
which is one too many option for what we need here).
Assuming we go with option 3, could you also change the Display
implementation of EventOriginUri
to be consistent with it? Right now, it outputs "scheme://host/"
for both path cases (vec![]
and vec![""]
): https://github.com/wayfair-tremor/tremor-runtime/blob/v0.7.3/tremor-script/src/ctx.rs#L86
Thanks for working on adding the tests -- much appreciated!
Thanks for the feedback.
Option 4 is indeed a bit weird, but it's what the url
crate does.
I'll prepare a version based on option 3, with the Display
implementation fixed and verified by the tests.
Follow-up question: should EventOriginUri::default()
result in a URI "tremor-script://localhost/"
or a "tremor-script://localhost"
(i.e., with an empty path)?
It might be worth considering the RFC for URI's when looking at this.
It states:
A path is always defined for a URI, though the defined path may be empty (zero length).
Which rules out the use of Option the way I read it.
Looking further at the grammar:
path = path-abempty ; begins with "/" or is empty
/ path-absolute ; begins with "/" but not "//"
/ path-noscheme ; begins with a non-colon segment
/ path-rootless ; begins with a segment
/ path-empty ; zero characters
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
segment = *pchar
segment-nz = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" ) ; non-zero-length segment without any colon ":"
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
The important bits are
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
segment = *pchar
In my opinion as it answers the question if an empty path is allowed, the differentiation between segment
and segment-nz
indicates to me that yes they en empty path element as the last element of a path si allowed and distinctly different from no /
at the end so
"protocol://the.host.name": None
"protocol://the.host.name/": Some([""])
Are different URI's.
Taking those into account I would say that 3 is the RFC-conform implementation.
Follow-up question: should
EventOriginUri::default()
result in a URI"tremor-script://localhost/"
or a"tremor-script://localhost"
(i.e., with an empty path)?
Since there's no info to convey by setting path here, I would prefer to leave it empty.
@Licenser good call on the rfc!
Indeed, url
's behavior is seemingly at odds with the (proposed) specification and RFC. I didn't feel like second-guessing a dependency without input from the core devs.
Problem The
path
method onEventOriginUri
s behaves different from what I would have expected, and yields different results from what theuri
crate'spath_segments
yields: invokingpath()
on paths of the form"scheme://host.name/"
yields a vector of two empty strings, where I'd expect only one.Steps Consider the behavior at https://github.com/m0n0chr0m3/tremor-runtime/commit/3798d4383a7f8488bbf16042e927ce74ae703dc9#diff-bdd33b2308718bd57f53842f6df9fce6R166. The assertions as written there succeed in the current implementation of
tremor-script
. I'd expect the version in the comment below it. Similarly for the rest of that test case.For reference, the behavior of the
url
crate'spath_segments
method on those inputs is as follows:(as demonstrated on the playground)
As far as I see, none of the unit tests currently check any of the origin namespace Tremor functions. The Tremor test harness for unit tests runs all scripts/pipelines/etc. in a context where the origin is not set, so all functions would return
null
anyway. I hence cannot gauge the impact of 'fixing' thepath
method.Possible Solution(s)
"scheme://host/"
should bevec![""]
, and the path segments for"scheme://host"
should bevec![""]
too, tell me and I'll open a PR."scheme://host/"
should bevec![""]
, and the path segments for"scheme://host"
should bevec![]
, tell me and I'll open a PR."scheme://host/"
should beSome(vec![""])
, and the path segments for"scheme://host"
should beNone
, tell me and I'll open a PR.Notes
Output of
tremor-server --version
: