Closed alejandro-colomar closed 8 months ago
Regarding why this doesn't trigger a vulnerability, I don't have a definitive conclusion, but here's some research:
If it's a constant string,it seems the previous nxt_tstr_t where it had been stored, contained a '\0'
, and we're reusing that memory.
$ grepc nxt_tstr_str .
./src/nxt_tstr.h:void nxt_tstr_str(nxt_tstr_t *tstr, nxt_str_t *str);
./src/nxt_tstr.c:void
nxt_tstr_str(nxt_tstr_t *tstr, nxt_str_t *str)
{
*str = tstr->str;
if (tstr->flags & NXT_TSTR_STRZ) {
str->length--;
}
}
I still don't know about when it's variable; I'll need to investigate that code. Probably it has something similar. If there's a guarantee that all paths have a trailing '\0'
thanks to nxt_tstr_t, then maybe we could keep it, via some new nxt_tstr_strz()
function that wouldn't drop the '\0'
. That would reduce the need to allocate some memory later.
$ grep -rn nxt_tstr_compile -A1 -I
src/nxt_tstr.h:41:nxt_tstr_t *nxt_tstr_compile(nxt_tstr_state_t *state, nxt_str_t *str,
src/nxt_tstr.h-42- nxt_tstr_flags_t flags);
--
src/nxt_http_route.c:700: action->u.tstr = nxt_tstr_compile(rtcf->tstr_state, &pass, 0);
src/nxt_http_route.c-701- if (nxt_slow_path(action->u.tstr == NULL)) {
--
src/nxt_http_route.c:1528: action->u.tstr = nxt_tstr_compile(rtcf->tstr_state, pass, 0);
src/nxt_http_route.c-1529- if (nxt_slow_path(action->u.tstr == NULL)) {
--
src/nxt_http_set_headers.c:60: hv->value = nxt_tstr_compile(rtcf->tstr_state, &str, 0);
src/nxt_http_set_headers.c-61- if (nxt_slow_path(hv->value == NULL)) {
--
src/nxt_http_return.c:60: conf->location = nxt_tstr_compile(rtcf->tstr_state, &str, 0);
src/nxt_http_return.c-61- if (nxt_slow_path(conf->location == NULL)) {
--
src/nxt_http_rewrite.c:19: action->rewrite = nxt_tstr_compile(rtcf->tstr_state, &str, 0);
src/nxt_http_rewrite.c-20- if (nxt_slow_path(action->rewrite == NULL)) {
--
src/nxt_http_static.c:108: tstr = nxt_tstr_compile(rtcf->tstr_state, &str, NXT_TSTR_STRZ);
src/nxt_http_static.c-109- if (nxt_slow_path(tstr == NULL)) {
--
src/nxt_http_static.c:134: conf->chroot = nxt_tstr_compile(rtcf->tstr_state, &acf->chroot,
src/nxt_http_static.c-135- NXT_TSTR_STRZ);
--
src/nxt_tstr.c:84:nxt_tstr_compile(nxt_tstr_state_t *state, nxt_str_t *str,
src/nxt_tstr.c-85- nxt_tstr_flags_t flags)
--
src/nxt_router_access_log.c:128: format = nxt_tstr_compile(rtcf->tstr_state, &str, NXT_TSTR_LOGGING);
src/nxt_router_access_log.c-129- if (nxt_slow_path(format == NULL)) {
It seems that there's guarantee that the nxt_tstr_t will have a '\0'
(NXT_TSTR_STRZ). We can probably reuse that byte without allocating again.
src/nxt_http_static.c:108: tstr = nxt_tstr_compile(rtcf->tstr_state, &str, NXT_TSTR_STRZ);
src/nxt_http_static.c-109- if (nxt_slow_path(tstr == NULL)) {
I still find some code confusing, so this is not definitive, but... it works,,, for now.
Hmm, I've been rumiating this, and maybe it's just a misdocumented feature, not a bug.
So, some nxt_str_t have a hidden terminating NUL, but it's not within its str->length, but right after it, and it's controlled by that flag in the nxt_tstr_t that it comes from. That's useful because you can pass those to code that expects no terminating NUL.
Maybe you could document that, by using a nxt_strz_t typedef, which could even point to the same type, but at least the programmer would have a clue of what's going on. You could even make it a distinct type, so that the compiler can warn about misuses. That gets a bit tricky, because there are functions that should handle both. You could maybe do that with a union:
union nxt_strz_u {
struct (
u_char *start;
size_t length;
} z; // Or optionally make it an anonymous struct.
nxt_str_t s;
};
It might even have some performance and simplicity wins, because you wouldn't need a run-time flag to say if you want a NUL or not.
And it clearly distinguishes when you're accessing a normal nxt_str_t, and when you're using an actual (terminated) string.
In this case, @hongzhidao , I'll say sorry to you, because you were right that it's not necessarily a bug. I was just a bit irascible with you due to the last discussions you and I had a few months ago.
v2 changes:
$ git range-diff master gh/NUL NUL
1: 7e2f93e6 < -: -------- Static: Store the full path name in a nxt_str_t
2: a5a30451 < -: -------- Static: Don't include the terminating null byte in exten
3: 5c17aea0 < -: -------- Static: Unconditionally extract the extension
4: 2fdde477 < -: -------- Static: Terminate string with a null byte
-: -------- > 1: 2e963592 Tstr: Unconditionally append a hidden NUL
Also, all callers can rely on these string being terminated, without
I guess that should either read
these strings
or
this string
Completely different approach: guarantee a NUL in all nxt_tstr_t, so shr, which comes from it, has a NUL for sure.
Thanks for the patch, but the current way is good and we don't plan to touch it in another way if there is no issue with it.
v2b changes:
$ git range-diff master 2e963592 NUL
1: 2e963592 ! 1: fa987bf9 Tstr: Unconditionally append a hidden NUL
@@ Commit message
Allocating one more byte shouldn't be a significant problem of memory
use. On the other hand, this reduces a few branches, simplifying code.
- Also, all callers can rely on these string being terminated, without
+ Also, all callers can rely on these strings being terminated, without
having to rely on a flag that may be set very far away.
+ Cc: Andrew Clayton <a.clayton@nginx.com>
+ Cc: Zhidao Hong <z.hong@f5.com>
+ Cc: Andrei Zeliankou <zelenkov@nginx.com>
+ Cc: "Valentin V. Bartenev" <vbartenev@gmail.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
## src/nxt_http_static.c ##
Thanks for the patch, but the current way is good and we don't plan to touch it in another way if there is no issue with it.
You're welcome!
I find it very confusing as it is now, but it's your code now, so as you wish.
v3 changes:
$ git range-diff master gh/NUL NUL
-: -------- > 1: da75ae94 HTTP: Add const to some functions
1: fa987bf9 = 2: 6cf66c2a Tstr: Unconditionally append a hidden NUL
-: -------- > 3: f580062a Tstr: Produce a nxt_strz_t from nxt_tstr_t
Taken in isolation, I find it hard to be confident that I understand the full intended scope of this patch series.
Would it be correct to say that:
nxt_strz_t
, which denotes null-terminationTaken in isolation, I find it hard to be confident that I understand the full intended scope of this patch series.
Would it be correct to say that:
1. This does not fix any known bugs 2. The primary benefit of these patches is greater code clarity by reducing branching and introducing a more nuanced type, `nxt_strz_t`, which denotes null-termination 3. In exchange for reduced branching, we take on a small bit of allocation overhead
Very correct. :)
And 4. The compiler can check that nxt_str_t and nxt_strz_t aren't mixed by accident.
v3b changes:
$ git range-diff master f580062a NUL
1: da75ae94 = 1: da75ae94 HTTP: Add const to some functions
2: 6cf66c2a = 2: 6cf66c2a Tstr: Unconditionally append a hidden NUL
3: f580062a ! 3: 9866a7a8 Tstr: Produce a nxt_strz_t from nxt_tstr_t
@@ Metadata
## Commit message ##
Tstr: Produce a nxt_strz_t from nxt_tstr_t
- In the previous commit, we made sure all nxt_tstr_t have a hidden in the
- produced nxt_str_t.
+ In the previous commit, we made sure all nxt_tstr_t have a hidden NUL at
+ the end of the produced nxt_str_t (hidden because the .length field
+ doesn't include it).
We can make that a different type, nxt_strz_t, which is distinct from
nxt_str_t. This will allow the compiler enforce correct use of these
- strings, via type safety warnings and errors.
+ strings, via type-safety warnings and errors.
nxt_strz_t can be read as a nxt_str_t, for compatibility in functions
that want a nxt_str_t. This access is through a const member of a union
@@ src/nxt_http_return.c: nxt_http_return_init(nxt_router_conf_t *rtcf, nxt_http_ac
}
return NXT_OK;
+@@ src/nxt_http_return.c: nxt_http_return(nxt_task_t *task, nxt_http_request_t *r,
+ conf = action->u.conf;
+
+ #if (NXT_DEBUG)
+- nxt_str_t loc;
++ nxt_strz_t loc;
+
+ if (conf->location == NULL) {
+ nxt_str_set(&loc, "");
+@@ src/nxt_http_return.c: nxt_http_return(nxt_task_t *task, nxt_http_request_t *r,
+ nxt_tstr_str(conf->location, &loc);
+ }
+
+- nxt_debug(task, "http return: %d (loc: \"%V\")", conf->status, &loc);
++ nxt_debug(task, "http return: %d (loc: \"%V\")", conf->status, &loc.s);
+ #endif
+
+ if (conf->status >= NXT_HTTP_BAD_REQUEST
@@ src/nxt_http_return.c: nxt_http_return_send_ready(nxt_task_t *task, void *obj, void *data)
if (ctx != NULL) {
if (ctx->location.length > 0) {
@@ src/nxt_http_static.c: nxt_http_static_init(nxt_task_t *task, nxt_router_temp_co
conf->chroot = nxt_tstr_compile(rtcf->tstr_state, &acf->chroot, 0);
if (nxt_slow_path(conf->chroot == NULL)) {
+@@ src/nxt_http_static.c: nxt_http_static_iterate(nxt_task_t *task, nxt_http_request_t *r,
+ share = &conf->shares[ctx->share_idx];
+
+ #if (NXT_DEBUG)
+- nxt_str_t shr;
+- nxt_str_t idx;
++ nxt_str_t idx;
++ nxt_strz_t shr;
+
+ nxt_tstr_str(share->tstr, &shr);
+ idx = conf->index;
+
+ #if (NXT_HAVE_OPENAT2)
+- nxt_str_t chr;
++ nxt_strz_t chr;
+
+ if (conf->chroot != NULL) {
+ nxt_tstr_str(conf->chroot, &chr);
+@@ src/nxt_http_static.c: nxt_http_static_iterate(nxt_task_t *task, nxt_http_request_t *r,
+ }
+
+ nxt_debug(task, "http static: \"%V\", index: \"%V\" (chroot: \"%V\")",
+- &shr, &idx, &chr);
++ &shr.s, &idx, &chr.s);
+ #else
+- nxt_debug(task, "http static: \"%V\", index: \"%V\"", &shr, &idx);
++ nxt_debug(task, "http static: \"%V\", index: \"%V\"", &shr.s, &idx);
+ #endif
+ #endif /* NXT_DEBUG */
+
@@ src/nxt_http_static.c: nxt_http_static_send_ready(nxt_task_t *task, void *obj, void *data)
struct tm tm;
nxt_buf_t *fb;
@@ src/nxt_tstr.c: nxt_tstr_query_init(nxt_tstr_query_t **query_p, nxt_tstr_state_t
{
nxt_int_t ret;
+@@ src/nxt_tstr.c: nxt_tstr_query(nxt_task_t *task, nxt_tstr_query_t *query, nxt_tstr_t *tstr,
+ val->length--; // Hide the terminating NUL.
+
+ #if (NXT_DEBUG)
+- nxt_str_t str;
++ nxt_strz_t str;
+
+ nxt_tstr_str(tstr, &str);
+
+- nxt_debug(task, "tstr query: \"%V\", result: \"%V\"", &str, val);
++ nxt_debug(task, "tstr query: \"%V\", result: \"%V\"", &str.s, val);
+ #endif
+ }
+
## src/nxt_tstr.h ##
@@ src/nxt_tstr.h: nxt_int_t nxt_tstr_state_done(nxt_tstr_state_t *state, u_char *error);
v3c:
$ git range-diff master gh/NUL NUL
1: da75ae94 ! 1: 99921de5 HTTP: Add const to some functions
@@ Commit message
Cc: Zhidao Hong <z.hong@f5.com>
Cc: Andrei Zeliankou <zelenkov@nginx.com>
Cc: "Valentin V. Bartenev" <vbartenev@gmail.com>
+ Cc: Dan Callahan <d.callahan@f5.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
## src/nxt_http.h ##
2: 6cf66c2a ! 2: 2faae7c9 Tstr: Unconditionally append a hidden NUL
@@ Commit message
Cc: Zhidao Hong <z.hong@f5.com>
Cc: Andrei Zeliankou <zelenkov@nginx.com>
Cc: "Valentin V. Bartenev" <vbartenev@gmail.com>
+ Cc: Dan Callahan <d.callahan@f5.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
## src/nxt_http_static.c ##
3: 9866a7a8 ! 3: d86530e1 Tstr: Produce a nxt_strz_t from nxt_tstr_t
@@ Commit message
Cc: Zhidao Hong <z.hong@f5.com>
Cc: Andrei Zeliankou <zelenkov@nginx.com>
Cc: "Valentin V. Bartenev" <vbartenev@gmail.com>
+ Cc: Dan Callahan <d.callahan@f5.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
## src/nxt_http.h ##
On Wed, 24 Jan 2024 12:17:11 -0800 Alejandro Colomar @.***> wrote:
3. In exchange for reduced branching, we take on a small bit of allocation overhead
Very correct. :)
Although glibc malloc(3) at least will allocate memory in chunks so you may well get back more memory than you actually asked for, e.g
8 [24] 18 [24] 25 [40] 36 [40] 42 [56]
First column is number of bytes I asked malloc(3) for, second column is number of usable bytes I actually got back as told by malloc_usable_size(3).
On Wed, Jan 24, 2024 at 12:50:18PM -0800, Andrew Clayton wrote:
On Wed, 24 Jan 2024 12:17:11 -0800 Alejandro Colomar @.***> wrote:
3. In exchange for reduced branching, we take on a small bit of allocation overhead
Very correct. :)
Although glibc malloc(3) at least will allocate memory in chunks so you may well get back more memory than you actually asked for, e.g
8 [24] 18 [24] 25 [40] 36 [40] 42 [56]
First column is number of bytes I asked malloc(3) for, second column is number of usable bytes I actually got back as told by malloc_usable_size(3).
I had that in mind, but one could argue that the +1 may trigger the next chunk. But yeah, overall, this is negligible by any means.
-- https://www.alejandro-colomar.es/ Looking for a remote C programming job at the moment.
In
Tstr: Produce a nxt_strz_t from nxt_tstr_t
nxt_str_t. This will allow the compiler enforce correct use of these
should be
nxt_str_t. This will allow the compiler to enforce correct use of
In
Tstr: Produce a nxt_strz_t from nxt_tstr_t
nxt_str_t. This will allow the compiler enforce correct use of these
should be
nxt_str_t. This will allow the compiler to enforce correct use of
Thanks!
v3d changes:
$ git range-diff master gh/NUL NUL
1: 99921de5 = 1: 99921de5 HTTP: Add const to some functions
2: 2faae7c9 = 2: 2faae7c9 Tstr: Unconditionally append a hidden NUL
3: d86530e1 ! 3: 669bfb88 Tstr: Produce a nxt_strz_t from nxt_tstr_t
@@ Commit message
doesn't include it).
We can make that a different type, nxt_strz_t, which is distinct from
- nxt_str_t. This will allow the compiler enforce correct use of these
- strings, via type-safety warnings and errors.
+ nxt_str_t. This will allow the compiler to enforce correct use of these
+ strings, via type-safety diagnostics.
nxt_strz_t can be read as a nxt_str_t, for compatibility in functions
that want a nxt_str_t. This access is through a const member of a union
EDIT 1: Ignore for now
EDIT 2: Perhaps requires njs to trigger...
Heh, had tried running it through BB, but it failed to compile across the board...
make: Target 'all' not remade because of errors.
cc -c -pipe -fPIC -fvisibility=hidden -O -W -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wmissing-prototypes -Werror -g -I njs/src -I njs/build -I src -I build/include \
\
\
-o build/src/nxt_tstr.o \
-MMD -MF build/src/nxt_tstr.dep -MT build/src/nxt_tstr.o \
src/nxt_tstr.c
src/nxt_tstr.c: In function ‘nxt_tstr_compile’:
src/nxt_tstr.c:115:28: error: passing argument 2 of ‘nxt_tstr_str’ from incompatible pointer type [-Werror=incompatible-pointer-types]
115 | nxt_tstr_str(tstr, &tpl);
| ^~~~
| |
| nxt_str_t *
In file included from src/nxt_main.h:74,
from src/nxt_tstr.c:6:
src/nxt_tstr.h:47:49: note: expected ‘nxt_strz_t *’ but argument is of type ‘nxt_str_t *’
47 | void nxt_tstr_str(nxt_tstr_t *tstr, nxt_strz_t *str);
| ~~~~~~~~~~~~^~~
src/nxt_tstr.c: In function ‘nxt_tstr_query’:
src/nxt_tstr.c:268:39: error: passing argument 5 of ‘nxt_js_call’ from incompatible pointer type [-Werror=incompatible-pointer-types]
268 | tstr->u.js, val, query->ctx);
| ^~~
| |
| nxt_strz_t *
In file included from src/nxt_tstr.h:10:
src/nxt_js.h:35:53: note: expected ‘nxt_str_t *’ but argument is of type ‘nxt_strz_t *’
35 | nxt_js_cache_t *cache, nxt_js_t *js, nxt_str_t *str, void *ctx);
| ~~~~~~~~~~~^~~
cc1: all warnings being treated as errors
make: *** [build/Makefile:531: build/src/nxt_tstr.o] Error 1
program finished with exit code 2
Hmm, it builds OK locally, maybe a unit-try snafu... let me check...
About the difference between the current and new way.
current way:
nxt_tstr_t *tstr;
tstr = nxt_tstr_compile(&some, NXT_TSTR_STRZ);
nxt_str str;
nxt_tstr_query(tstr, &str);
/* use str directly */
str.start,
str.length,
...
new way:
nxt_tstr_t *tstr;
tstr = nxt_tstr_compile(&some, 0);
nxt_strz_t str;
nxt_tstr_str(tstr, &str);
/* use str with .s */
str.s.start,
str.s.length,
The new way get rid of a flag and three branches, like below:
strz = (flags & NXT_TSTR_STRZ) != 0;
if (strz) {
*p = '\0';
}
if (tstr->flags & NXT_TSTR_STRZ) {
val->length--;
}
But it introduced a new structure.
typedef union {
struct {
size_t length;
u_char *start;
};
const nxt_str_t s;
} nxt_strz_t;
And " we take on a small bit of allocation overhead"
It introduced more unclear enough namings, for example, the rewrite needs a string without a zero ending.
But all of the related ones will have to use nxt_strz_t
instead. Then they have to learn the new structure.
If they want to use the query result, they also have to use str.s. Here there are str
and s
, both of them look like string
.
If it makes sense to do a similar change of getting rid of simple branches, I believe there are more than hundreds of changes we can do, it depends on how much time we can spend on checking the source code.
"This does not fix any known bugs" It looks not a good idea to make the change.
And " we take on a small bit of allocation overhead"
That will only happen (with glibc malloc(3) at least), if we allocate an extra byte when on a chunk boundary.
"This does not fix any known bugs" It looks not a good idea to make the change.
As has been said many many times before, it's not always about fixing bugs. It's about keeping the code base alive, stopping bit-rot, making the code cleaner and easier to maintain, reducing the scope for introducing bugs etc etc.
I think perhaps you need to expose yourself to some projects that are external to F5... (maybe you do, but I see no evidence of it).
But the new way looks no more concise as I listed the difference.
Hmm, it builds OK locally, maybe a unit-try snafu... let me check...
Either unit-try is fubar or this requires njs to trigger... which I don't have which is why perhaps it compiles for me.
That will only happen (with glibc malloc(3) at least), if we allocate an extra byte when on a chunk boundary.
It doesn't reduce the allocation, I can't understand why we need to introduce something that might increase memory.
As has been said many many times before, it's not always about fixing bugs. It's about keeping the code base alive, stopping bit-rot, making the code cleaner and easier to maintain, reducing the scope for introducing bugs etc etc.
I agree with the thought. On the contrary, the new approach adds complexity.
making the code cleaner
The new way Introduced a more heavier structure.
easier to maintain
Developers have to learn more structures and uses with the new way.
But the new way looks no more concise as I listed the difference.
I like the overall concept and it does result in a little less code, which would help cache footprint, (with one of my suggested changes could be further reduced)
$ git diff --stat --summary 02d1984c912261a1274534a24a2d94ac7c7abfa7..
src/nxt_http.h | 4 ++--
src/nxt_http_return.c | 13 +++++++------
src/nxt_http_rewrite.c | 3 ++-
src/nxt_http_route.c | 12 ++++++------
src/nxt_http_set_headers.c | 2 +-
src/nxt_http_static.c | 36 ++++++++++++++++++------------------
src/nxt_js.c | 16 +++-------------
src/nxt_js.h | 2 +-
src/nxt_router_access_log.c | 2 +-
src/nxt_string.h | 10 ++++++++++
src/nxt_tstr.c | 33 +++++++++++----------------------
src/nxt_tstr.h | 5 ++---
src/nxt_var.c | 4 ++--
src/nxt_var.h | 4 ++--
14 files changed, 68 insertions(+), 78 deletions(-)
As has been said many many times before, it's not always about fixing bugs. It's about keeping the code base alive, stopping bit-rot, making the code cleaner and easier to maintain, reducing the scope for introducing bugs etc etc.
I agree with the thought. On the contrary, the new approach adds complexity.
I'm not sure I agree...
making the code cleaner
The new way Introduced a more heavier structure.
Keep in mind it's a union... I like it.
easier to maintain
Developers have to learn more structures and uses with the new way.
Well, yes, otherwise you can just stick with what you have, stagnate and eventually fade away into irrelevance.
But I fear we are just ending up in one of these endless cyclical arguments...
Here are my thoughts:
Here are my thoughts: 1, it's clear that there are no issues here. 2. The current implementation has no obvious bad code, let me know if you still think it has. 3. The new way get rid of three simple branches, it doesn't mean the current code is worth being change to another way. 4. It introduced a more heavier structure, it looks a bad idea. 5. The new usage is no easier than the current way.
I don't think 1 and 2 are relevant here and make a good strawman argument. But I think it does improve the code, in terms of readability and robustness.
For 3, I'm meh.
For 4, it's a union, what do you think is heavy about it? It has some nice properties as Alex noted.
5 is subjective. If it can eliminate a potential class of bugs then that could be construed as easier to use.
Could we discuss it based on the two points? simplicity and clarity.
simplicity
# current
nxt_tstr_t *tstr;
tstr = nxt_tstr_compile(&some, NXT_TSTR_STRZ);
nxt_str str;
nxt_tstr_query(tstr, &str);
/* use str directly */
str.start,
str.length,
...
# new
nxt_tstr_t *tstr;
tstr = nxt_tstr_compile(&some, 0);
nxt_strz_t str;
nxt_tstr_str(tstr, &str);
/* use str with .s */
str.s.start,
str.s.length,
The new way doesn't make the usage easier, right? Maybe he thought it also doesn't make it complicated.
clarity
The new one is full of strz, str, s
, etc. It doesn't make it more clear.
EDIT 2: Perhaps requires njs to trigger...
It's that. I was looking at the code again, and saw that some NJS code needs to be updated as well. I now need to remind myself of how to build with NJS...
If anyone knows why I'm getting this error, please help.
$ ./configure --debug --njs --openssl --cc-opt="-I../../njs/master/src/ -I../../njs/master/build/" --ld-opt="-L../../njs/master/build/"
...
checking for NJS ... not found
./configure: error: no NJS library >= 0.8.0 found.
$ tail -n20 build/autoconf.err
#include <njs.h>
#if NJS_VERSION_NUMBER < 0x000800
# error NJS < 0.8.0 is not supported.
#endif
int main(void) {
njs_vm_t *vm;
njs_vm_opt_t opts;
njs_vm_opt_init(&opts);
vm = njs_vm_create(&opts);
if (vm == NULL)
return 1;
return 0;
}
----------
cc -pipe -fPIC -fvisibility=hidden -O -W -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wmissing-prototypes -Werror -g -I../../njs/master/src/ -I../../njs/master/build/ -o build/autotest build/autotest.c -lnjs -lm -lssl -lcrypto \ \ -lpcre2-8 -L../../njs/master/build/
----------
$ grepc NJS_VERSION_NUMBER ../../njs/master/
../../njs/master/src/njs.h:#define NJS_VERSION_NUMBER 0x000803
../../njs/master/build/Makefile:#define NJS_VERSION_NUMBER 0x000803
v4 changes:
$ git range-diff master gh/NUL NUL
1: 99921de5 = 1: 99921de5 HTTP: Add const to some functions
2: 2faae7c9 = 2: 2faae7c9 Tstr: Unconditionally append a hidden NUL
-: -------- > 3: b57dbdb6 amend
3: 669bfb88 ! 4: 8243bbbd Tstr: Produce a nxt_strz_t from nxt_tstr_t
@@ src/nxt_tstr.c: nxt_tstr_is_const(nxt_tstr_t *tstr)
+nxt_tstr_str(nxt_tstr_t *tstr, nxt_strz_t *str)
{
- *str = tstr->str;
--
-- str->length--; // Hide the terminating NUL.
-+ str->length = tstr->str.length - 1; // Hide the terminating NUL.
++ str->length = tstr->str.length;
+ str->start = tstr->str.start;
}
@@ src/nxt_tstr.c: nxt_tstr_query_init(nxt_tstr_query_t **query_p, nxt_tstr_state_t
nxt_int_t ret;
@@ src/nxt_tstr.c: nxt_tstr_query(nxt_task_t *task, nxt_tstr_query_t *query, nxt_tstr_t *tstr,
- val->length--; // Hide the terminating NUL.
+ }
#if (NXT_DEBUG)
- nxt_str_t str;
v5 changes:
$ git range-diff master gh/NUL NUL
1: 99921de5 < -: -------- HTTP: Add const to some functions
-: -------- > 1: 59e9a168 Add const to some functions
2: 2faae7c9 = 2: b2d94c12 Tstr: Unconditionally append a hidden NUL
3: b57dbdb6 = 3: 06044856 amend
4: 8243bbbd ! 4: 6311e764 Tstr: Produce a nxt_strz_t from nxt_tstr_t
@@ src/nxt_http_static.c: nxt_http_static_send_ready(nxt_task_t *task, void *obj, v
+ nxt_strz_t *shr;
nxt_uint_t level;
nxt_file_t *f, file;
- nxt_file_info_t fi;
+ const u_char *fname;
@@ src/nxt_http_static.c: nxt_http_static_send_ready(nxt_task_t *task, void *obj, void *data)
nxt_str_null(&exten);
@@ src/nxt_http_static.c: nxt_http_static_send_ready(nxt_task_t *task, void *obj, v
if (chr->length > 0) {
nxt_log(task, level, "opening \"%s\" at \"%V\" failed %E",
+- fname, chr, file.error);
++ fname, &chr->s, file.error);
+
+ } else {
+ nxt_log(task, level, "opening \"%s\" failed %E",
@@ src/nxt_http_static.c: nxt_http_static_send_ready(nxt_task_t *task, void *obj, void *data)
- p;
-: -------- > 5: cc0a2a5b String: Add some more safety to nxt_strz_t
Hi @alejandro-colomar
If you have an actual example of this
This will allow the compiler to enforce correct use of these strings, via type-safety diagnostics.
that may help grease the wheels...
I think you said you hit some such case?
Although having glanced at the latest changes, I'm a little worried at the feature creep if you will...
Hi Alex,
Could you clarify something for me? I've probably missed this someplace...
Tstr: Produce a nxt_strz_t from nxt_tstr_t
In the previous commit, we made sure all nxt_tstr_t have a hidden NUL at
the end of the produced nxt_str_t (hidden because the .length field
doesn't include it).
We can make that a different type, nxt_strz_t, which is distinct from
nxt_str_t. This will allow the compiler to enforce correct use of these
strings, via type-safety diagnostics.
nxt_strz_t can be read as a nxt_str_t, for compatibility in functions
that want a nxt_str_t. This access is through a const member of a union
to ensure that the terminated string is not corrupted.
And we have (I'm ignoring the latest commit for now)
typedef union {
struct {
size_t length;
u_char *start;
};
const nxt_str_t s;
} nxt_strz_t;
So just to check, if you want to access this as a C string, i.e nul terminated, e.g to pass into a C library function, you'd use thing.start
?
For apis that want a nxt_str_t you'd use thing.s
?
I wonder if s
would be better named nxtstr
or somesuch?
Perhaps even crazier, how badly would things break if we changed the start
member to be
char *cstr;
I'll just note an observation (probably not relevant and probably not news to anyone) that some nxt_str_t's are already nul terminated.
Enough rambling for now.....
Hi @alejandro-colomar
If you have an actual example of this
This will allow the compiler to enforce correct use of these strings, via type-safety diagnostics.
that may help grease the wheels...
I think you said you hit some such case?
Although having glanced at the latest changes, I'm a little worried at the feature creep if you will...
Hmm, this found some holes in the idea.
$ cat str.c
#include <stddef.h>
typedef struct str_s str_t;
typedef union strz_u strz_t;
struct str_s {
size_t len;
char *start;
};
union strz_u {
struct {
size_t len;
char *start;
} z;
struct {
const size_t zlen;
const char *const zstart;
};
const str_t s;
};
int
main(void)
{
char *p = NULL;
str_t *str;
strz_t strz = {};
const char *c_p = NULL;
const str_t *c_str;
//str = &strz; // -Wincompatible-pointer-types
//str = &strz.s; // -Wdiscarded-qualifiers
//str = &strz.z; // -Wincompatible-pointer-types
p = strz.z.start; // Live pointer
//p = strz.zstart; // -Wdiscarded-qualifiers
p = strz.s.start; // Live pointer; don't do this!
//c_str = &strz; // -Wincompatible-pointer-types
c_str = &strz.s; // Safe
//c_str = &strz.z; // -Wincompatible-pointer-types
p = c_str->start; // Hmmm, this is a hole in my plan
c_p = c_str->start; // Safe
c_p = strz.z.start; // Hmmm, this is a hole in my plan
c_p = strz.zstart; // Safe
c_p = strz.s.start; // Safe; but don't do this!
}
$ cc -Wall -Wextra str.c -Wno-unused-variable -Wno-unused-but-set-variable
$
The problem is that const nxt_str_t *
has a non-const
.start
. There's no way to add const
correctness to that type, without adding yet more types.
The following I think should be enough for having 4 distinct types:
.s
), which have a distinguisher: .w
.s
).r
), which have a distinguisher: .z
$ cat str.c
#include <stddef.h>
#include <sys/types.h>
typedef struct rstr_s rstr_t;
typedef union str_u str_t;
typedef union rstrz_u rstrz_t;
typedef union strz_u strz_t;
struct rstr_s {
const size_t len;
const u_char *const start;
};
union str_u {
struct wstr_s {
size_t len;
u_char *start;
} w;
const rstr_t s;
};
union rstrz_u {
struct {
const size_t len;
union {
const u_char *const start;
const char *const cstrz;
};
};
const rstr_t s;
};
union strz_u {
struct {
size_t len;
union {
u_char *start;
char *cstrz;
};
} z;
const rstrz_t r;
};
int
main(void)
{
str_t *str;
strz_t strz = {};
u_char *p = NULL;
rstr_t *rstr;
const rstr_t *c_rstr;
const u_char *rp = NULL;
str = &strz; // -W
str = &strz.r; // -W
str = &strz.z; // -W
p = strz.z.start; // Live pointer
p = strz.r.start; // -W
rstr = &strz; // -W
rstr = &strz.r.s; // -W
c_rstr = &strz.r.s; // Safe ro nxt string
rstr = &strz.z; // -W
p = rstr->start; // -W
rp = rstr->start; // Safe ro nxt string (but forgot the length)
rp = strz.z.start; // Don't do this
rp = strz.r.start; // Safe ro C string
rp = strz.r.s.start; // Safe ro nxt string (but forgot the length)
}
The above, however, would be a more in-depth change, so I guess we better forget about it for now.
Hi Andrew,
Perhaps even crazier, how badly would things break if we changed the
start
member to bechar *cstr;
Not crazier. Actually that was something I was playing with. I had added another union member so that there was u_char *start
alongside a char *cstrz
, to avoid casting.
I dropped that, because in all of the cases I found, I didn't find any cast. They were all being passed to u_char* that were later passed to some wrapper...
But yeah, that cstrz member would be useful if we find places where it removes a cast.
We would still need the .start, to be able to handle it in our own code.
I'll just note an observation (probably not relevant and probably not news to anyone) that some nxt_str_t's are already nul terminated.
Enough rambling for now.....
I'll close this PR, and let this rumiate in my head some time. Such a big change needs some more thought. :)
I liked the overall idea of having our strings nul-terminated.
As you say it perhaps needs more thought and a more fundamental change in approach, so that our basic string type nxt_str_t could satisfy all requirements...
@ac000
After some more research, it may be that type safety is not possible. There's always a way to beat it. It's more or less the same flaw as with const T **
and T **
. Let me try a few more things...
Also, having all nxt_str_t be terminated is impossible, since some of them are taken from prefixes of other strings.
There's one of the things which we can do: have more nxt_str_t be terminated. We could make it so that every dup'd nxt_str_t, or any nxt_str_t derived from a nxt_tstr_t be terminated.
This would have the consequences mentioned before: slight (negligible) memory overhead, and less branching.
v1 (compared to https://github.com/nginx/unit/pull/1080 v8):