Open asl opened 6 months ago
Tagging @vlstill @fruffy @ChrisDodd for opinions
Personally I think this is a good idea, even though it may be a lot of extra refactoring work. Making this explicit will make people think twice whether they want to use a cstring.
@vlstill also introduced a _cs
suffix at some point which should help here? https://github.com/p4lang/p4c/pull/4342
@vlstill also introduced a
_cs
suffix at some point which should help here? #4342
Yes. _cs
would reduce the amount of typing indeed. As far as I can see, the backends are major abusers of cstrings. ebpf is the nicest example :) This would affect downstream users certainly, though. However, https://github.com/p4lang/p4c/issues/4481#issuecomment-1985003987 clearly shows that there is lots of things here and there. cstring cache contains the whole copy of the input, for example. As all tokens are stored as cstring's.
I like the idea of explicit constructor for cstring
. I think this would be a major undertaking to do, there is a lot of places in the compiler that count on implicit casts. This would be much simpler if there was some tooling support for making changes like this automatic (it almost seems like something that clang-tidy or something similar should be able to do, but I don't know if there is any solution). That would also make it much simpler for the downstream tools.
I've not tried this one I think, but I've tried to make the construction of IR::ID
explicit and that was too much for me. However, without IR::ID
explicit constructor, this one does not make much sense. These implicit casts are defined in cstring
and IR::ID
:
const char *
-> cstring
std::string
-> cstring
std::stringstream
-> cstring
const char *
-> IR::ID
cstring
-> IR::ID
std::string
-> IR::ID
So making cstring
explicit is not enough, as there would still be chain of implicit casts const char *
-> IR::ID
-> cstring
:-(.
I've also checked how the standard library does similar conversions and it seems to be similar case as you propose (with the additional problem that the implicit cast in stdlib can introduce dangling reference but cstring
-> string_view
can't). The std library has:
std::string_view
-> std::string
(constructor)std::string
-> std::string_view
(operator)@vlstill Yeah. That's lots of work. I already started changes here and there. Some observations:
cstring
's created from string literals. Here _cs
is a rescue. Bonus points: this makes the stuff faster as we do not need to copy the string, we just take the address of the literal and put into the map.cstring
as an argument. They should really take string_view
. I'm doing these changes on the way.cstring
itself deserves some love. This includes e.g. cstring::startsWith
/ cstring::endsWith
. Currently they take cstring as an argument, essentially caching them. There is certainly no need for thiscstring
concatenations. Lots of cases. Should definitely require refactoring. I'm putting FIXMEs for nowMaybe something else will occur on my way :)
So, I just moved a downstream backend over the changes in https://github.com/p4lang/p4c/pull/4694
It was not that painful. Important story that it uncovered lots of rusty cruft accumulated over the time:
I would say it was definitely worth the cleanup :)
Some story from dowstream code. It had code like this:
void foo(cstring a, cstring b, ...);
void bar() {
..
std::string s = a->toString().c_str();
if (rare_condition)
s += prefix;
foo(s, s);
}
bar()
is quite hot function for some reason. As std::string
=> cstring
conversion is still implicit we ended with:
So, just changing the code to:
void foo(cstring a, cstring b, ...);
void bar() {
..
cstring s = a->toString();
if (rare_condition)
s += prefix;
foo(s, s);
}
yielded a noticeable speedup as no memory allocations and map lookups were performed on most probable hot path. Even on slow path we only pay single price for cstring construction.
This overhead went unnoticed due to implicit conversions...
Thinking more about it... What if we do the opposite?
string_view
should be implicitconst char*
should be explicitHere is the rationale: currently it is not possible to have a
string_view
function argument if the function is assumed to accept bothcstring
andconst char*
. E.g. the following:Is ambiguous as
cstring
could be constructed fromchar*
implicitly,string_view
could be constructed fromchar*
implicitly, but we cannot drop thecstring
overload ascstring
cannot be implicitly converted tostring_view
.Certainly, there are lots of examples in codebase that do e.g.
that will be broken. But IMO these should be all fixed to
cstring::literal
call and explicit conversion tocstring
should emphasize that this is not a cheap operation (involvingstrlen
+ map lookup)._Originally posted by @asl in https://github.com/p4lang/p4c/pull/4676#discussion_r1617694806_