p4lang / p4c

P4_16 reference compiler
https://p4.org/
Apache License 2.0
670 stars 441 forks source link

Reduce proliferation of cstrings #4481

Open asl opened 7 months ago

asl commented 7 months ago

cstrings are commonly used as "the string type" in p4c. However, they should be used with more care. The reason is that they are internalized, so each new string is stored forever in some global map.

This makes them great for various containers, both keys and values, for IR strings, etc. However, in many cases cstrings are used just for random transient strings. As a result:

As a result, we might observe some elevated memory usage especially for long-lasting processes. I definitely saw that ~1% of runtime of benchmark from https://github.com/p4lang/p4c/pull/4475 is spent on string hash used for cstring internalization. I have not measured how many strings are in cstring hash, but I'd assume a lot.

Instead, one should resort to std::string and std::string_view for transient values that are not required to be saved forever. Also, we might consider adding some kind of StringMap with string keys that would own keys and not pollute the common cstring cache.

asl commented 7 months ago

Tagging @fruffy

vlstill commented 7 months ago

Instead, one should resort to std::string and std::string_view for transient values that are not required to be saved forever.

I agree there 100 %. We might need to provide some utility functions to make the migration easier. Some of the uses might be there just because some things are easier with cstring then with std::string (or because we relatively recently switched do C++17).

Also, we might consider adding some kind of StringMap with string keys that would own keys and not pollute the common cstring cache.

Could you please give a use-case for that?

asl commented 7 months ago

Could you please give a use-case for that?

Just loud thinking – if one would need to have internalized string keys (e.g. for faster lookup), but definitely would worth starting from the proper usecase in the code

fruffy commented 7 months ago

I agree, I sometimes wonder whether the interning does more harm than good. It might make sense to rename cstring to something like CachedString. Simply to make it explicit to users that this string will stay around. Or use an explicit StringMap, as you suggest.

In the case of P4Testgen, cstrings can work well, because the interpreter ends up allocating and reusing the same kinds of strings a lot. We fixed the majority of cstring leaks there and performed benchmarks running the interpreter for days without running out of memory, but that does not mean there aren't any cstring leaks left.

asl commented 7 months ago

I collected some statistics on downstream project. And the results are overwhelming. Essentially cstring cache works as sink, collecting everyone and their brother: all format strings, pieces of input, json values, ...

Essentially after frontend the cstring cache contained 300k strings (!), midend raised this up to 311k (though we do not run much midend passes). Number of map lookups was great as well totalling 86M after frontend (each cstring ctor involves at least 1 map lookup, or 2 if the value should be stored).

So, lots of hash table lookups, lots of string hashes, etc.

I will try to see if there is some low-hanging fruit here... Likely cstring should be "storage-only" type for node members. But not a replacement for generic string type...

fruffy commented 7 months ago

I will try to see if there is some low-hanging fruit here..

I would guess the newName() method of the ReferenceMap has a lot of weight there. There are a lot of temporary names generated with it.

vlstill commented 4 months ago

I will try to see if there is some low-hanging fruit here..

I would guess the newName() method of the ReferenceMap has a lot of weight there. There are a lot of temporary names generated with it.

I would expect these to mostly end in the IR nodes, which is OK-ish. The problem is that cstring is convenient, in some ways more convenient than std::string so programmers just tend to use it as there is no big hurtle in using it that could prompt them to ask if it is a good idea :-). Error-handling-related code definitely should not use cstring.

One interesting thing could be to (temporarily?) disable[^1] the operator+/operator+= for cstring... this is almost never a good idea to use as there is a high chance that an intermediate value will get cached.

Similarly with to_cstring, join... maybe we could replace them with freestanding functions on string for the stuff that is not already covered over std::string. Stuff like substr, replace, etc. should return std::string (or std::string_view in cases like substr). I'm not sure how much would this break for backends and that will be always hard to know when we don't even know all the 3rd party backends that are based on P4C...

[^1]: Or we could start by making a lot of these functions [[deprecated]] and, clean up our code (possibly with some targetted -Werror in CI), and let the backends decide for themselves. But this cannot be done in cases of changing types or conversions of course.

asl commented 4 months ago

One interesting thing could be to (temporarily?) disable1 the operator+/operator+= for cstring... this is almost never a good idea to use as there is a high chance that an intermediate value will get cached.

Yes, exactly. I already started to look into various cstring-related things. Good news is that currently it is almost always done via std::string. At least in toString methods of various nodes. There are few notable cases, where I fixed this.

There are extreme cases in backends. bpf / ebpf here are the most notable examples where cstring's are created for almost everything for no particular reasons :) Overall I'd expect that this would be the typical case for downstream backends as far as I can see from few (limited) cases ;)

fruffy commented 4 months ago

4694 is a painful commit, it breaks a lot of back ends. But it looks like it is necessary. It shows that there are a lot of unneeded cstring instantiations. For example, why do we need to cache all the options here:

https://github.com/p4lang/p4c/blob/main/frontends/common/parser_options.h#L60

ChrisDodd commented 4 months ago

What is the cost of all these "unnecessary" cstrings? Yes an extra copy occurs when creating one, but that would happen with a std::string too. Yes, they are (currently) not garbage collected, but the total size of all cstrings is trivial compared to the total heap size -- generally less than 0.01% I'd be concerned with doing a lot of work that would have no (or even negative) overall effect.

asl commented 4 months ago

Yes, they are (currently) not garbage collected, but the total size of all cstrings is trivial compared to the total heap size -- generally less than 0.01% I'd be concerned with doing a lot of work that would have no (or even negative) overall effect.

It is not the total size that matters. Every cstring construction involves map lookup. Sometimes two lookups if the string is new and should be copied. Each lookup is hash calculation plus string equality check. The "cstring proliferation problem" is not that visible inside frontend / midend. Though we store bunch of stuff there: we do store the whole input several times (whole input lines, all parsed tokens including all comments and finally parsed identifiers). So, essentially we do use cstrings for something transient, we do use cstrings for string literals a lot in many places, where just string_view will be enough, etc., etc.

I do see cstring::save_to_cache in the profiles and sometimes it is more than just 1%. Also, the less we store, less the GC overhead. As https://github.com/p4lang/p4c/pull/4694#issuecomment-2141066014 shows, just allocating cstring cache in non GC-ed arena yields 3% improvement due to less GC work here. Overall, all these strings is something for GC to scan for pointers and liveness.

The real problem is backends. Backends tend to use cstring as "the P4C string type". And this is just plain wrong. Besides the statistics presented in https://github.com/p4lang/p4c/issues/4481#issuecomment-1985003987 (300k cstrings created in frontend with 86M cstring cache map lookups there), the downstream backend increased the number of strings saved to 2M.

So, the intention is to ensure that it would be hard to create cstring implicitly. Also, these implicit conversions make string_view argument overloads impossible due to ambiguity. And we already obtained the speedup :)