Flamefire commented 4 years ago

As the tracer should keep its own impact as low as possible, keeping an eye out for unnecessary memory allocations is vital.

Example: https://github.com/score-p/scorep_binding_python/blob/f9fbd7d932f945cac2b679fcd24cc8771b40f211/src/scorep.cpp#L23 called from https://github.com/score-p/scorep_binding_python/blob/f9fbd7d932f945cac2b679fcd24cc8771b40f211/src/scorep.cpp#L167

module and file_name are initially NULL-terminated C-Strings
For calling the function they are converted to std::strings very likely allocation heap memory and copying the contents
both are only used in the first region enter, so in most cases the above cost is wasted
file_name is then never used as a std::string but converted back to a C-String: https://github.com/score-p/scorep_binding_python/blob/f9fbd7d932f945cac2b679fcd24cc8771b40f211/src/scorep.cpp#L33 So all work is wasted completely
From module only a substring (prefix) is used (BTW: using substr instead of the ctor would likely read easier) Using resize instead would avoid an allocation and copy and using the find and substr on the original C-String would make it even cheaper

I'm aware this is a Python tracer so relative performance impact is not as large as for e.g. C++ programs but especially allocations can have a noticeable impact. So as un-C++ it is, I'd recommend using raw C-Strings for those 2 and similar for other functions

AndreasGocht commented 4 years ago

Even worse: Score-P copys the C-String again 😉 . However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

Moreover, if you are worried about the performance, #5 should be addressed first. Approx 85% of the code are written in python. Each function enter or exit causes a lot of python action, see https://github.com/score-p/scorep_binding_python/blob/f9fbd7d932f945cac2b679fcd24cc8771b40f211/scorep/instrumenters/scorep_profile.py#L76 ff.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime. If you think it is necessary to fix this earlier, you are welcome to do a PR respecting the above-stated concerns.

Best,

Andreas

Flamefire commented 4 years ago

However, I am not a fan of mixing C-String with C++ Strings. And sticking to C only would mean to implement a hashtable, what is even more problematic in terms of performance.

I see nothing wrong with using what the language offers. C-Strings are part of C++ and there are C++ standard library functions for dealing with them i.e. the entire header (which happen to be the same available in C but in std::, so "blessed") So using a std::string although you actually need a C-Cstring just for the sake of good looks would be the wrong reasoning IMO.

I think I'll reconsider this issue once #5 is fixed. However, I'll leave it open in the meantime.

Sure. I've just seen this and wanted to document it so it doesn't get forgotten.

AndreasGocht commented 4 years ago

I see nothing wrong with using what the language offers. ...

The actual solution seems to be string_view:

https://en.cppreference.com/w/cpp/string/basic_string_view

together with a try_emplace to a std::unorderd_map<string>.

However, this would require C++17.

Best,

Andreas

Flamefire commented 4 years ago

Not really:

A typical implementation holds only two members: a pointer to constant CharT and a size.

So you'll at least pay for iterating over the string once to determine the size. So only keeping the C-String until you need a std::string is the least wasteful solution.

Edit: Additionally there is no c_str() to get back the C-String you need

AndreasGocht commented 4 years ago

Edit: Additionally there is no c_str() to get back the C-String you need

but a data()

Flamefire commented 4 years ago

but a data()

... which is not guaranteed to contain a NULL terminated string.

AndreasGocht commented 4 years ago

Which is never guaranteed with an array. But as it is only a view to a NUL terminated string, thats not an issue.

BTW more fun is this:

s# (str, read-only bytes-like object) [const char *, int or Py_ssize_t]

Like s*, except that it doesn’t accept mutable objects. The result is stored into two C variables, the first one a pointer to a C string, the second one its length. The string may contain embedded null > bytes. Unicode objects are converted to C strings using 'utf-8' encoding.

python doc

then you'll get the length for free. Moreover, it's more flexible.

Best, Andreas

Flamefire commented 4 years ago

Which is never guaranteed with an array.

Er what? A const char* is (usually) a C-String which is a NULL-terminated sequence of chars and what is needed to pass to Score-P

But as it is only a view to a NUL terminated string, thats not an issue.

So you want to store your C-String in a struct storing a pointer and size, then pass that to a function, in that function expect the pointer to be a C-String and ignore the size using that struct effectively as a convoluted way of transferring a C-String.

Before this gets more spammy I'll stop. Important info is in first post and https://github.com/score-p/scorep_binding_python/issues/67#issuecomment-627164640.

AndreasGocht commented 4 years ago

Technical remark:

https://github.com/python/cpython/blob/42bae3a3d9d79f28e6b3b619bd27296d125c4c2c/Python/getargs.c#L989

The code for s#

    else if (PyUnicode_Check(arg)) {
                Py_ssize_t len;
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                *p = sarg;
                STORE_SIZE(len);
            }

And the code for s

            else if (PyUnicode_Check(arg)) {
                sarg = PyUnicode_AsUTF8AndSize(arg, &len);
                if (sarg == NULL)
                    return converterr(CONV_UNICODE,
                                      arg, msgbuf, bufsize);
                if (strlen(sarg) != (size_t)len) {
                    PyErr_SetString(PyExc_ValueError, "embedded null character");
                    RETURN_ERR_OCCURRED;
                }
                *p = sarg;
            }

So s does an additional comparison of the returned String to be sure, that there is no NUL character in UTF8 coded Strings. We should be able to avoid this at least for function and module names, as python does not allow NUL as part of Identifiers (see https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes and https://docs.python.org/3.7/reference/lexical_analysis.html#identifiers )

AndreasGocht commented 4 years ago

The same applies to filenames on UNIX, (see https://unix.stackexchange.com/questions/38055/utf-8-filenames ). So we could avoid this as well.

AndreasGocht commented 4 years ago

42

Flamefire commented 4 years ago

My proposal would be to use a simple, slim wrapper around a C-String:

class CString{
  CString(const char* s, size_t len); // Could be guarded by assert for tests
  CString(const char[] s); // With template to deduce size from raw strings (trivial)
  bool operator==(const CString&);
  bool operator!=(const CString&);
  size_t find(char) const;
  const char* c_str() const;
  size_t size() const;
}

This is very similar to string_view but focuses on/guarantees NULL-terminated strings. The implementation is incredibly simple so verifiable by just looking at it. Nothing really happening there. And the comparison operators save call sites from using the (IMO strange) C string comparison functions. Some interop functions with std::string could be added too as needed (e.g fast conversion, direct comparison, ...) but those are also very simple.

This then allows accepting the Python values with known size and passing them to functions (like region_begin) that currently take std::string but only pass them as c_str() down to Score-P and maybe compare them or take substrings.

If you agree with this design I'll create a PR with an implementation of this proposal

AndreasGocht commented 4 years ago

Sounds reasonable. Please go ahead.

Best,

Andreas

Flamefire commented 4 years ago

For some numbers: Replacing the std::strings by const char* where it is beneficial results in these numbers:

0.3920-0.4455, mean: 0.4115, median: 0.4073
0.9087-0.9889, mean: 0.9460, median: 0.9460
1.4548-1.5868, mean: 1.5129, median: 1.5108

0.3861-0.4372, mean: 0.4071, median: 0.4066
0.9111-1.0089, mean: 0.9551, median: 0.9547
1.4719-1.5940, mean: 1.5263, median: 1.5262

Top is with C-Strings for 100k, 300k and 500k iterations of the simplefunc benchmark (min, max, mean and median over 51 invocations) and bottom is current master using std::string. All using the cProfile instrumenter

So there is some effect although not much. The small case is even slower for some reason but I think this is an artifact from the small runtime.

I'd expect the effect to be more visible when memory pressure is higher especially as the benchmark uses only 1 function hence the memory allocator might return always the last used chunk which is faster than "real-world" use cases of random length functions

AndreasGocht commented 4 years ago

After merging #105 you might want to try again, and use s#. As the creation of region_names is only done, when necessary, this might save a lot of runtime.

score-p / scorep_binding_python

Avoid memory allocations #67

Technical remark:

42