pybind / pybind11

Seamless operability between C++11 and Python
https://pybind11.readthedocs.io/
Other
15.56k stars 2.09k forks source link

Limit error message size when a function is invoked with the wrong type(s) #5060

Open bluescarni opened 6 months ago

bluescarni commented 6 months ago

Description

Recently I found myself in a situation where a Jupyter notebook would become unresponsive because of the huge error message generated by pybind11 when an exposed function is invoked with arguments of the wrong type(s). In this specific case, the wrong argument was a very long list of user-defined types, each of which has a string repr that requires hundreds of characters.

This PR is an attempt to limit the size of the error message that is attached to the pybind11 exception thrown when an exposed function is invoked with an argument of the wrong type.

Suggested changelog entry:

Limit the size of the error message when an exposed function is invoked with arguments of the wrong type.
rwgk commented 6 months ago

I like the general idea of reducing the output if it is unreasonably long, but I'm uncertain if simply cutting off the error is the best choice.

Could you please post the full error message you got somewhere? To see if that gives us ideas how we can best reduce the output.

henryiii commented 6 months ago

You could possibly lose the most important part of the error, and there's no way to recover it for debugging.

bluescarni commented 6 months ago

@rwgk I thought I had taken a screenshot of the occurrence but unfortunately I cannot find it any more (this was several weeks ago) :(

The original report came from a user of one of my libraries, who reported a jupyter notebook session rendered by VS code on a remote machine becoming unresponsive and eventually crashing down. When I re-ran the offending notebook on my local machine in a Jupyterlab session in the browser, it became clear that the user's code was calling one of the library functions with an invalid argument type and the error message generated by pybind11 would take several tens of seconds to be displayed in the jupyterlab session and would end totalling several tens of megabytes in size.

@henryiii @rwgk the intent of the modification is to crop only the part of the message containing the repr()s of the arguments, the header and footer of the message (which I am assuming is what you consider the important part of the message) are preserved. Unless I messed something up...

rwgk commented 6 months ago

the intent of the modification is to crop only the part of the message containing the repr()s of the arguments

I see, thanks.

Could you please add a test? That would make it easier for me to see what exactly is happening, and it'll be needed regardless for this change to get merged.

Without having seen examples: If a repr is very long, my first idea would be to cut out the middle, but keep the first N characters and last N characters, something like:

before:

this is a really very much badly ridiculously insanely overly too long repr

after:

this is a re... (51 characters omitted) ...oo long repr

This is because more often than not the important bits are either at the beginning or the end.

I think we could make N relatively small, e.g. 100.

Also, I'm wondering if it could make sense to reduce the repr only if PYBIND11_DETAILED_ERROR_MESSAGES is not defined.

bluescarni commented 6 months ago

Could you please post the full error message you got somewhere? To see if that gives us ideas how we can best reduce the output.

I managed to fish out a screenshot of the issue from a discord conversation with the user that originally reported the issue:

issue_sshot

I will come up with a test case in the next few days.

rwgk commented 6 months ago

I will come up with a test case in the next few days.

I didn't mean to create a lot of work with my question. I just thought it might take you only a minute or two to copy-paste something you readily available.

The screenshot makes things a lot more obvious (to me) already.

Looking closer now to verify, I see pybind11 does indeed use repr():

    msg += pybind11::repr(args_[ti]);

Bear with me, I'm supportive of making changes in pybind11,

But:

The real problem is that someone implemented a repr() that's obviously not the way it should be (IMO). The way I look at __repr__ vs __str__:

Caveat: I made that up. I don't know if that's in some official documentation somewhere.

And of course, I regularly see rushed or careless implementations that simply make __repr__ equivalent to __str__ (maybe I own a few of those myself).

Upshot:

I recommend you also work on fixing the root cause, i.e. fix the spammy repr().

But yes, we can also work on mitigating symptoms here.

The next step here is to add a test. I recommend not doing any other work before that test exists.

From looking around briefly, tests/test_kwargs_and_defaults.cpp,.py could be a good home for the new test.

FWIW: Often I find it more productive to create a new test/test_dev_wip.cpp,.py pair of files, to make it easier to work on something in isolation first. If you want to do that: copy-paste-strip-down an existing pair of .cpp,.py files and add one line to tests/CMakeLists.txt (very obvious).

bluescarni commented 6 months ago

The screenshot makes things a lot more obvious (to me) already.

Looking closer now to verify, I see pybind11 does indeed use repr():

    msg += pybind11::repr(args_[ti]);

Bear with me, I'm supportive of making changes in pybind11,

But:

The real problem is that someone implemented a repr() that's obviously not the way it should be (IMO).

That may be me :)

Before this issue was reported to me, I had already "fixed" my library's repr() so that it would truncate excessively long output. In fact, in the screenshot I sent, if you look long enough you will see ellipsis here and there - those are coming from the truncation of the repr() of my library's objects.

The problem to me seems to be that there is a very long list of objects being passed to that constructor, and the repr() of the list builtin does not do any truncation of the output. You can check that yourself in a Python terminal

In [1]: a = list(range(1000000))

In [2]: len(repr(a))
Out[2]: 7888890

In [3]: a = list(range(10000000))

In [4]: len(repr(a))
Out[4]: 88888890

In [5]:

So I am not sure this is really fixable on my side, given that even if I reduced the repr() size of my objects, a long enough list would still result in the same issue?

rwgk commented 6 months ago

So I am not sure this is really fixable on my side, given that even if I reduced the repr() size of my objects, a long enough list would still result in the same issue?

Ah, OK. I shall not argue with the behavior of list __repr__ :-)

So you have an easy pybind11 test to add right there!