Closed sahu-sunil closed 9 months ago
Interesting. That's quite a large upgrade though, a ton of things have changed. Are you using cattrs.Converter
, cattrs.BaseConverter
or functions imported directly from cattrs (like cattrs.structure
)?
You can likely resolve your issue by switching to cattrs.BaseConverter
, that converter class does minimal code generation. It's somewhat slower to structure and unstructure though. Would be curious for more information!
I am using cattrs.Converter
and then registering hooks like -
converter = Converter(detailed_validation=False)
converter.register_structure_hook(
ClassName,
make_dict_structure_fn(
ClassName,
converter,
_cattrs_use_linecache=False,
_cattrs_detailed_validation=False,
),
)
converter.structure(....)
Will try BaseConverter
for less initialisation but AFAIK it will still go for functions mentioned in above profiler snapshot.
BaseConverter won't help if you're using make_dict_structure_fn
directly, since that's the source of the issue.
Well, the root issue is that you're probably recreating the same class a lot in your tests.
Let me give you some context. make_dict_structure_fn
will compile a specialized function for structuring. It'll also store the source code of that function using Python's linecache module; this is necessary for good stack traces and for stepping through the function in a debugger.
But if you, for example, have a test that recreates the same class (or a converter) over and over again, you'll effectively create a leak there since all this source code will keep getting stored there, and generate_unique_filename
is inefficient when dealing with a large number of classes with the same name (which is what you're seeing in the profile dump).
To me this indicates an inefficiency in your tests, any chance you could not generate so many converters or classes and hooks? Even if I worked around this in cattrs, it's still kind of bad practice.
That said, use_linecache=False
should fix the issue so maybe it's a different class than what you think (or a nested class). You can debug by maybe setting up a breakpoint and inspecting linecache.cache
manually; it'll probably have the same class name a ton.
import linecache
print(linecache.cache)
Yes, I tried to avoid recreation of classes/converters at some places and it helped a bit but still there are places where I can't get rid of them.
I was going through make_dict_structure_fn
function and wanted to clarify -
Should this be inside if condition ?
#fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
script = "\n".join(total_lines)
eval(compile(script, "", "exec"), globs)
if _cattrs_use_linecache:
fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
linecache.cache[fname] = len(script), None, total_lines, fname
Do we need this in case of linecache
disabled?
Hm, you might be right. We can put that into the if
block. If you want to put together a quick PR and the tests pass I'll merge that in. (Make the PR against the v23.2 branch so we can release it quickly.)
It might not help though. If you have classes that you can't control they'll enter that if
block regardless.
You're just recompiling the exact same handlers over and over again, right? So the source code will be identical every time. I'm thinking there could be an optimization opportunity here where we could look for the script
matching and just reuse the same linecache entry for all handlers. Then its size would be constant instead of O(n).
Okay. In middle of something, won't be able to raise it today. Please see if you can else I will take a look later. Thanks
Try this branch: https://github.com/python-attrs/cattrs/tree/tin/linecache-opt
Yes It is working now. Thank you for quick responses & optimisations.
Cool, will release this soon.
We are using cattrs for config validation purpose of our tests in Asynchronous framework. Thousands of tests run every minute. We upgraded from 1.5.0 to 23.1.2 which has major changes on how cattrs back-end works. The cpu touches 100% in couple of mins. After profiling we found that cache (linecache) was the reason for high cpu usage. We disabled the caching (including detailed validations) with
_cattrs_use_linecache=False
which helped a bit, now it takes couple of hours (3-4h) to reach cpu to 100. We seegenerate_unique_filename
is still taking lot of time.Is there any other way to handle this in cattrs latest version?