ionelmc / python-hunter

Hunter is a flexible code tracing toolkit.
https://python-hunter.readthedocs.io/
BSD 2-Clause "Simplified" License

pickling support for hunter #74

Open incognitoRepo opened 4 years ago

incognitoRepo commented 4 years ago

pickling support for hunter.

problem statement:

starter question:

there are times, perhaps most times, when we want a quick reference to what happened at runtime. whether it's an error, an exception, or a "working on my machine" situation, sometimes we just need a traceback; other times we need hunter for more in-depth analysis.

what i'm hoping would be a helpful addition is the ability to go into a "fully interactive immersive" mode, where you can save the state of a program and reinstate that state at any time. This might seem like a debugger use case, but only on the surface. This is really about having a state saved on disk that includes all the history of the program up to that exact moment (e.g., path-dependence), so that a consumer can return to that exact state at any time, in fully interactive mode.

my most immediate use case for this was when I was attempting to do extensive custom formatting in hunter.[0] it's hard to find a way to, e.g., pretty-print all of frame.f_locals in further customized locations. with python objects serialized as such (or a near facsimile), one could go into an interactive interpreter (e.g., ipython) and play around with the formatting. they could also easily be extended to "plugins/chisels" for html formatting and display (just need the saved state).[1] we would basically be offering the consumer all the objects (and their access methods: dot-notation, dict-key-notation, integer-index-notation, for-in, for-kv-in)[2] so the consumer could then make their decisions without attempting to pre-program for a certain data structure that they might encounter, which might furthermore need to be re-parsed from str.[0]

throughout my experience attempting to write a tracer, I have learned that certain things do not 1) format and 2) serialize well. the challenge here would be to write an elegant, "batteries included" serializer with cogent fallbacks.[3] It must deal with failures gracefully (i've been trying to fall back to repr with little success and i'm not sure why).[4] problem cases include addinfourl, io objects, and the old optparse module, whose __dict__ is self-referential(?), not to mention the easier frame/code objects.
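A minimal sketch of the fallback idea described above, using only the standard library: try to pickle the object, fall back to pickling its repr, and fall back again to a placeholder string if repr itself raises. The names safe_dumps and BadRepr are hypothetical and not part of hunter. Note that this still calls repr on live objects, so it does not avoid the side-effect problem mentioned in [4].

```python
import pickle


def safe_dumps(obj):
    """Pickle obj; fall back to its repr, then to a placeholder (hypothetical helper)."""
    try:
        return pickle.dumps(obj)
    except Exception:
        pass  # unpicklable: frames, code objects, open files, lambdas, ...
    try:
        text = repr(obj)
    except Exception as exc:
        # __repr__ itself can raise, so keep only the type names
        text = f"<unrepresentable {type(obj).__name__}: {type(exc).__name__}>"
    return pickle.dumps(text)


class BadRepr:
    """Worst case for the demo: neither picklable nor repr-able."""

    def __reduce__(self):
        raise TypeError("cannot pickle")

    def __repr__(self):
        raise RuntimeError("broken repr")


plain = pickle.loads(safe_dumps([1, 2]))        # round-trips unchanged
as_text = pickle.loads(safe_dumps(lambda: 0))   # stored as the repr string
placeholder = pickle.loads(safe_dumps(BadRepr()))
```

The trade-off is that a consumer loading the file gets real objects for some values and strings for others, so it may be worth storing a flag next to each value that records which path was taken.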

i see you are interested in similar work (e.g., tblib). hoping i could use this as a learning opportunity while adding real value. i don't know c ( .__.)

[0] https://github.com/ionelmc/python-hunter/issues/38 this is serialization, but not writing to disk and not python objects: "The only downside of this approach is that it effectively fixes the format of event since as_dict() becomes de facto serialisation code." serializing objects as native python objects (e.g., pickling) would in fact solve this problem by allowing the consumer to interactively choose the formatting at time of coding.

import json

import hunter

# `stream` is assumed to be an open, writable text file
hunter.trace(action=lambda event: json.dump({
  key: getattr(event, key) for key in (
    'function', 'module', 'lineno', 'stdlib', 'arg', 'kind', 'filename', 'source',
    'threadname', 'threadid', 'depth', 'calls',
  )
}, stream))

the only change needed here is to swap json.dump <-> pickle.dump(...)
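One practical detail of that swap: pickle.dump needs a binary stream, and every value in the dict must itself be picklable. A rough sketch of writing one pickle per event and reading them all back, with made-up event dicts standing in for real hunter events (dump_records and load_records are hypothetical helper names):

```python
import io
import pickle


def dump_records(records, stream):
    """Append each record to a binary stream as its own pickle."""
    for record in records:
        pickle.dump(record, stream)


def load_records(stream):
    """Yield records back until the stream is exhausted."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return


# Hypothetical event dicts, shaped like the ones in the snippet above.
events = [
    {"function": "main", "lineno": 10, "kind": "call"},
    {"function": "main", "lineno": 11, "kind": "line"},
]

buffer = io.BytesIO()  # stands in for a file opened in binary mode
dump_records(events, buffer)
buffer.seek(0)
restored = list(load_records(buffer))
```

Writing one pickle per event keeps the file appendable while the traced program is still running, at the cost of needing the read loop on the consumer side.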

[1] the serializer could be implemented as an action. or perhaps a chisel https://github.com/ionelmc/python-hunter/issues/36

[2] I have not found any native isinstance check that matches list-like containers but not str, i.e., (and (not str), (list))
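One possible way to get such a check with the standard library is to combine collections.abc.Collection with an explicit exclusion of the string-like types. This is only a sketch; is_non_str_container is a hypothetical name. Collection is used rather than Iterable so that generators are not matched and therefore never consumed.

```python
from collections.abc import Collection, Mapping


def is_non_str_container(obj):
    """True for lists, tuples, sets, ranges, etc., but not str/bytes or mappings."""
    if isinstance(obj, (str, bytes, bytearray, Mapping)):
        return False
    # Collection means sized, iterable and supporting `in`; generators do not qualify.
    return isinstance(obj, Collection)
```

Mappings are excluded here on the assumption that they get their own key/value formatting branch.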

[3] jsonpickle claims to do this but I haven't had perfect success with it.

[4] even __repr__ has side effects(!). https://github.com/ionelmc/python-hunter/issues/52. as does, of course, iterating over generators, which exhausts them.

ionelmc commented 4 years ago

Regarding your "fully interactive immersive" mode, I'm afraid a Python tracer is too high-level to record state so that it can be restored at a later time. It would only work for toy projects that don't do crazy stuff in __repr__ or use any sort of lazy initialization. Kombu, Pytest and many others use techniques like that, and all hell breaks loose when you try to look at the objects.

There are some projects that do this on a lower level (which is the right choice, you always need to go 1 layer under - it's like moving a house). See: https://morepypy.blogspot.com/2016/07/reverse-debugging-for-python.html

This is why I didn't even consider serialization.

The closest I got is the detach API, which takes a callback to convert objects. In theory, safe serialization could be built on top of that - IOW, pickling support would be added to the Event class, and only detached events would allow pickling.

But would something that serializes mere string representations of objects actually help you?

incognitoRepo commented 4 years ago

thanks for the update. as you mentioned, a complete robust serialization of state seems to be a difficult thing in python (i'm not sure if this is a general computer science problem or specific to python as this is my first and main language).

i will look more closely into the detach API.

i did want to quickly point out a realization i had mulling over my end goals here. it goes back to formatting. it must be the case that there are "atomic access methods" into objects. e.g., for a list i might want to pretty-print it like:

'\n'.join([recursive_funk_that_formats_on_atomic_non_containers(elm) for elm in l]) 

but for a dict i might want

'\n'.join([f'{k}: {recursive_funk_that_formats_on_atomic_non_containers(v)}' for k,v in d.items()])

in short: yes, serializing string representations is fine. but perhaps start with a base case (e.g., Mapping) for objects that have an items method, then drill down to list-style iterables (e.g., NonStrContainer), and finally fall back to just a repr or str. it's really the NonStrContainer types that need formatting for readability. and this seems much more in line with the goals of hunter.
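The dispatch order described above (Mapping first, then other non-string containers, then a repr fallback) could look roughly like the sketch below. All names are hypothetical; in particular, safe_repr here is a local stand-in and not hunter.util.safe_repr. The max_depth guard is an added assumption so that self-referential containers terminate.

```python
from collections.abc import Collection, Mapping


def safe_repr(obj):
    """repr() that never raises (local stand-in, not hunter's routine)."""
    try:
        return repr(obj)
    except Exception as exc:
        return f"<{type(obj).__name__}: repr raised {type(exc).__name__}>"


def is_container(obj):
    """Non-empty sized iterable that is not a string-like type."""
    return (
        isinstance(obj, Collection)
        and not isinstance(obj, (str, bytes, bytearray))
        and len(obj) > 0
    )


def format_value(obj, depth=0, max_depth=5):
    """Render mappings as 'key: value' lines and other containers one item per line."""
    pad = "  " * depth
    if depth >= max_depth or not is_container(obj):
        return pad + safe_repr(obj)  # atomic value, or too deep: plain repr
    lines = []
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            if is_container(value):
                lines.append(f"{pad}{safe_repr(key)}:")
                lines.append(format_value(value, depth + 1, max_depth))
            else:
                lines.append(f"{pad}{safe_repr(key)}: {safe_repr(value)}")
    else:
        for item in obj:
            # nested containers are indented one level deeper
            step = 1 if is_container(item) else 0
            lines.append(format_value(item, depth + step, max_depth))
    return "\n".join(lines)


print(format_value({"a": 1, "b": [2, "x"]}))
# 'a': 1
# 'b':
#   2
#   'x'
```

Because only Collection instances are iterated, generators fall through to the repr branch and are never consumed.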

unless i've mistakenly missed that this ability already exists in hunter?

edit: it's also quite possible i just need to import a pretty-printer module and apply it to hunter. i'll have to check this.

ionelmc commented 4 years ago

Hunter has a "so far deemed safe" routine for repr-ing objects in hunter.util.safe_repr. It had bugs with side-effects in the past so who knows. Currently it's extremely conservative and won't show contents of dict subclasses or things like that.