Open AndrewFasano opened 1 year ago
Naming functions is difficult. People rarely agree. Maybe we need a style guide and then make a pass over everything to rename.
I prefer naming methods such that you produce, essentially, code that can be read aloud. I also don't love abbreviations since they are often employed haphazardly.
panda.run_serial_cmd() -- that's kinda works: "Panda, run this serial command" -- but do we need to know its via serial port? We are revealing an internal detail for no particular reason.
panda.run_cmd() -- better. But I'd like to add "guest" in there to indicate that this is happening in the guest. I also don't love abbreviations like "cmd"
parnda.run_guest_command() -- that would be my druthers for this. "Panda, run this guest command, please"
As far as "Auto-instantiated subclasses" I think maybe you don't mean auto anything? Just refactoring. Unless I missed the auto part.
I like this idea. I like arch. We could have guest
which would contain things that happen in the guest like ... run_command. Ok so that means the above would be panda.guest.run_command()
which appeals to me.
The existing documentation was very helpful in getting me up to speed with PyPANDA. It's super easy to look at the docs and see how panda callbacks such as before_block_exec work. All of the examples and tests are also great.
I ran into trouble in a couple of areas.
1) For my first experiment I wanted to see if I could call a function in the stringsearch plugin and register a callback in stringsearch. I struggled with this for a bit, until I ventured outside the generated documentation and looked at some of the examples. Once I realized how to work with the panda.plugins array and panda.ppp decorator I was able to make some progress. I'm not sure how someone could figure out what plugin functions or callbacks were available without looking at the C code. And without the examples I'm not sure I ever would have figured out how to work with a C plugin. If documentation on C plugin methods and callbacks could be generated with stubbed out examples I think that would be really helpful for someone starting with PyPanda - especially if they weren't familiar with the C plugins.
2) Maybe I missed it, but somewhere protobuf==3.13.0 should be listed as a required dependency. Without this all sorts of problems crop up and it isn't super obvious what the problem is. Maybe the version number is something unique to my environment and would be different elsewhere?
@AndrewFasano nice issue.
I think this is worth doing. I like a lot of the subsystems you have identified. I think we could even do something clever like load them on demand.
I totally agree that either dropping or adding warnings to the qemu section would be good.
If we are going to be making major changes I think:
panda.virtual_memory_read
to print that than then call panda.mem.virt_read
or something.In particular, I think we should do something different about CPUState
. Currently it's the first argument to every callback and an essential value to pass for many/most functions. I think one of two paths could be good:
Get rid of CPUstate everywhere. This means filtering it from the new callback type and any function that needs it can call panda.get_cpu
.
The API here would move from panda.arch.get_arg(cpu, 0)
to panda.arch.get_arg(0)
.
Alternatively, we wrap the cpustate into a Python object that can be interacted with. Rust does this quite well:
let data = cpu.mem_read(buf, len as usize);
but I'd consider also having both mem
and arch
fall into the CPUState while also maintaining access to the internal CPUState.
@MarkMankins this is a helpful note. We have had some discussion re:making C internals clearer to Python documentation, but we haven't yet settled on a way to unify that documentation. Ideally, we'd be able to automatically generate it from C/C++/Rust code like we do with Python documentation, but we haven't yet identified a way forward with that.
I'd definitely want to maintain backwards compatibility for a little bit (as an aside, I have no idea how to estimate how many people would be impacted by pypanda api-breaking changes pushed to dev but I suspect it's just us and now Mark). I think the process for this might look like: 1) Pick a standard / style guide for the API function names 2) Identify the categories of functions we'd want to group together and come up with their names 3) Identify functions we want to drop from the API (if any) 4) Actually make the code changes
As for CPU state - I definitely agree that it's kind of bad as is. I see two things worth thinking about.
1) If we stop passing CPUState around and switch to first_cpu
(probably implicitly), we're limiting ourselves to single-core guests - if we move to TCG plugins and qemu 7 we'll need to add the CPU pointers back. If we're planning to eventually get there, maybe it's worth leaving this as is for now.
2) If we're going to make this change, I'd suggest we do it in the C/C++ APIs (and then in the python/rust interfaces to the APIs) instead of just doing it in python.
For unifying docs, did @tleek's branch with kerneldoc-style comments and the infrastructure to generate docs from that get merged? Also I think it would be worth explicitly explaining how pyplugins can interact with C/C++ plugins in the pypanda docs.
Also, regarding Mark's comment
Maybe I missed it, but somewhere protobuf==3.13.0 should be listed as a required dependency
I'm not sure if that's actually the case and/or how to improve things. I've noticed that the python protobuf package needs to match the system version and on my Ubuntu 22.04 machine, the latest version I can get from pip is way ahead of the latest version from apt.
After 0d98fa49b723a1f623b18de1bbd27d025a208200 (from #1327) protobuf hopefully shouldn't be required to use PyPANDA, unless you're actually trying to do things with protobuf files. You can also set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
in your environment to fix issues with apt/python versions of the package being different.
Overall I'm still fairly confused by what's going on with protobuf versions - not sure if we could/should mandate a specific version as a dependency or if there's something we can do that's better than the current state of things.
If we stop passing CPUState around and switch to first_cpu (probably implicitly), we're limiting ourselves to single-core guests - if we move to TCG plugins and qemu 7 we'll need to add the CPU pointers back. If we're planning to eventually get there, maybe it's worth leaving this as is for now.
I don't think this would necessarily be incompatible with a multi-cpu system. panda_get_cpu
doesn't have to return first_cpu
. It could return the currently active CPU whichever that might be for the current thread.
Having said that I am starting to come around to option #2 that I talked about above. Where the CPU is a Python object that has methods you can act upon. As opposed to panda.virtual_memory_read
you'd get a CPUState and do cpu.virtual_memory_read
.
Oh I think I like that model too!
Maybe it's just the tests that require protobuf. Without protobuf some of the tests fail...
Traceback (most recent call last):
File "file_hook.py", line 5, in
It would be nice if taint2 would allow one to label a virtual memory address instead of a "RamOffset".
More of a panda problem than a PyPanda problem, but this detail makes things harder in PyPanda than they need to be.
Edit: For reasons I don't understand PyPanda passes physical addresses to taint2 apis and the C plugins pass "ram offsets". PyPanda can't duplicate what the C plugins do because the function to convert physical addresses to ram offsets isn't exposed to PyPanda. But it seems like this works?
@MarkMankins which function does this conversion? It would be fairly easy to add.
PandaPhysicalAddressToRamOffset().
It's defined in common.h: https://github.com/panda-re/panda/blob/dev/panda/include/panda/common.h.
See tstringsearch.cpp for an example of how it is used.
Looking to start a conversation about making some changes to the PyPANDA APIs and docs to improve usability. These are all subjective design choices so I'm opening an issue for us to discuss instead of starting with a PR.
New function names
Rename some functions to shorter/easier-to-remember names. We could keep the original names for a bit if we made these changes.
run_serial_cmd
->run_cmd
revert_sync
->revert
queue_blocking
-> something shorter/more meaningful?drive_guest
perhaps?Anyone feel strongly about these or have other functions you'd like to see renamed?
Auto-instantiated subclasses
I've also been thinking if we could split up some of our logic into classes the way we've done with
panda.arch
(with docs here) and then use that as a way to group components of our documentation together. I think this approach makes it easier to search through the documentation because you can first select the thing you know you're interested in (e.g., OSI) and then look through a smaller list of functions. Right now the ctrl-f experience on the main docs page is pretty rough since it also searches all the source code and almost every pypanda function is listed there.First question - is this worth doing? I think it would be a pretty quick refactor. I'm in favor of it, but curious what others think. If so, here are some ideas of classes we could create. After starting this list I recall we left some big comments splitting panda.py up by groups of functions so we could probably draw some inspiration from those if we wanted to go this route.
taint
run_serial_cmd
object_*
, othersNotably I would want us to keep some of the most commonly used functions in the main pandare namespace, though perhaps these could be references to subclass-specific implementations. For example:
run_serial_cmd
,revert_sync
,run
, andqueue_blocking
.Other things
@cb_*
functions documented before the main PANDA functions - I think it used to be the other way around, and was probably better before?mem_hooks
functions are undocumented@MarkMankins - you're a new PyPANDA user so I'd love to hear your thoughts on if these changes would or wouldn't have helped you get up to speed using it. I'll also tag @lacraig2 and @tleek since they're co-designers of the original system and might have more ideas.