volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/
Other
2.72k stars 463 forks source link

Plugins: Yarascan + Vadyarascan Context #1287

Open dgmcdona opened 1 month ago

dgmcdona commented 1 month ago

There are limitations with the current implementation of the YaraScan and VadYaraScan plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined amounts of context surrounding yara matches in a hexdump format.

In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Additionally, neither of the plugins currently render the data using format_hints.HexBytes which would provide a more useful view of the matched bytes.

Within volshell, there is no publicly available API on either the YaraScan or VadYaraScan classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from the TreeGrid returned by the run method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring consumers to manually update the configuration tree and construct the plugins via their constructors.

This attempts to address these issues by:

@ikelos This is resurrecting #1131, which I read your comment on. This at least partially addresses the issue of placing the new IntRequirements in a better location, although I'm not 100% sure this will meet your requirements if the idea is for these new requirements to specify the additional context before and after binary data returned from other places, and not just within these two plugins.

If you still feel that this is the wrong approach to this problem, and that more parts of the framework than just the yarascan and vadyarascan modules can make use of these requirements, I'm happy to work with you to identify the correct way to do this.

One thing that I am not yet certain about with this PR is the decision to extend the interface with a new enumerate_matches method in the VadYaraScan class and not in the YaraScan class; these two plugins don't return the exact same values in their TreeGrids (VadYaraScan includes a PID as well), so if an equivalent method was added to YaraScan, it likely wouldn't have the same signature, although there is no inheritance relationship between the two plugin classes here, so it may not be an issue.

I also noticed that YaraScan will scan multiple layers, but the returned offsets don't specify the layer in which the offset is valid. If we do move this logic out of the _generator() method and into a publicly exposed method, we will likely want to include the layer name in the returned tuple as well.

atcuno commented 1 month ago

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

ikelos commented 1 month ago

Yeah, I think you've found an issue that will be encountered by many of our plugins. I'm still debating the best of dealing with this though. I can't figure out if we should get the requested window size from the user/UI to the plugin (which then requires the user to rerun the plugin just to get a little more data) or if we invent a new return type that says "Data" and contains the layer and the offset to identify it? It would then be up to the UI to retrieve an amount of data and decide how to display it. That would also allow the UI to move forward and back at will (I'm thinking a web UI that "scrolls" you to the right part of the memory image) and in general be much more flexible than the CLI being handed a chunk of data. It shouldn't really affect things that consume the output of the plugin, because they should have the context they were working on and therefore access to all the layers.

I guess the only sticky issue would be if we ever wanted to support live layers, but I think that's not as terribly a useful feature as people may think. So what do you say? The steps we'd need to take would be:

So that sounds like about 3 pull requests, whaddya think?

ikelos commented 1 month ago

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

Thanks, I appreciate the feedback, but this whole paragraph feels unnecessary. It comes across as an attempt to apply pressure on me, and it puts my back up. I'm trying to give this a fair decision, but defensively trying to oversell it to me before you've even heard my response doesn't help its cause. "It's a fully valid complaint in [your] opinion" suggests that you think that in my opinion, it wouln't be a valid complaint, or that I'm immediately going to go against it. Have my other decisions been so unreasonable, or felt targetted against you, that you feel you need to put that in before I've said anything? I'd have much preferred "this is a problem a lot of students encounter", that doesn't denigrate what the plugins currently do, it doesn't push for a particular solution, it doesn't add pressure and it doesn't pit you and your students against any decision I might make that isn't the one being suggested, it just adds facts. I know you feel under pressure at the moment, but we're all on the same side, we're trying to make a good tool, and my job is to make sure it'll work long term, be maintainable and makes sense so we don't have to change it in six months.

atcuno commented 1 month ago

I wasn't trying to put pressure or you vs. me. I will word better in the future. I was trying to relay student response as we have been running some workshops (20-30 people) as well as having close friends/users go through some hands on exercises on their own to get feedback on features and use cases.

Putting "in my opinion" just meant I agreed with the students. Also, I typed that comment quickly as my flight was getting close to descending, and they would make us put our laptops away. I didn't mean it hostile, but it definitely reads that way. Sorry for that.

eve-mem commented 1 month ago

@dgmcdona maybe one for a different PR once all these bits have been worked out but it would probably make sense to also update the linux VmaYaraScan plugin too? Or does this do that already in a way I've missed?

dgmcdona commented 1 month ago

Thanks @eve-mem, I had totally overlooked that one. I'll make sure it's included in the effort when it's time to convert the plugins.

ikelos commented 2 weeks ago

Just to update, this is waiting on me to develop a better "data" output that will allow a UI to specify how much context should be provided to data blocks. #1310, which I hope to work on in a couple of weeks if not sooner.