Make Linux profiles aware of Architecture

GoogleCodeExporter commented 9 years ago

This patch adds an "arch" metadata to Linux profiles so that plugins and 
address spaces can know the architecture.

This patch also uses this added feature to fix an issue with address space 
checking to properly differentiate a 32-bit ARM image from a 32-bit x86 image.

Original issue reported on code.google.com by Joe.Sylve@gmail.com on 29 Jun 2012 at 2:13

Blocking: #283

Attachments:

ARMenhancement-archaware.patch.txt

GoogleCodeExporter commented 9 years ago

Hiya Joe,

Thanks for submitting this patch.  It seems as though it's effectively 
deprecating the memory_model setup for an arch one.  That may well be the best 
way to go, but if we're going to do that we should do it across the whole 
codebase, and we should do it before 2.2, since the memory_model is (I believe) 
new in 2.1.

So there are two options, and I'm going to ask for opinions on them:

a) Have the ARM architecture dig out the the appropriate value from the linux 
profile (bad, because it's making the AS have profile-specific code in it).
b) Convert all memory_model checks to arch checks.  The question is, will there 
ever be a single arch that can be both 32-bit and 64-bit at the same time?

So this needs a bit of thinking about before applying I'm afraid.  However, 
it's thinking we'll need to do soon if we don't want it to delay the 2.1 
release, so hopefully people will jump on this...

I'll try to find some time to mock-up a complete memory_model replacement patch.

Original comment by mike.auty@gmail.com on 1 Jul 2012 at 10:49

Added labels: Milestone-2.1.x

GoogleCodeExporter commented 9 years ago

Original comment by mike.auty@gmail.com on 1 Jul 2012 at 11:18

Now blocking: #283

GoogleCodeExporter commented 9 years ago

I'm not sure if the best route is to depreciate memory_model checks all 
together.  There may certainly be architectures that support multiple memory 
models.  They did announce last year that there would be 64-bit ARM support in 
v8 (although that might be considered a different architecture all together). 

http://www.eetimes.com/electronics-news/4230160/ARM-unveils-64-bit-architecture

But what about ARM systems that are big-endian and that are little-endian?  
We'll eventually run into that problem, once people want to use the code for 
more than just on Android devices.

Maintaining the memory model checks also would help abstract plugins that just 
need to know a specific memory model (32-bit or 64-bit, big-endian, 
little-endian) but don't really care about the architecture.

Something to think about.

Original comment by Joe.Sylve@gmail.com on 1 Jul 2012 at 1:37

GoogleCodeExporter commented 9 years ago

Another thing to think about with this is arch-specific overlays. Right now we 
are overlaying the x86 cpuinfo struct, but obviously that does not exist on ARM 
so you get a warning about the struct missing from the vtype each time you run 
a plugin against Android.

Original comment by atc...@gmail.com on 1 Jul 2012 at 6:58

GoogleCodeExporter commented 9 years ago

Hmm, I'm aware of several mixed archs scenarios: 

1) The one Joe mentioned about ARM
2) Windows WOW64 (32 bit process AS on x64 kernel)
3) The OS X issue attrc mentioned on the [[MacMemoryForensics]] wiki about x64 
processes on an x86 kernel

Original comment by michael.hale@gmail.com on 2 Jul 2012 at 3:30

GoogleCodeExporter commented 9 years ago

So I guess I'm a bit confused about the problem with this patch.  I don't think 
it's wise to depreciate memmodel with arch.  It seems that in some places it 
would be best to check against arch and in others memmodel (for instance x86 
native types are the same as 32 bit ARM native types, etc).  What do we need to 
do to get this done properly?

Original comment by Joe.Sylve@gmail.com on 3 Jul 2012 at 2:53

GoogleCodeExporter commented 9 years ago

Sorry, we're all quite busy, but this needs thinking through before it's done, 
because it appears to be mixing some elements together that shouldn't be mixed.

The Profile takes care of how the data's laid out in memory.  The Address Space 
takes care of how to read memory to start pulling data out of it.  Why does the 
profile (how you read data) need to be aware of which architecture/address 
space it works on?

So I guess I'm asking how would the Debian-for-Arm-32bit be different to the 
Debian-for-Intel-32bit?  Why should a profile care whether it's Arm or Intel?  
Why can't the address space figure out whether it works or not, rather than 
checking to see which profile it's running as to decide if it might be valid?  
The 32bit/64bit is for plugins to differentiate which one they're working on, 
but at the moment, the sole use for the arch metadata appears to be to hide the 
ARM AS from normal profiles because the code to check whether we're dealing 
with an ARM space or not just doesn't work.  If so, that's the bit that needs 
fixing, if not then I think I could do with a better explanation of why.

As to the cpuinfo plugin, I'd hope that it's possible to rewrite that to use 
sysmap, rather than relying on the profile to tell it whether it exists or not. 
 That information must surely already be in there?  It should certainly be 
possible to write a ProfileModification that doesn't even attempt to overlay 
the cpuinfo if it isn't already present.

I don't want to rush into adding stuff to the metadata set because it's an easy 
hack, I want it to be added after we've thought it over because it's necessary 
and it's the right thing to do.

Original comment by mike.auty@gmail.com on 3 Jul 2012 at 6:22

GoogleCodeExporter commented 9 years ago

Mike,
   It looks like this patch modifies the is_valid_profile() test in the address space to check what the profile is like. This is perfectly ok - an address space does need to check whether its compatible with the current profile (some address spaces need to use the profile to parse their internal data structures - e.g. crash, hiber etc - these do very similar tests to the profile albeit in their constructor instead of the is_valid_profile method).

Indeed I agree with you that the profile does not need to know about the 
address space, but the opposite is happening here.

IMHO we need to have both pieces of metadata, since they are both important 
(arch and memory model). The page translation mechanism needs to know which 
hardware it is running on, information it can only get from the profile.

This only affects the address space voting mechanism to ensure we dont consider 
an address space which is not compatible with the current profile.

Original comment by scude...@gmail.com on 3 Jul 2012 at 11:55

GoogleCodeExporter commented 9 years ago

So what happens with multi-arch profiles?  It seems as though the actual data 
structures should be identical between Arm and Intel for the linux kernel on 
the same memory_model.  Having two profiles hanging around just so one can say 
Arm and one can say Intel seems like a waste.

Previously the page translation mechanism didn't need to know which hardware it 
was running on, it ran its own tests (based on the memory_model) to determine 
whether its page translation mechanism could be in use (for example, 
Pae/NonPae).  Now we seem to just not even be bothering doing proper AS 
checking, but just deciding whether the user asked for it or not.  I think the 
right way of solving this is for the ArmValidAS to be improved to properly 
differentiate between possible ASes, and they should then be ordered by 
likelihood.  Arm should not be the first AS we try, since it's very, very 
unlikely to be the right one most of the time.  This also means we need to make 
sure our intel checks fail for most Arm systems if possible.

However, if this patch wants to go in, which it seems like everybody else is 
suggesting it should, then at a minimum it needs to think about more than just 
Arm/linux.  Every address space should have an appropriate arch added, and 
every profile that doesn't work with Arm needs to say so.  This is not just a 
minor change in isolation, metadata changes affect every profile and given 
their use, every AS as well, hence the care that should be taken in planning 
them.

@Joe, could you please mock up a new patch that takes into consideration the 
whole codebase as opposed to just this one part, and then we can review it 
again?  This would all be much easier if there were a proper check in 
ArmValidAS, but I'll commit it without given so many people think it's 
necessary.

@attc, the ARM space has had far less testing, and I'm concerned it will push 
the 2.2 release past OMFW if we try to merge it into trunk in the next 
iteration.  I'd like to get your opinion on branching from r1902, just before 
the Arm commit?

Original comment by mike.auty@gmail.com on 4 Jul 2012 at 7:33

GoogleCodeExporter commented 9 years ago

I am not sure what you mean with multi-arch profile. A profile contains all you 
need to parse data _on a specific arch_. Indeed in this patch there is a 
difference between the architectures which the profile encapsulates (through 
the system symbols).

The page translation mechanism must know what hardware it is running on (its 
emulating hardware after all). Address spaces already perform tests which _are_ 
based on the profile. For example:
http://code.google.com/p/volatility/source/browse/branches/linux-trunk/volatilit
y/plugins/addrspaces/hibernate.py#81

The address space voting mechanism is only a convenience - whatever checking is 
done can easily be bypassed, so we need to allow the user to specify it (that 
is another problem though :-)

The whole point of the metadata concept is that it does not have to be present. 
If other profiles do not export the arch method it can be defaulted or ignored 
or whatever. If this specific address space is looking for a specific 
architecture, this does not affect anyone else. I dont think this particular 
patch should be modifying other unrelated address spaces, instead it should 
have its order specified before all the other ones and specifically select an 
arch metadata (which may not exist on other profiles). By doing this it should 
win the voting round before the others. In that I disagree with your statement 
that every profile which does not want to work with ARM should do something - 
this patch should only affect the subsystems concerned with it without 
modifying the entire code base.

Regarding you comment on testing. I agree whole heatedly. Are there any ARM 
images we can test with that can be published?

Original comment by scude...@gmail.com on 4 Jul 2012 at 8:32

GoogleCodeExporter commented 9 years ago

What I meant by a multi-arch profile is a profile that is relevant to multiple 
architectures.  There is no inherent reason why a structure made up of four 
integers should be any different on Arm than on Intel x86 (assuming the 
endian-ness is not an issue).  I disagree with you on what a profile holds, to 
me it is a collection of structures necessary to parse data, and is 
architecture independent.  I would not expect to have to choose a different 
profile if I was using PAE rather than NonPAE.  I had rather hoped that 64-bit 
systems would be able to share a profile with only a change of native_types, 
but unfortunately the vtype system encodes specific relative offsets for each 
item in a structure, making that impossible.  I would still expect an amd64 AS 
to be checked if the profile were 32bit, and succeed if it were indeed an amd64 
space.

The test you pointed out for hibernation files is to stop the rest of the 
plugin blowing up because a required structure isn't present.  That structure 
*could* be added to every profile if necessary, it would not prohibit the 
address space running as the arch metadata would, because each profile could 
only ever specify a single architecture!  The example wasn't a metadata check, 
it was a profile contents check.  If the ARM space relied on certain structures 
being present, I would expect it to check the profile for them.

I agree, the point of the metadata check is that it does not have to be 
present.  Unfortunately, I've seen branches with sloppy coding practices where 
people attempt to directly address the variables, rather than using the 
dictionary interface, so I'd prefer to provide defaults to save those coders 
from themselves.  Either way, it will fail open, so anywhere that a check 
*doesn't* exist will succeed.  When adding in such a fundamental property as 
"arch", I'm not sure failing open is the behaviour we want. 

Also, with the patch as it stands there is very little stopping an ARM profile 
instantiating against an intel space save for the fact that ARM is ranked 
higher than it.  What happens when we add another space on top of that?  We 
can't keep adding metadata variables to help us choose which AS is used, the 
ASes should take care of that as much as possible on their own.

If a profile were only good for one type of address space, then profiles should 
have been built that way, but it's not the case (pae), so why put that limit on 
ARM profiles specifically?

Original comment by mike.auty@gmail.com on 4 Jul 2012 at 12:41

GoogleCodeExporter commented 9 years ago

Mike, I fully agree with you that a profile should not generally care about the 
address space it is instantiated on. This is not the actual problem right here. 
The problem is that the address space requires the profile to have certain 
symbols before it can be instantiated. i.e. the address space depends on the 
profile.

This patch achieves this dependency indirectly via setting an arch metadata on 
the profile (in order to abstract the actual check for the ARM arch), but 
ultimately the ARM AS is declaring that it will only work if the profile is 
suitable (i.e. contains the right symbols). In this way its the same as the 
hibernation AS.

The rank problem you pointed to is inherent in every guessing system. 
Essentially the problem of selecting address space is not related to the 
problem of selecting the profile. Currently the user selects the profile, and 
we guess the AS. We might get that guessing wrong. If the user could specify 
the AS as well there will be no problem. Commonly the choice of the AS (in this 
case ARM) is really synonymous with the choice of the profile which means 
making the AS depend on the profile in this case is reasonable.

If the system erroneously instantiates an Intel AS for an ARM profile it means 
the voting mechanism needs to be tuned - this will happen from time to time as 
with every system that attempts to guess something.

I don't think the metadata system fails open - if the ARM AS checks like:

if profile.metadata.get("arch") == "arm":

this will not pass if arch is not defined, and does not require every profile 
to define this metadata.

Original comment by scude...@gmail.com on 4 Jul 2012 at 1:05

GoogleCodeExporter commented 9 years ago

Great!  Since we're in agreement, why don't we change the code to check the 
profile for the symbols it actually requires which we both agree needs to 
happen, rather than adding a new arch metadata variable which we don't agree 
on, and that way everyone will be happy?  5:)

Original comment by mike.auty@gmail.com on 4 Jul 2012 at 1:24

GoogleCodeExporter commented 9 years ago

One comment...

"It seems as though the actual data structures should be identical between Arm 
and Intel for the linux kernel on the same memory_model.  Having two profiles 
hanging around just so one can say Arm and one can say Intel seems like a 
waste."

This isn't true. There are many arch-specific structures that are used. One of 
them is the cpuinfo struct, which is x86_cpuinfo on Intel and something 
completely different on ARM. It gets even worse when you try to recover 
architecture specific features.

Original comment by atc...@gmail.com on 6 Jul 2012 at 6:30

GoogleCodeExporter commented 9 years ago

Original comment by mike.auty@gmail.com on 29 Jul 2012 at 9:52

Removed labels: Milestone-2.1.x

GoogleCodeExporter commented 9 years ago

Okay, let's try this again.

This patch does add arch metadata to AbstractLinuxProfile, but it also contains 
examples of why this is a good thing and is useful.

This allows arch specific overlays. For example, cpuinfo_x86 is overlayed, but 
this structure obviously only exists on intel processors, so you get an error 
every time you run a plugin on ARM.  This fixes that.

This also fixes the naming of profiles.  Android ARM profiles will now be 
correctly named LinuxWhateverARM instead of LinuxWhateverx86.

I removed the is_valid_profile stuff for now, because that seemed a bit too 
controversial, but I would like to revisit that discussion, because it does fix 
the problems with voting between x86 and ARM, but I understand that voting is 
going to be changing, so I'll wait and see.

Original comment by Joe.Sylve@gmail.com on 25 Aug 2012 at 7:41

Attachments:

archmetadata.diff

GoogleCodeExporter commented 9 years ago

This issue was closed by revision r3359.

Original comment by labaru...@gmail.com on 9 Apr 2013 at 5:49

Changed state: Fixed

jack198345 / volatility

Make Linux profiles aware of Architecture #281