williamcroberts commented 6 years ago

Remove the policy session building support.

The RM had a bug: it flushed contexts when a tool exited, even though those contexts needed to be kept alive. This forced us to add policy building to tools like tpm2_unseal so the policy session was not flushed before the unseal command.

We can remove this.

martinezjavier commented 6 years ago

@williamcroberts one thing to mention is that before removing this support, we have to make sure that it also works when using the resource manager in the kernel.

One question: how will this work in practice? We will still have to use tpm2_createpolicy to build the policy session and also add a new option to tpm2_unseal to take the policy session handle, right?

So something like the following:

$ tpm2_pcrlist -L ${alg_pcr_policy}:${pcr_ids} -o $file_pcr_value
$ handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy -e | cut -d ':' -f2)
$ tpm2_load -c $file_primary_key_ctx  -u $file_unseal_key_pub  -r $file_unseal_key_priv -n $file_unseal_key_name -C $file_unseal_key_ctx
$ tpm2_unseal -c $file_unseal_key_ctx -E $handle

williamcroberts commented 6 years ago

-S or --in-session-handle should take the handle from tpm2_createpolicy.

martinezjavier commented 6 years ago

@williamcroberts Ok, I tried this change in the tpm2_unseal test:

diff --git a/test/system/test_tpm2_unseal.sh b/test/system/test_tpm2_unseal.sh
index b28dc4e07ed3..20b20f5b293b 100755
--- a/test/system/test_tpm2_unseal.sh
+++ b/test/system/test_tpm2_unseal.sh
@@ -100,7 +100,7 @@ if [ $? != 0 ];then
     exit 1
 fi

-tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy
+handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy -e | cut -d ':' -f2)
 if [ $? != 0 ];then
     echo "create policy fail, please check the environment or parameters!"
     exit 1
@@ -118,7 +118,7 @@ if [ $? != 0 ];then
     exit 1
 fi

-unsealed=`tpm2_unseal -c $file_unseal_key_ctx -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value`
+unsealed=`tpm2_unseal -c $file_unseal_key_ctx -S $handle`
 if [ $? != 0 ];then
     echo "unseal fail, please check the environment or parameters!"
     exit 1

And this is what I got:

$ ./test_tpm2_unseal.sh 

CreatePrimary Succeed ! Handle: 0x80ffffff

ObjectAttribute: 0x00000052

Create Object Succeed !

Load succ.
LoadedHandle: 0x80fffffe

Bank/Algorithm: sha(0x0004)
PCR_00:c72ec9e6cbc2b6a95f334dddd6513981da00f0c2
PCR_01:1ee7d01fcf5327141db8ee5bf8543b17c73fc692
PCR_02:b339f9ad59a0e34cb6dc0dc835387f49726b4691
PCR_03:b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
ObjectAttribute: 0x00000492

Create Object Succeed !

Load succ.
LoadedHandle: 0x80fffffe

ERROR: Sys_Unseal failed. Error Code: 0x918
ERROR: Unseal failed!
unseal fail, please check the environment or parameters!

$ tpm2_rc_decode 0x918
error layer
  hex: 0x0
  identifier: TSS2_TPM_ERROR_LEVEL
  description: Error produced by the TPM
format 0 warning code
  hex: 0x18
  name: TPM_RC_REFERENCE_S0
  description: the 1st authorization session handle references a session that is not loaded
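For readers who want to see where the fields in that decode come from, here is a minimal sketch of splitting a format-zero TPM 2.0 response code by hand. The bit positions follow the TPM_RC layout in the TPM 2.0 library specification (Part 2); the function name and the one-entry name table are illustrative assumptions, not part of tpm2_rc_decode.

```python
# Only the warning code seen in this thread; a real table would be much longer.
WARNING_NAMES = {0x18: "TPM_RC_REFERENCE_S0"}


def decode_format0_rc(rc):
    """Split a format-zero TPM 2.0 response code into its fields."""
    if rc & 0x080:
        # Bit 7 set means a format-one code, which this sketch does not handle.
        raise ValueError("format-one response code, not handled in this sketch")
    code = rc & 0x7F               # bits 6:0, the error number
    warning = bool(rc & 0x800)     # bit 11, set for warnings
    return {
        "layer": (rc >> 16) & 0xFF,  # 0 means the TPM itself produced the code
        "tpm2": bool(rc & 0x100),    # bit 8, set for TPM 2.0 codes
        "warning": warning,
        "code": code,
        "name": WARNING_NAMES.get(code, "unknown") if warning else "unknown",
    }


print(decode_format0_rc(0x918))
```

With 0x918, bit 11 and bit 8 are set and the low seven bits are 0x18, which matches the "format 0 warning code" and TPM_RC_REFERENCE_S0 lines in the output above.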

So as mentioned this is not only a problem with the user-space resource manager but also with the kernel-space one. Fixing the tabrmd is not enough to get rid of these options.

Or did I misunderstand?

williamcroberts commented 6 years ago

Where did you set the TCTI to use /dev/tpmrm0?

I need to talk to Jarkko to find out if the bug exists in the in-kernel RM; if it does, we'll try to get it rectified.

williamcroberts commented 6 years ago

@flihp Did you ever talk to Jarkko about the above? ^^^^

martinezjavier commented 6 years ago

@williamcroberts I export the env vars in the shell before running the test:

$ export TPM2TOOLS_TCTI_NAME=device
$ export TPM2TOOLS_DEVICE_FILE=/dev/tpmrm0
$ ./test_tpm2_unseal.sh

williamcroberts commented 6 years ago

You verified with strace that it's working as expected?

With that said, if this bug does exist in the Kernel as well, it needs to be fixed there too.

martinezjavier commented 6 years ago

I do see that the /dev/tpmrm0 character device is opened, that the loaded object context file (created by tpm2_load) is read, and that a command (which should be TPM2_Unseal) is sent to the TPM and a response is read.

I didn't decode the command and response to check whether the values are correct, but I don't see why I would, since everything else works correctly (so I don't think it is a data marshalling issue).

openat(AT_FDCWD, "/dev/tpmrm0", O_RDWR) = 3             
openat(AT_FDCWD, "ctx_load_out_sha256_ecc-sha256_keyedhash", O_RDONLY) = 4                                       
fstat(4, {st_mode=S_IFREG|0664, st_size=936, ...}) = 0  
read(4, "\272\334\300\336\0\0\0\1@\0\0\v\200\0\0\0\0\0\0\0\0\0\7\n\3\216\0 \216b\25\312"..., 4096) = 936         
write(3, "\200\1\0\0\3\252\0\0\1a\0\0\0\0\0\0\7\n\200\0\0\0@\0\0\v\3\216\0 \216b"..., 938) = 938                 
read(3, "\200\1\0\0\0\16\0\0\0\0\200\377\377\377", 4096) = 14                                                    
close(4)                                = 0             
write(3, "\200\2\0\0\0\33\0\0\1^\200\377\377\377\0\0\0\t\3\0\0\0\0\0\0\0\0", 27) = 27                            
read(3, "\200\1\0\0\0\n\0\0\t\30", 4096) = 10           
write(2, "ERROR: ", 7ERROR: )                  = 7      
write(2, "Sys_Unseal failed. Error Code: 0"..., 36Sys_Unseal failed. Error Code: 0x918) = 36                     
write(2, "\n", 1                                        
)                       = 1                             
write(2, "ERROR: ", 7ERROR: )                  = 7      
write(2, "Unseal failed!", 14Unseal failed!)          = 14                                                       
write(2, "\n", 1                                        
)                       = 1                             
close(3)                                = 0             
exit_group(1)                           = ?             
+++ exited with 1 +++                                   
unseal fail, please check the environment or parameters!
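The second write() and the read() that follows it are short enough to decode by hand. The sketch below is one reading of those bytes, assuming the standard TPM 2.0 command layout (2-byte tag, 4-byte size, 4-byte command code, then handles and the authorization area) and the command code values from Part 2 of the library specification. The hex strings are transcribed from the strace escapes above; the function names are made up for illustration.

```python
import struct

TAGS = {0x8001: "TPM_ST_NO_SESSIONS", 0x8002: "TPM_ST_SESSIONS"}
COMMANDS = {0x0000015E: "TPM2_Unseal", 0x00000161: "TPM2_ContextLoad"}

# Bytes of the 27-byte write() and the 10-byte read() from the trace.
UNSEAL_COMMAND = bytes.fromhex(
    "8002 0000001b 0000015e 80ffffff 00000009 03000000 0000 00 0000"
)
UNSEAL_RESPONSE = bytes.fromhex("8001 0000000a 00000918")


def parse_header(buf):
    """Return (tag, size, code); code is a command code or a response code."""
    return struct.unpack_from(">HII", buf, 0)


def parse_unseal_command(buf):
    """Decode a TPM2_Unseal command that carries one authorization session."""
    tag, size, command_code = parse_header(buf)
    item_handle, auth_size = struct.unpack_from(">II", buf, 10)
    session_handle, nonce_size = struct.unpack_from(">IH", buf, 18)
    # The session attributes byte follows the (possibly empty) nonce.
    (attributes,) = struct.unpack_from(">B", buf, 24 + nonce_size)
    return {
        "tag": TAGS.get(tag, hex(tag)),
        "size": size,
        "command": COMMANDS.get(command_code, hex(command_code)),
        "item_handle": item_handle,
        "auth_size": auth_size,
        "session_handle": session_handle,
        "continue_session": bool(attributes & 0x01),  # bit 0 of TPMA_SESSION
    }


print(parse_unseal_command(UNSEAL_COMMAND))
print([hex(field) for field in parse_header(UNSEAL_RESPONSE)])
```

Read this way, the command is TPM2_Unseal on object handle 0x80ffffff, authorized with session handle 0x03000000 (a policy session handle), and the response carries 0x918. So the marshalling looks as expected and the TPM is rejecting the session handle itself.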

@williamcroberts do you have a reference to the tabrmd fix so I can take a look at the kernel code and try to understand what's missing? I looked at the tpm2-abrmd repo but didn't find anything.

williamcroberts commented 6 years ago

@martinezjavier it's in @flihp 's queue and not complete. @flihp and I discussed this today, and we heard reports that context saving worked with the in-kernel RM. I would bring it up on the tpmdd-devel mailing list. However, I would make sure that the above sequence should work... perhaps @idesai can discuss in more detail (since he did a lot of this work).

The tpm device driver mailing list can be found here: https://sourceforge.net/p/tpmdd/mailman/tpmdd-devel/

martinezjavier commented 6 years ago

@williamcroberts yes, I'm subscribed (and actually following) the tpmdd-devel list.

As you said, I mostly wanted confirmation that what I'm testing is the correct sequence and that the problem is really in the kernel RM. So I'll wait for @idesai's input on this.

idesai commented 6 years ago

@martinezjavier can you try this sequence?

tpm2_pcrlist -L ${alg_pcr_policy}:${pcr_ids} -o $file_pcr_value
tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy
tpm2_load -c $file_primary_key_ctx -u $file_unseal_key_pub -r $file_unseal_key_priv -n $file_unseal_key_name -C $file_unseal_key_ctx
handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -e -a| grep EXTENDED |cut -d ':' -f2)
tpm2_unseal -c $file_unseal_key_ctx -E $handle

martinezjavier commented 6 years ago

@idesai I guess you meant -S and not -E in your last command. That's basically what I tested...

But just for completeness:

#!/bin/bash

alg_primary_obj=sha256
alg_primary_key=ecc
alg_create_obj=sha256
alg_create_key=keyedhash
alg_pcr_policy=sha1

pcr_ids="7"

file_pcr_value=pcr.bin
file_policy=policy.data
file_primary_key_ctx=primary.context
file_unseal_key_pub=obj.pub
file_unseal_key_priv=obj.priv
file_unseal_key_ctx=load.context
file_unseal_key_name=load.name
file_unseal_output_data=unseal.output

secret="12345678"

tpm2_takeownership -c
tpm2_createprimary -A e -g $alg_primary_obj -G $alg_primary_key -C $file_primary_key_ctx
tpm2_pcrlist -L ${alg_pcr_policy}:${pcr_ids} -o $file_pcr_value
tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy
tpm2_create -g $alg_create_obj -G $alg_create_key -u $file_unseal_key_pub -r $file_unseal_key_priv -I- -c $file_primary_key_ctx -L $file_policy -E <<< $secret
tpm2_load -c $file_primary_key_ctx -u $file_unseal_key_pub -r $file_unseal_key_priv -n $file_unseal_key_name -C $file_unseal_key_ctx
handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -e -a| grep EXTENDED |cut -d ':' -f2)
tpm2_unseal -c $file_unseal_key_ctx -S $handle

$ export TPM2TOOLS_TCTI_NAME=device

$ export TPM2TOOLS_DEVICE_FILE=/dev/tpmrm0

$ ./test.sh 

CreatePrimary Succeed ! Handle: 0x80ffffff

Bank/Algorithm: sha(0x0004)
PCR_07:6d7206871c9c6f38ad3997baceebee95dadec04d
ObjectAttribute: 0x00000012

Create Object Succeed !

Load succ.
LoadedHandle: 0x80fffffe

ERROR: Sys_Unseal failed. Error Code: 0x918
ERROR: Unseal failed!

$ tpm2_rc_decode 0x918
error layer
  hex: 0x0
  identifier: TSS2_TPM_ERROR_LEVEL
  description: Error produced by the TPM
format 0 warning code
  hex: 0x18
  name: TPM_RC_REFERENCE_S0
  description: the 1st authorization session handle references a session that is not loaded

idesai commented 6 years ago

@martinezjavier, in the steps you have above, can you try adding this export before the last two steps?

export TPM2TOOLS_DEVICE_FILE=/dev/tpm0
handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -e -a| grep EXTENDED |cut -d ':' -f2)
tpm2_unseal -c $file_unseal_key_ctx -S $handle

idesai commented 6 years ago

@martinezjavier this was a working example with @williamcroberts implementation of lua. This is an older example for rsaencrypt and rsadecrypt with an older version of createpolicy.

#!/usr/bin/lua

require("tpm_shell")
s = tpm_open("--tcti", "tabrmd")
rc = createprimary(s, '-A', 'o', '-g', '0xb', '-C', 'prim.ctx', '-G', '0x1')
rc,t = createpolicy(s, '-P', '-i', '0', '-g', '0x4', '-f', 'policy.file')
print(t["policy"]) --should be an input to create
rc = create(s, '-c', 'prim.ctx', '-g', '0xb', '-G', '0x1', '-L', 'policy.file', '-o', 'key.pub', '-O', 'key.priv')
rc = load(s, '-c', 'prim.ctx', '-u', 'key.pub', '-r', 'key.priv', '-n', 'key.name', '-C', 'sec.ctx')
rc = rsaencrypt(s, '-c', 'sec.ctx', '-I', 'plain.txt', '-o', 'plain.enc')
rc,t = createpolicy(s, '-P', '-i', '0', '-g', '0x4', '-r')
print(t["sessionHandle"]) --should be an input to rsadecrypt
rc = rsadecrypt(s, '-c', 'sec.ctx', '-I', 'plain.enc', '-o', 'plain.dec', '-Y')
tpm_close(s)

martinezjavier commented 6 years ago

@idesai bypassing the kernel RM and using /dev/tpm0 directly for the policy creation + unseal does indeed work. Does this mean that it's a problem with the kernel RM then?

I'm busy with other non-TPM related stuff today, but now that I've a test case I'll dig deeper on this next week.

#!/bin/bash -e

alg_primary_obj=sha256
alg_primary_key=ecc
alg_create_obj=sha256
alg_create_key=keyedhash
alg_pcr_policy=sha1

pcr_ids="7"

file_pcr_value=pcr.bin
file_input_data=secret.data
file_policy=policy.data
file_primary_key_ctx=primary.context
file_unseal_key_pub=obj.pub
file_unseal_key_priv=obj.priv
file_unseal_key_ctx=load.context
file_unseal_key_name=load.name
file_unseal_output_data=unseal.output

secret="12345678"

export TPM2TOOLS_TCTI_NAME=device
export TPM2TOOLS_DEVICE_FILE=/dev/tpmrm0

tpm2_takeownership -c
tpm2_createprimary -A e -g $alg_primary_obj -G $alg_primary_key -C $file_primary_key_ctx
tpm2_pcrlist -L ${alg_pcr_policy}:${pcr_ids} -o $file_pcr_value
tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -F $file_pcr_value -f $file_policy
tpm2_create -g $alg_create_obj -G $alg_create_key -u $file_unseal_key_pub -r $file_unseal_key_priv -I- -c $file_primary_key_ctx -L $file_policy -E <<< $secret
tpm2_load -c $file_primary_key_ctx -u $file_unseal_key_pub -r $file_unseal_key_priv -n $file_unseal_key_name -C $file_unseal_key_ctx
export TPM2TOOLS_DEVICE_FILE=/dev/tpm0
handle=$(tpm2_createpolicy -P -L ${alg_pcr_policy}:${pcr_ids} -e -a| grep EXTENDED |cut -d ':' -f2)
tpm2_unseal -c $file_unseal_key_ctx -S $handle

rm -f load.context  obj.priv  obj.pub  pcr.bin  policy.data  primary.context load.name

$ ./test_new.sh

CreatePrimary Succeed ! Handle: 0x80ffffff

Bank/Algorithm: sha(0x0004)
PCR_07:6d7206871c9c6f38ad3997baceebee95dadec04d
ObjectAttribute: 0x00000012

Create Object Succeed !

Load succ.
LoadedHandle: 0x80fffffe

12345678

martinezjavier commented 6 years ago

@idesai I don't think it's fair to compare with the lua script since in that case the sapi context is reused while for the tools a new sapi is initialized and closed after the tool exits (i.e: sapi_teardown_full is called by the lua shell after all commands were executed, while the CLI calls it after each tool).

williamcroberts commented 6 years ago

@martinezjavier sapi_teardown_full is pretty much a Tss2_Sys_Finalize() followed by a free().

TSS2_RC Tss2_Sys_Finalize(
    TSS2_SYS_CONTEXT *sysContext
    )
{
    return TSS2_RC_SUCCESS;
}

IIRC, the last time I looked at the kernel space addition, on release of the fd (final close and link count == 0) the RM calls flush... so the same bug is likely present.

webmeister commented 6 years ago

For the kernel RM, you need to keep your file handle to /dev/tpmrm0 open and reuse it for all commands that you want to execute in the same session/with the same objects. Once you close the file handle, every temporary object created with it will be gone from the TPM. As far as I understand it, this is by design and not a bug. /dev/tpmrm0 is meant to isolate applications from one another, so that when opening /dev/tpmrm0 each application sees an empty TPM. How is the kernel RM supposed to know that when you reopen /dev/tpmrm0 you want to reuse some old objects? Where should it store those objects (contexts) while your handle is closed, without running out of resources at some point?

This makes it difficult to execute sequences of commands based on separate executables, as you now discover. It is more suited to a programming model similar to the Lua snippet that has been posted above.
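The isolation described here can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel RM's code: the class, its method names, and the handle numbering are all invented for illustration. The only behaviour it models is that a connection sees just the sessions it created and that closing a connection flushes them.

```python
TPM_RC_SUCCESS = 0x000
TPM_RC_REFERENCE_S0 = 0x918  # "session handle references a session that is not loaded"


class ToyResourceManager:
    """Toy model: each connection only sees sessions it created itself."""

    def __init__(self):
        self._next_conn = 0
        self._next_handle = 0x03000000
        self._sessions = {}  # connection id -> set of session handles

    def open(self):
        conn = self._next_conn
        self._next_conn += 1
        self._sessions[conn] = set()
        return conn

    def start_policy_session(self, conn):
        handle = self._next_handle
        self._next_handle += 1
        self._sessions[conn].add(handle)
        return handle

    def unseal(self, conn, session_handle):
        if session_handle not in self._sessions[conn]:
            return TPM_RC_REFERENCE_S0
        return TPM_RC_SUCCESS

    def close(self, conn):
        # Everything created through this connection is flushed on close.
        del self._sessions[conn]


rm = ToyResourceManager()

# Two separate tool invocations: each one opens and closes its own connection.
c1 = rm.open()
handle = rm.start_policy_session(c1)
rm.close(c1)
c2 = rm.open()
separate = rm.unseal(c2, handle)
rm.close(c2)

# One program that keeps a single connection open for both steps.
c3 = rm.open()
handle = rm.start_policy_session(c3)
shared = rm.unseal(c3, handle)
rm.close(c3)

print(hex(separate), hex(shared))
```

In this model the two-invocation sequence ends in the same 0x918 seen in the test runs above, while the single-connection sequence succeeds, which mirrors the difference between the shell scripts and the Lua snippet.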

williamcroberts commented 6 years ago

@flihp knows the spec around this in more detail, but the same logical issue and resource exhaustion issues are present in the user land one as well. Whether at an FD boundary between the kernel and userspace, or IPC FDs between the userspace RM and a client, these issues exist.

@flihp care to comment? Is the spec clear on this between multiple client invocations that occur outside of a process?

martinezjavier commented 6 years ago

@williamcroberts yes, I've seen the sapi_teardown_full() implementation :smile:

My point was that I wouldn't have expected the same behaviour from the Lua shell as from the individual tools, due to the differences in how the contexts are handled in each case.

@webmeister thanks a lot for the clarification. If that is by design, then I don't see how the individual tools could do it without starting a new session for authentication like the tpm2_unseal does now.

williamcroberts commented 6 years ago

@webmeister @martinezjavier I wouldn't consider the RM calling FlushContext this cut and dry.

Section 15.4 of library spec states:

The TPM assigns session handles when an authorization session is started (TPM2_StartAuthSession()).
An HMAC session is assigned a handle with an MSO of 02₁₆ and a policy session is assigned a handle
with an MSO of 03₁₆. Each authorization session handle is associated with a unique context that may
exist in only one place at a time: either on the TPM in a Shielded Location, or in a saved context as a
Protected Object. The handle remains associated with the session as long as the session exists and does not change when the session is context-saved and reloaded.

TL;DR - Tss2_Sys_ContextSave() makes a session handle a protected object.

Then section 3.7 of tabrm spec states:

A resource manager MUST allow TPM2_ContextSave and TPM2_ContextLoad commands from
connections. Any object or sequence context that has been saved by a command through a connection
MUST be loadable by any connection. Session contexts MUST be loadable by the same connection that
saved them.

TL;DR - objects must be made available through any connection.

The question is: is a saved session handle an object, and thus subject to that statement? Especially considering that session contexts are explicitly spelled out separately.

If anyone wants to chime in with definitive text on this subject, it would be very much appreciated.

flihp commented 6 years ago

The quote I sent you from the tabrmd spec is the important bit. The nomenclature is easy to get caught up in but the TL;DR you call out is the stuff we care about. The RM currently flushes all sessions associated with a connection when the connection is closed. Per the text you quote, this is wrong. I only noticed this while editing the document for the new version that's out for public review :weary:

I'm finishing up some work generalizing the IPC module in the tabrmd today / tomorrow. Should be able to pick this up the session stuff later this week. I'll add you @williamcroberts as a reviewer to those PRs. Most relevant will be the test cases but I'd always appreciate an extra set of eyes on the commits to the daemon as well. I'm expecting there will need to be some refactoring internally to support this but it shouldn't be too bad.

williamcroberts commented 6 years ago

@martinezjavier this means we can drop this feature from tpm2_unseal, but I see in clevis you are using the in-kernel resource manager.

This kind of leaves me in a torn state, as I see the future of RMs being the kernel one, and the userspace slowly going away (IMHO).

I could create a little service that holds an fd open and then use the device TCTI to send it requests to a socket file, ie:

tpm2_sessiond start /foo/bar
tpm2_<command> -T device:/foo/bar
tpm2_sessiond stop /foo/bar

webmeister commented 6 years ago

The question is: is a saved session handle an object, and thus subject to that statement? Especially considering that session contexts are explicitly spelled out separately.

In the TPM world, a session (context) is not an object (context). Those are completely different structures in the code, that are handled differently and you can even find the distinction easily in the public interface: for example, there are two different properties for the size of an object context (TPM_PT_MAX_OBJECT_CONTEXT) and the size of a session context (TPM_PT_MAX_SESSION_CONTEXT).

If sessions were objects, those sentences you quoted from the tabrm spec would not make sense: "Any object [...] context [...] MUST be loadable by any connection." If any connection must already be able to load any object (including sessions), then adding the sentence "Session contexts MUST be loadable by the same connection that saved them." seems weird.

williamcroberts commented 6 years ago

@webmeister sure I agree with your statement, which means at a minimum objects being flushed on fd close for both abrmd and in-kernel RM are bugs.

As far as sessions, the userspace abrmd is going to not flush sessions anymore and implement some type of eviction algorithm for those as well and likely some form of policy controls on them.

As far as all this stuff being a "resource leak", any client that keeps a connection to an RM alive and creates tons of sessions can exhaust resources as well.

For those using the in-kernel RM, a little session service could broker the requests to hold fds open, and abrmd will have session support natively.

I think, in many practical regards, the tpm spec is broken/non-functional, we think it's ok to veer off a little bit in this circumstance. Folks need session support, and shouldn't need to have something like the lua-shell hack I threw together.

webmeister commented 6 years ago

which means at a minimum objects being flushed on fd close for both abrmd and in-kernel RM are bugs.

I cannot see how you arrive at that conclusion from the text in the specification: "Any object or sequence context that has been saved by a command through a connection MUST be loadable by any connection." So if your application executes a TPM2_ContextSave, then another application must be able to use TPM2_ContextLoad for the blob that has been returned. But if your application creates an object, and then just closes the fd/connection, it cannot expect to be able to reference that object only by its previous handle.

In this scenario, there is also no resource leak. Every application can call TPM2_ContextSave for as many objects as it likes, but it has to have its own place to store all those context blobs. The resource manager does not have to magically store all objects that are still present in the TPM when the fd/connection is closed.

As far as all this stuff being a "resource leak", any client that keeps a connection to an RM alive and creates tons of sessions can exhaust resources as well.

For the moment, let's assume only well-behaving clients. They create their objects, do some operations with them, and then exit (or sometimes crash due to unrelated bugs). If the resource manager would magically store all remaining objects at the end of a connection for later use, then it would slowly consume all available memory. You'd need to require all applications to properly flush their objects before they exit, but even that would not work when they crash during execution.

Folks need session support, and shouldn't need to have something like the lua-shell hack I threw together.

Well, or maybe this just shows the limitations of the tpm2-tools approach. It's nice for simple scenarios, or maybe for interactive usage, but when you want to implement more complex applications, why not do that from a standard programming language (without these problems)?

Sure, using the C API directly might be more difficult than writing a Bash script based on the tools, but your Lua example looks nice enough, and I'd imagine proper Python (or $your_favorite_language) bindings to be even more comfortable to use.

williamcroberts commented 6 years ago

@webmeister I'm not really disagreeing with any of your statements, actually I agree with them, and I am trying to understand a lot of the nasty details on tpm, so bear with me. Yesterday I pored over a lot of docs and may have crossed a bunch of wires.

As far as dealing with object life cycles in the RM, @flihp is working on the RM. You might want to keep an eye out for those PRs on tpm2-abrmd project.

Any higher level API just turns into the ESAPI essentially. I think having a workable shell solution with sessions is valuable, even if it is just for prototyping.

I think just a hack to keep an fd open for the tools is a better hack than having a lua dependency.

webmeister commented 6 years ago

Any higher level API just turns into the ESAPI essentially.

In part, yes. But it is also about ease of use of the API. The C API is just a mapping of the specification, so you have to fill every parameter yourself. I'd expect bindings to higher level languages like Python for example to provide sensible defaults for all parameters as far as possible, similar to what the tools already do in some cases.

I think having a workable shell solution with sessions is valuable, even if it is just for prototyping.

If someone prefers prototyping in shell, then sure (in my experience, whenever I implement something as a shell script, as soon as it grows longer than ten lines of code, I wish I had used a proper programming language from the beginning ;-)). But with this being the first/the only solution at the moment, it risks becoming the standard solution for application development, just because people find it to be the only one that is available/documented and can be easily copied. What can we do to avoid this?

I think just a hack to keep an fd open for the tools is a better hack than having a lua dependency.

What do you dislike about the Lua solution? That it is Lua instead of shell? Or that the Lua API is not as nice as it could be (e.g. in the example above all parameters are strings instead of using more natural types)? This last point is one of the great advantages I see in using a language other than shell, that you do not pass around all values as strings, so that e.g. your IDE can already tell you that sha1 is a valid identifier, but SHA1 isn't. The same is true for parsing outputs. What I implemented in c2cc8a799a2fba6ef561269e51f877a7dfdfe617 works, but I do not think that it is a particularly nice solution (but I could not come up with something better in shell).

Another way to implement your hack could be something like this:

execute_together <<"EOF"
tpm2_foo ...
tpm2_bar ...
EOF

where it is the job of the execute_together executable to establish the connection to the TPM and set up the environment variables appropriately before then executing all the commands that were passed to it. I'm not sure whether this is better than your tpm2_sessiond idea, but it would avoid having processes running in the background (where they might never be closed).

martinezjavier commented 6 years ago

@webmeister I did consider using the tpm2-tss library directly at the beginning instead of a shell script that uses the tpm2-tools. But I decided to do the latter because I would have had to implement much of the code that was in the tools anyway (and I thought it would be more useful for the community to have feature-complete tpm2 tools than ad-hoc code for my use case).

At the time I was even less familiar with the TCG spec, so I didn't know about the limitations on how the session contexts were handled by the resource managers.

In hindsight, I would probably have spent time on moving most of the tpm2-tools helpers into a library that would provide high level operations instead of having to use the low level sapi library functions directly. I still think we should do that regardless of the session context issue we are discussing here, or what we end up doing with the tpm2_unseal tool. We can even have bindings for other languages as you said.

One constraint I have is that my solution has to be executed in the initramfs, and we don't have a lua (or python) interpreter there. So it has to be either a binary or a shell script. That's why clevis (and I guess dracut too) is a combination of C and bash and not a python project.

@williamcroberts I'm not sure I like the tpm2_sessiond idea. It does feel like a hack and I don't see why it's better than the current workaround in the tpm2_unseal tool that creates a session for PCR policy auth. I understand that's limited and more complex policy sessions can't be used, but again I think that unsealing an object or reading an NV area according to a PCR policy would be a very common use case.

If we will need a background running process just to keep an fd open, I wonder if it wouldn't be better to just use the tpm2-abrmd instead. The idea of using the device TCTI and the kernel RM was to avoid having a daemon.

martinezjavier commented 6 years ago

I've taken some time to read the specs again and following are my notes (please let me know if I got something wrong):

- TPM2_FlushContext() doesn't behave the same way for contexts associated with a session as for other objects.
  - For sessions, TPM2_FlushContext() deletes all context associated with the session and invalidates the saved contexts. After it, saved contexts for sessions can't be loaded anymore.
  - For objects, TPM2_FlushContext() removes the transient object from the TPM but doesn't invalidate the saved contexts. After it, saved contexts for objects can still be loaded into the TPM.
- Sessions have a continueSession attribute that must be SET if the session needs to remain active after a command completes. If it is CLEAR, the session will be flushed from the TPM.
- TPM2_ContextSave() and TPM2_ContextLoad() also behave differently for objects and sessions.
  - For objects, TPM2_ContextSave() creates a copy of the object context but the original context remains in the TPM RAM. TPM2_ContextLoad() will load the object and assign a new handle. So there will be two copies of the same object with different transient handles.
  - For sessions, their associated contexts are unique. That is, a context can either be in the TPM or saved off the TPM. And a saved session context can only be loaded once. Also, the handle associated with a session doesn't change as long as the session is active.
- On a TPM2_StartAuthSession() command, it could be that the TPM doesn't have a free handle to assign to a session. In this case, the TPM can return either a TPM_RC_SESSION_HANDLES or a TPM_RC_SESSION_MEMORY response code. In case of the former, the resource manager is supposed to flush a session and in case of the latter, it's supposed to flush a transient object.
  - It's worth mentioning that these command responses are listed in "Table 4 - Command-Independent Response Codes". The spec says that's because these are not errors, since the resource manager can remedy the situation.

So a few thoughts based on this:

Not only TPM2_FlushContext() can invalidate a session. The tools should also make sure to set the continueSession attribute for chained commands.
While I agree with @webmeister that having the resource manager to flush all objects and sessions after the connection handle is closed is a nice and simple design, I'm not sure that's absolutely required. At least I didn't find it mentioned explicitly in the specification.

I can think of a design on which objects are flushed (since these can always be loaded) but sessions are kept active. If the resource manager gets a TPM_RC_SESSION_HANDLES or a TPM_RC_SESSION_MEMORY response code, then it can do some cleanup and retry the command. For example it may use a LRU algorithm to remove old/stale sessions. That way resource exhaustion won't happen.

AFAICT neither the tpm2-abrmd nor the kernel resource manager handle these response codes currently.
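A minimal sketch of the LRU idea, to make the proposal concrete. It is an assumption-laden toy: the class and method names are invented, the capacity is arbitrary, and a real resource manager would react to the TPM_RC_SESSION_HANDLES / TPM_RC_SESSION_MEMORY response codes and issue TPM2_FlushContext rather than just dropping a dictionary entry.

```python
from collections import OrderedDict


class LruSessionTable:
    """Toy session table that evicts the least recently used session when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._sessions = OrderedDict()  # handle -> True, oldest entry first

    def start(self, handle):
        """Register a new session; return the evicted handle, if any."""
        evicted = None
        if len(self._sessions) >= self.capacity:
            # Stand-in for flushing the stalest session and retrying the command.
            evicted, _ = self._sessions.popitem(last=False)
        self._sessions[handle] = True
        return evicted

    def use(self, handle):
        """Mark a session as recently used; return False if it was evicted."""
        if handle not in self._sessions:
            return False
        self._sessions.move_to_end(handle)
        return True


table = LruSessionTable(capacity=2)
table.start(0x03000000)
table.start(0x03000001)
table.use(0x03000000)                 # 0x03000000 is now the most recently used
evicted = table.start(0x03000002)     # table is full, so the stalest one goes
print(hex(evicted), table.use(0x03000001), table.use(0x03000000))
```

The trade-off is visible even in this toy: a client whose session was idle the longest finds it gone once other clients fill the table, so its next command fails even though it did nothing wrong.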

williamcroberts commented 6 years ago

@martinezjavier this higher level API one wants is called the ESAPI: https://trustedcomputinggroup.org/wp-content/uploads/TSS_TSS-2.0-Enhanced-System-API_V0.9_R03_Public-Review-1.pdf

As far as session continuation, we would set it when -S is passed to a tool.

Maybe the approach is then simple session support in each tool, and for complex sessions you need to use the daemon, tabrmd or the SAPI directly? I just want to make sure for the next release we have a clear, understood path. I'd like to start quieting the code thrash/churn in the project. So if we add a feature, it needs to have a road map.

I'm going to read through everything again.

But this is one of my major gripes with standards bodies: every time they "build" a standard it usually has some serious issues; they miss the forest for the trees. This is similar to smart cards all over again.

williamcroberts commented 6 years ago

@webmeister I dislike the lua shell for the following reasons:

- Dependency on Lua
- Dealing with data passing (agreed, stringly typed sucks)

I think if I built a dedicated lua tpm shell without re-using the tools, I would like it better. I think I would pick Lua over Python for this, just simply because Lua builds almost everywhere.

I'm not opposed to bringing back the lua shell. In fact, I didn't expect this many people to be OK with it.

Maybe instead of doing a re-use the tools approach, we do a complete from scratch lua tpm shell with high level commands that mirror the tools (maybe share a lib between them).

martinezjavier commented 6 years ago

@williamcroberts thanks for the pointer to the ESAPI spec. I've read about it before but didn't know about this doc.

My opinion is that tools should support at least simple sessions. For example, tpm2_unseal and tpm2_nv{read,write} should be able to authenticate using a PCR policy, even if that means the tools need to start their own sessions, create the policy, etc., as is done in the tpm2_unseal tool.

If more complex sessions are needed, then as you said, an execution environment is needed to make sure that the connection to the RM is kept open and the session is not flushed.

williamcroberts commented 6 years ago

@martinezjavier The more I delve into this, the more I am ok with simple policy support in each tool. We just need to make sure it's expanded into all the appropriate tools. New bug #532. Good constructive discussion here, thank you all who provided feedback.

webmeister commented 6 years ago

@webmeister I did consider using the tpm2-tss library directly at the beginning instead of a shell script that uses the tpm2-tools.

Yeah, your current code is probably just at the limit in terms of complexity. Without the PCR handling I'd say it is perfectly fine, since then only a few simple calls to the tools are left.

While I agree with @webmeister that having the resource manager flush all objects and sessions after the connection handle is closed is a nice and simple design, I'm not sure that's absolutely required.

Maybe only required insofar as the RM needs the memory to store all those context blobs (and has no way to know when to finally delete them).

I can think of a design in which objects are flushed (since these can always be loaded) but sessions are kept active.

That might be a solution, especially since the number of active sessions is limited by the TPM anyway (TPM_PT_ACTIVE_SESSIONS_MAX, which is 64 (the minimum according to PTP) for one of the TPMs I've got here).

For example, it may use an LRU algorithm to remove old/stale sessions. That way resource exhaustion won't happen.

The only problem I see with that is that it might make your programs fail non-deterministically: Usually, everything works, but if that one program runs and creates its 64 sessions, then your script will fail, because the session that it had created five minutes ago needed to be expired.
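
The failure mode can be illustrated with a small Python model. This is a sketch under stated assumptions: the session names and the `open_session` helper are hypothetical, and the limit of 64 is simply the TPM_PT_ACTIVE_SESSIONS_MAX value quoted above, not a property of any particular resource manager.

```python
from collections import OrderedDict

MAX_SESSIONS = 64  # the TPM_PT_ACTIVE_SESSIONS_MAX value quoted above

active = OrderedDict()  # session name -> None, least recently used first


def open_session(name):
    """Open a session, evicting the least recently used one when the table is full."""
    evicted = None
    if len(active) >= MAX_SESSIONS:
        evicted, _ = active.popitem(last=False)
    active[name] = None
    return evicted


# A script opens one policy session, then pauses.
open_session("script-policy")

# Meanwhile another program opens 64 sessions of its own.
evicted = [open_session(f"other-{i}") for i in range(MAX_SESSIONS)]

# The last of those opens pushed the script's session out of the table.
script_session_alive = "script-policy" in active
```

Here the script did nothing wrong, yet its session is gone by the time it resumes. Whether this happens depends entirely on what other clients do in the meantime.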

I think I would pick Lua over Python for this, just simply because Lua builds almost everywhere.

Out of curiosity: where can you use Lua, but not Python? There might be environments (as mentioned by @martinezjavier) where it is not easy to use either of them, but what makes one of them easier to build than the other? (And are people likely to write TPM applications in such a limited environment?)

Should you use Python, you can expect more help from me (time permitting). I've used Lua a bit, but it was never a comfortable language to work with. Python on the other hand allows me to solve all kinds of problems easily, except maybe in resource-constrained environments, where speed/memory/etc. is of utmost importance (but Python is getting better there as well :)).

If nobody else starts work on a TPM interface for Python, I'll probably at least try to find out how easy it is to wrap the (E)SAPI with CFFI and see what can be built with that.

The more I delve into this, the more I am ok with simple policy support in each tool.

Sounds good to me, and should not be too hard with a shared code base.

williamcroberts commented 6 years ago

If you have a C compiler and a stdlib, you have Lua. Hooking Lua into C is simple. As far as the Lua language itself is concerned, it's lacking some features that Python has.

webmeister commented 6 years ago

I see. Now that you say it, I remember reading something about Lua being just a bunch of C files, which makes it especially easy to embed into other applications.

Still, from a user's point of view, I'd prefer Python for implementing even slightly more complex applications (and if an application is using a TPM, it is probably rather complex). Do you really expect more people to choose a Lua environment over a Python environment, if both were available? Or do you expect the Lua environment to be easier to realize?

tpm2-software / tpm2-tools

tpm2_unseal #510

#!/usr/bin/lua