Definitions of "authenticity" and "integrity"

talltree commented 2 years ago

The term authenticity currently plays a major role in the spec. On today's Technology Architecture TF call, Neil explained that, based on past discussions, we defined authenticity to include integrity. However, in reviewing the term authenticity as it appears in section 6, especially in the table explaining the functions of the layers, Dan Bachenheimer points out that many readers with security backgrounds will expect to see integrity listed alongside authenticity because they are considered separate security properties. For example, a message could have been sent by an authentic sender, but tampered with in transit so its integrity is lost.

I researched this quickly and here are three examples from the Wikipedia page on information security:

In 1998, Donn Parker proposed an alternative model for the classic CIA triad that he called the six atomic elements of information. The elements are confidentiality, possession, integrity, authenticity, availability, and utility.

It is important to note that while technology such as cryptographic systems can assist in non-repudiation efforts, the concept is at its core a legal concept transcending the realm of technology. It is not, for instance, sufficient to show that the message matches a digital signature signed with the sender's private key, and thus only the sender could have sent the message, and nobody else could have altered it in transit (data integrity). The alleged sender could in return demonstrate that the digital signature algorithm is vulnerable or flawed, or allege or prove that his signing key has been compromised. The fault for these violations may or may not lie with the sender, and such assertions may or may not relieve the sender of liability, but the assertion would invalidate the claim that the signature necessarily proves authenticity and integrity. As such, the sender may repudiate the message (because authenticity and integrity are pre-requisites for non-repudiation).

Integrity In IT security, data integrity means maintaining and assuring the accuracy and completeness of data over its entire lifecycle. This means that data cannot be modified in an unauthorized or undetected manner. This is not the same thing as referential integrity in databases, although it can be viewed as a special case of consistency as understood in the classic ACID model of transaction processing. Information security systems typically incorporate controls to ensure their own integrity, in particular protecting the kernel or core functions against both deliberate and accidental threats. Multi-purpose and multi-user computer systems aim to compartmentalize the data and processing such that no user or process can adversely impact another: the controls may not succeed however, as we see in incidents such as malware infections, hacks, data theft, fraud, and privacy breaches

Based on these examples, I believe Dan is right: it would be a mistake for us to try to redefine the term authenticity as including integrity. The security community sees them as two different terms—closely related, but still different.

So I recommend:

We revise our glossary definitions to align with the widely understood meanings of the terms authenticity and integrity.
If we decide that for some reason we need to be able to refer to a specific combination of authenticity and integrity, we should define our own term for it.

jospencer-460 commented 2 years ago

Works for me! Integrity and authenticity [= non-repudiation] are different, and the completeness angle on integrity ensures that some of the content can't be lost. In a scenario where selective disclosure allows the delivery of authentic data attributes, signed individually, the additional challenge is confirming that all of the content sent was received (and that nothing, even something authentic, was inserted, I suppose) so that the delivery can be said to have integrity.

wenjing commented 2 years ago

@talltree Please identify the section of the document this issue is referring to so we know which usage is causing the issue.

I examined all occurrences of the terms 'authenticity' and 'integrity'. I think this issue is largely a misreading of my original text. The definitions of both terms are quite clear and consistent throughout the document, with one unfortunate exception which I will address at the end. The meanings as used in this document are:

(1) authenticity: refers to the party or endpoint
(2) integrity: refers to messages one party may exchange with another

Authenticity does not imply integrity - in fact, they are not about the same subject. One is about a party. The other is about a message (or data contained in it).

In the case of (1), this is used consistently. (maybe one exception where @talltree introduced additional text that I already proposed to delete or not accept.) In the case of (2), the word is also used in one or two cases for the general meaning but these were quite clear in the context they were used. And for double assurance, we could easily remedy this by using "Message integrity" to tell them apart.

In addition, I object to the introduction of other notions of these terms based on how others are using them. It is these other definitions that we must guard against, because we are defining a new architecture. The old uses are inherently based on old architecture assumptions.

Now, the one exception. It appears in 8.1, the first enumerated list, second item. Here, the word authenticity, under the caption of "Common Message Format", was used to mean message integrity. It's a bad slip of the tongue that may have contributed to the confusion, although the intent should still be clear under that caption. But anyway, I marked that usage to change to "Message integrity", and that hopefully should resolve this discrepancy.

So in short, I propose two simple changes to close this issue: (1) change 'integrity' to 'message integrity' in a few usages that refer to messages (2) fix one usage of 'authenticity' in 8.1 to 'message integrity'.

More importantly, we shall not change our definitions of these two terms to somehow conform to what others use. It is essential to the new architecture we are defining. By the way, I think the thread is agreeing on the conclusion - but the discrepancy explained above may have caused the misreading - and that may be the real culprit.

NOTE: this issue was reported before the text in question was committed to the repo, so readers beware if you can't follow. I will probably speed up the commits soon to make a clean switch to the GitHub repo.

talltree commented 2 years ago

@talltree Please identify the section of the document this issue is referring to so we know which usage is causing the issue.

@wenjing I mentioned in my original post that it was the uses of authenticity in section 6 — in particular the use I highlight with a comment in section 6.2.

I think your explanation will largely clear this issue up. However, to be sure, I propose that we review and agree on our glossary definitions of authenticity and integrity. I don't have time to do that right now, but I'll try to copy those over to this thread from the glossary later tonight.

jiraky commented 2 years ago

Regarding the bullet point:

Common Message Format: The message format must support these required properties

Message Integrity ~Authenticity~

Authentication implies the integrity of the message. Conversely, the integrity of the message is a requirement for authenticating it, but it is not enough to achieve it. A simple hash attached to a message might provide enough information about the integrity of the message, but it still needs a signature to be able to authenticate the origin.

Some resources on the subject: https://crypto.stackexchange.com/a/205 https://crypto.stackexchange.com/a/5466
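A minimal Python sketch of the hash-versus-signature point above, using only the standard library. It is an illustration under assumptions, not anything from the spec: the key, the messages, and the function names are hypothetical, and an HMAC with a key known to sender and receiver stands in for "something an attacker cannot recompute" because the standard library has no asymmetric signatures.

```python
import hashlib
import hmac

def digest(message: bytes) -> str:
    """Unkeyed hash: anyone, including an attacker, can compute it."""
    return hashlib.sha256(message).hexdigest()

def hash_check(message: bytes, attached_digest: str) -> bool:
    """Receiver recomputes the digest over what it actually received."""
    return digest(message) == attached_digest

def tag(key: bytes, message: bytes) -> str:
    """Keyed MAC: only holders of `key` can compute it."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def tag_check(key: bytes, message: bytes, attached_tag: str) -> bool:
    return hmac.compare_digest(tag(key, message), attached_tag)

# Hypothetical key known to the real sender and the receiver only.
SENDER_KEY = b"hypothetical sender/receiver key"

original = b"pay Alice 10"
forged = b"pay Mallory 10"

# An attacker swaps the message AND recomputes the attached hash.
forgery_passes_hash_check = hash_check(forged, digest(forged))

# Without the key, the attacker cannot produce a tag that verifies.
attacker_tag = tag(b"attacker's guess at the key", forged)
forgery_passes_tag_check = tag_check(SENDER_KEY, forged, attacker_tag)

# The genuine message with a genuine tag still verifies.
original_passes_tag_check = tag_check(SENDER_KEY, original, tag(SENDER_KEY, original))
```

The bare hash tells the receiver only that the message matches the digest that arrived with it, which says nothing about who produced either one.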

dhh1128 commented 2 years ago

Regarding @wenjing 's distinction between authenticity and integrity: I think this is workable. However, I think it is a bit dangerous to define the terms differently than industry practice without explicitly noting the divergence. And I note that in the KERI community, which this architecture is attempting to be friendly toward, there is quite a lot of talk about "authentic messages" and "authentic communications." The meaning they intend is "messages that are known to come from an authenticated sender." This is in line with general language usage, where we talk about a painting being an "authentic Rembrandt" or a piece of news being authentic instead of fake. So I'm a bit concerned about whether our narrow use of the term authentic is really wise.

Given the decision to distinguish integrity from authenticity (which I agree with), I proposed an additional list item in 6.1, saying that integrity is an additional core goal.

dhh1128 commented 2 years ago

Another way to define authentic, which would preserve the distinction vis-a-vis integrity, but which would not rest on Wenjing's distinction of being focused on the party, is to say that authentic is "having the expected origin or the expected association to an originating identity." A remote party is authentic if they are who we think. A message is authentic if it comes from the party we expect.

jospencer-460 commented 2 years ago

An (IoT) device may be the endpoint system. We can identify the device [one level of authentic messaging], but then there may be an additional business authentication that the IoT device is owned by a person or organisation and it has authority to act in this messaging scenario. So in this case, the endpoint is the authenticated device (level 2?) and the authority is also authenticated (level 4).

neiljthomson commented 1 year ago

I would suggest stating industry terms for Integrity, Authenticity, Non-Repudiation, Privacy and Confidentiality and then covering the "implementation" as it applies to the ToIP stack, which is primarily driven by cryptographically signed objects (data, identifiers, VCs, any "verifiable statement"). This may go in the Design Principles and/or the ToIP/Tech Arch Terms wiki(s).

That approach provides a bridge for non ToIP/SSI readers.

andorsk commented 1 year ago

@talltree mind if I assign this to you?

neiljthomson commented 1 year ago

Those are already in the TATA Terms Wiki. Time to refresh people that it’s open for additional terms and concepts to be defined and added.

jospencer-460 commented 1 year ago

As discussed at the TATF (20-10-2022), I'm happy with the current TA content on this topic. @neiljthomson will feed in after a review.

neiljthomson commented 1 year ago

There are lots of interrelated terms and concepts being dragged into this discussion. Not the least of which is sorting out how these terms apply to data (Authentic Data) and the entire environment (Authentic Web).

Given how many of these terms are defined in other environments (incl. non SSI IT), this suggests that SSI definitions need to differentiate from (and possibly directly point to) how the terms from other contexts are different or modified in SSI.

So I'll take ownership (don't appear to have GitHub rights to assign - don't see the option in my UI). Action to @talltree

talltree commented 1 year ago

I had a long talk with @SmithSamuelM about this which was written up in the notes of the 2022-10-27 TATF meeting. See those notes for more details, but the net of it Sam summarized this way:

There is no concept of data transmission over the Internet where you can establish the authenticity of the data — secure attribution to a source — without having confirmed the integrity of the data.

So the resolution is that when it comes to the ToIP stack and the Layer 2 Trust Spanning Protocol, our understanding of “authenticity” is based on the following:

A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify that it has been digitally signed by the private key bound to the sender’s identifier. Because this form of authenticity is conveyed via a digital signature over a body of content, by definition that digital signature is only valid if the body of content has not been tampered with in transmission. Therefore this form of authenticity inherently includes integrity.

If we agree on this point, then the next steps are:

I will prepare the PR needed to update the text in the spec. DONE — see #54.
Neil will reflect this in our work to prepare the glossary.

talltree commented 1 year ago

Note: Issue #16 is a duplicate and is being closed to consolidate the issue here.

jospencer-460 commented 1 year ago

I agree with the intent of this statement. The only comment I would make is that the "defined content" has to be signed as a whole by the sender's private key.

It could be that there are multiple message parts that are individually signed for different purposes. I've had to do this where a payment instruction message, from the payer, has multiple signatures over different message content. In the design, the protocol splits off the instruction message into different interactions with other parties, orchestrated by the receiver (in this case the settlement system). The receiver then reconstitutes another message and sends this on to the payee. The end-to-end non-repudiation of the critical content in the initial instruction can then be verified by different parties who use (and receive) different components of the instruction.

A long explanation, but important.

dhh1128 commented 1 year ago

A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify that it has been digitally signed by the private key bound to the sender’s identifier

I agree with the general sentiment and with the proposed resolution. However, I disagree with the summary that cryptographic verifiability must entail digital signatures. This is how KERI does it, and it's a good way to do it. But it is also possible to get cryptographic verifiability by encrypting. If the encryption key is known to sender and receiver, but nobody else, then you have integrity AND the receiver knows that the sender sent it, but you don't have a signature. This is how DIDComm 1 and DIDComm 2 work. In DIDComm 3, I hope we will switch to signing everything the way KERI does -- but I don't want us to overstate the case for signing.

SmithSamuelM commented 1 year ago

A couple of nuances to @dhh1128 comments. I agree that signing is not the only way to provide a form of verifiable authenticity.

This topic, however, was about whether or not “integrity” is an independent principle with respect to authenticity. Shared secret encryption also has the property that if the message has been tampered with, the “authenticity” verification will fail. In this case the message will not decrypt correctly. So @dhh1128 reinforces the point that message integrity is not independent of message authenticity.

But directly in response to whether or not authenticity entails digital signatures: the ToIP definition of authenticity is “attribution to the source of communication” and source attribution is at least a weak form of non-repudiation. A shared secret is repudiable by either party. Therefore the two parties must mutually trust each other because either party could encrypt something and assert that it was a statement made by the other party. This means the mechanism only provides authenticity limited to communication between the two parties. Since either party knows what they themselves encrypted, they know that all other encrypted communication must have been sourced by the other party, assuming of course that each party keeps a record of everything they encrypted so that they can check against that corpus. When the group expands beyond two, even that fails and the authenticity is further weakened to “sourced by someone else in the group, but not which one.” And no group member can prove to anyone else in the group or to any third party that it wasn’t sourced by them.

This weak form of authenticity is probably not what most people have in mind with regards to many use cases like VCs, but it's not ruled out either for other ToIP use cases.

To elaborate, shared secret encryption provides a type of source attribution that is best used for confidential communication over a channel for data in motion. It is useful for ephemeral authenticity in motion between two parties and not persistent authenticity at rest because it cannot be conveyed to a third party without sharing the secret. A digital signature provides authenticity at rest because it may be stored and forwarded to a third party (not a party to the shared secret) who may then verify it independent of the channel by which it was communicated originally.

Shared secret encryption is less zero-trust than a non-repudiable digital signature because it expands the non-verifiable trust surface versus the verifiable zero-trust surface.

This weaker form of authenticity, however, may be perfectly acceptable to those two parties who only use this form of authenticity for select communications never meant to be verified by a third party. Therefore AFAIK it does not conflict with the definition espoused by the ToIP “authenticity” principle. But a note should clarify that other forms of verifiable source attribution (aka verifiable authenticity), besides non-repudiable verifiable source attribution such as digital signatures, may be acceptable in some use cases.

So with appropriate caveats shared secret encryption may be used as a weaker form of verifiable authenticity amongst the group that holds the shared secret.

Before digital signatures using asymmetric cryptography were invented, signing meant using a message authentication code (MAC) with a shared secret. Diffie Hellman key exchange uses PKI to make the shared secret exchange more secure but does not change the ephemeral nature of the signing. It should only be used for ephemeral authenticity.
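The repudiability argument above can be sketched with a MAC from the Python standard library. This is an illustration only: the secret, the party names, and the messages are hypothetical, and HMAC stands in for any shared-secret mechanism.

```python
import hashlib
import hmac

# Hypothetical secret held by both Alice and Bob, and by nobody else.
SHARED_SECRET = b"hypothetical secret held by Alice and Bob"

def mac(secret: bytes, message: bytes) -> bytes:
    return hmac.new(secret, message, hashlib.sha256).digest()

def verifies(secret: bytes, message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(mac(secret, message), tag)

message = b"Alice owes Bob 100"

# Alice really produced this tag...
tag_from_alice = mac(SHARED_SECRET, message)
# ...but Bob, holding the same secret, can produce a byte-identical one.
tag_from_bob = mac(SHARED_SECRET, message)
tags_identical = hmac.compare_digest(tag_from_alice, tag_from_bob)

# Between the two of them the scheme still works: Bob knows what he
# tagged himself, so anything else that verifies came from Alice.
bob_own_messages = set()  # Bob's record of messages he tagged himself
bob_attributes_to_alice = (
    verifies(SHARED_SECRET, message, tag_from_alice)
    and message not in bob_own_messages
)

# A third party cannot check the tag at all without being given the
# secret, and once it has the secret it could have made the tag itself.
third_party_can_verify = verifies(b"third party has no secret", message, tag_from_alice)
```

Because the two tags are identical, the tag alone cannot show a third party which of the two secret holders made the statement, which is the sense in which this form of authenticity is ephemeral and limited to the parties sharing the secret.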

dhh1128 commented 1 year ago

Shared secret encryption is less zero-trust than a non-repudiable digital signature because it expands the non-verifiable trust surface versus the verifiable zero-trust surface.

Agreed. And this is why I think KERI's approach is better. I'm not advocating the weaker form; I'm just saying that if we want to be crisp, the association between authenticity and digital signatures is "here's the best way to do it" rather than "this is what it means."

SmithSamuelM commented 1 year ago

@dhh1128 Agreed

SmithSamuelM commented 1 year ago

How about this revision:

A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify that it has been sourced by the sender’s identifier.

Another way of saying this is that the receiver can make cryptographically secure attribution of the communication to the sender's cryptonymous identifier.

Can't decide if need to qualify identifier as cryptonymous in the italicized statement.

The most practically accessible zero-trust mechanism for cryptographically secure attribution of a message to a sender's cryptonymous identifier is/are non-repudiable digital signature(s) based on asymmetric key pair(s). The private key(s) is/are used by the sender to generate the signature(s). The public key(s) is/are used by the receiver to verify the signature(s). The sender's cryptonymous identifier is cryptographically verifiably bound to the public key(s).
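A toy Python sketch of the mechanism just described, using only the standard library (Python 3.8+ for the modular inverse). Everything here is an assumption made for illustration: it uses unpadded textbook RSA over two small Mersenne primes, which is insecure and is not how ToIP, KERI, or any real system signs messages, and `identifier_for` is a hypothetical way of binding an identifier to a public key by hashing it.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small and
# unpadded for real use: illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept by the sender

PUBLIC_KEY = (N, E)

def identifier_for(public_key: tuple) -> str:
    """Hypothetical identifier: a digest of the public key."""
    n, e = public_key
    return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()

def _hash_to_int(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes, d: int, n: int) -> int:
    """Sender: sign the message hash with the private exponent."""
    return pow(_hash_to_int(message, n), d, n)

def verify(message: bytes, signature: int, public_key: tuple, sender_id: str) -> bool:
    """Receiver: check the key-to-identifier binding, then the signature."""
    n, e = public_key
    if identifier_for(public_key) != sender_id:
        return False
    return pow(signature, e, n) == _hash_to_int(message, n)

sender_id = identifier_for(PUBLIC_KEY)
message = b"hello from the sender"
signature = sign(message, D, N)

authentic = verify(message, signature, PUBLIC_KEY, sender_id)
tampered = verify(b"hello from the sendex", signature, PUBLIC_KEY, sender_id)
wrong_sender = verify(message, signature, PUBLIC_KEY, "some-other-identifier")
```

Verification needs only public values, so a stored message and signature can be handed to a third party and checked later. A tampered body fails the same check, which is how this form of authenticity carries integrity with it.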

neiljthomson commented 1 year ago

Will be doing some heads down on distilling the material in this thread, plus reviewing the material in other Wikis and sources on Authentic, Integrity, etc. for the Trust Arch terms wiki.

The direction (which echoes work in the KERI terms wiki) is to define a term (e.g., Authentic) and then have "extensions"/"context specific" terms defined as well (e.g., Message Integrity, Authentic Messages/ing) where there are additional relevant factors or considerations.

wenjing commented 1 year ago

@SmithSamuelM @dhh1128 I agree with and like the simplicity of this revision. I would prefer to use "verifiable identifier" if that still works in the place of "cryptonymous" though - because the current version of text already introduced that phrase and it seems simpler.

A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify that it has been sourced by the sender’s identifier.

Another way of saying this is that the receiver can make cryptographically secure attribution of the communication to the sender's cryptonymous identifier.

Can't decide if need to qualify identifier as cryptonymous in the italicized statement.

I'm thinking about another possible revision: A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify the sender's identifier and that its ToIP Layer 2 message has been sourced by that verifiable identifier.

Let me explain. The word "message" or "communication" is unfortunately not sufficiently precise. The data contained within a message is layered, and the authenticity verification may only need to apply to the core portion of it for the message to be considered authentic. First, data in a lower layer (below layer 2, e.g. HTTP/TCP headers) does not need to be authentic. Second, I would say data in a higher layer (or the payload of the layer 2 message) does not necessarily need to be authentic in exactly the same way. Some of that content may need the same strong form of authenticity (as the preferred form with asymmetric key pairs), but one may come up with some trust tasks or applications that can use other forms or weaker forms. In the strictest sense, the identifier (metadata) may be the only data that matters (e.g. a ping message with no content). I think that the ToIP Layer 2 message should be able to support/accommodate all these variations in order to be a common spanning layer.

So that's why I added a qualified "ToIP Layer 2 message" - to replace the "it" - to indicate that this authenticity may only apply to the data introduced in Layer 2. Covering other higher layer payload is optional - a decision the upper layer can make.

It is not as easy a read but, well, what do you think?

dhh1128 commented 1 year ago

I like Sam's version and Wenjing's version. Both are accurate and pithy. The subtlety that Wenjing brings up about a message being multilayered is true. However, once you open that layer of detail, then you are sort of inviting formal cybersecurity proofs, and even the nuance you're offering isn't going to be enough. (I speak from experience here; we have people doing formal proofs of the cybersecurity properties of DIDComm, and asking questions based on exactly the sort of verbiage Wenjing is proposing. They need more.) So adding the extra nuance is a tradeoff; I don't have a strong opinion about the sweet spot and will be happy no matter how it's sorted.

SmithSamuelM commented 1 year ago

@wenjing verifiable identifier is a good substitute for cryptonymous identifier.

SmithSamuelM commented 1 year ago

A communication is authentic at ToIP Layer 2 when the receiver can cryptographically verify that it has been sourced by the sender’s verifiable identifier.

Another way of saying this is that the receiver can make cryptographically secure attribution of the communication to the sender's verifiable identifier.

wenjing commented 1 year ago

I like @SmithSamuelM latest version. I agree with @dhh1128 observations and don't have interest to open complex issues unnecessarily, so the succinct definition is a good choice. Thanks!

neiljthomson commented 1 year ago

"the receiver can cryptographically verify that it has been sourced by the sender’s verifiable identifier"

Stupid Question: How is the receiver informed of the sender's identity/identifier/address?

What is left unsaid (at least in this issue discussion) are assumptions about the messaging protocol, "packaging & routing" information for the message (contents), and interaction on establishing the communication channel before sending the message.

Based on @dhh1128 comments on DIDComm1/2, this implies that in setting up the DIDComm channel that the sender and receiver are exchanging their identifiers or otherwise know who the other party is prior to sending messages.

A more general messaging mechanism would have the message packaging/protocol include the sender's identity/identifier so the receiver knows the sender, and the receiver's identity/identifier/"address" is included so the routing system knows how to direct the message.

Or the sender's information is an inherent part of the message itself.

What is the design intent for general ToIP messaging?

SmithSamuelM commented 1 year ago

@neiljthomson The message itself can have the sender's identifier included.

As in a from: Identifier

SmithSamuelM commented 1 year ago

@neiljthomson One way to better understand the authenticity is to think of authenticity as two types.

Type 1 Authenticity is verification against a cryptonymous identifier as sender. Cryptonymous is short for cryptographically pseudonymous. A cryptonym may not be linked to the natural person or legal entity that controls it. But if the message is non-repudiably signed by the private key that controls the identifier, we know that the controller of that identifier signed it, whomsoever that may be. Type 1 Authenticity is necessary but may not be sufficient.

Type 2 Authenticity is linking a given cryptonym to a natural person or legal entity. If we want to preserve privacy we want to control the scope of that linkage. So we don't always want type 2 authenticity. The other two principles of confidentiality and privacy may apply to how much linkage we want.

So when we say verifiable identifier we mean cryptonymous identifier, and therefore the definition of authenticity is really type 1. But it does not preclude type 2.

SmithSamuelM commented 1 year ago

@neiljthomson The beauty of type 1 authenticity is that it provides what I call "latent accountability". That means that at some time in the future all statements signed by the controlling private key(s) of an identifier may be linked to a natural person or legal entity once that person chooses to disclose that linkage. That linkage may be governed by a confidentiality agreement and may also be governed by privacy law third principle. Contractually protected disclosure essentially uses confidentiality law to protect the privacy of the Discloser when making linkage. Phil Windley wrote a nice blog and used the term "provisional authenticity" to refer to type 2 authenticity derived from the latent accountability provided by type 1 authenticity. He also uses the term "functional privacy" to refer to the contractually protected disclosure.

https://www.windley.com/archives/2022/03/provisional_authenticity_and_functional_privacy.shtml

SmithSamuelM commented 1 year ago

By separating authenticity into two types, Type 1, which is verifiable proof that the sender is the controller of a cryptonymous identifier, and Type 2, which is linkage of the controller of a cryptonymous identifier to a natural person, we are able to operate in the trade space whose dimensions or axes are Authenticity, Confidentiality, and Privacy. When we do not make that separation, we force ourselves to only operate at the edge of that trade space, and that is why IMHO the SSI community keeps getting stuck when trying to solve the problems associated with privacy, authenticity, and confidentiality: the solutions are only at the edges, not across the whole trade space.

SmithSamuelM commented 1 year ago

The PAC principle or PAC trilemma states that we can have all of Privacy, Authenticity, and Confidentiality but not all at the highest level. We must pick two and the third is at a lower level. But when we force linkage to the natural person as the only degree of authenticity then we only get two of the three and the third is at zero. We make the trilemma hard, not soft.

neiljthomson commented 1 year ago

@neiljthomson The message itself can have the sender's identifier included.

As in a from: Identifier

@wenjing - comment on message format/packaging for routing re: inclusion of sender, receiver identifiers?

jospencer-460 commented 1 year ago

Gents - this whole thread is why we have encryption and signing key-pairs, for specific purposes.

neiljthomson commented 1 year ago

Actually, this thread is the only viable product this project has produced so far ;-)

andorsk commented 1 year ago

Closing, with #54 merged and based on conversations at the 11/10 TATF meeting.

SmithSamuelM commented 1 year ago

A PKI identity-based (more correctly identifier-based) security overlay provides a mechanism for secure attribution of a message to its source as indicated by the source identifier. Trivially the source identifier is the public key of a public/private key pair. The public key is included in the message and the whole message including the public key is signed by the private key. The problem with such a trivial overlay is that the identifier (in this case the public key) is not persistable; it's ephemeral. Every time you need to refresh the key pair you lose the identifier (the old public key), and therefore any reputation, any out-of-band authentication, any linkage, and any transaction history is lost because it is no longer cryptographically bound to the identifier.

KERI solves this persistent vs. ephemeral identifier problem for security overlays by creating identifiers that are not the public keys but are cryptographically bound to the initial key state which includes the current set of public keys. The Key Event Log in KERI is a verifiable data structure that provides proof of the current key state for the associated identifier. This allows the identifier to persist while allowing the key state to evolve. Thus any reputation, transaction history, etc remains attached to the persistent identifier in spite of refreshes to the controlling keys.

IMHO it elegantly solves the hard problem of PKI, that is, key rotation or evolution of key state for a persistent identifier in an identifier security overlay that solves the secure attribution problem.

Having key pairs for specific purposes does not address the underlying problem of secure attribution of the key pairs to a given controller. Each time you create a key pair you have to solve the attribution problem all over again unless the key pairs are derived or delegated from a persistent identifier that can be securely attributed to the controller.
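To make the persistent-identifier idea concrete, here is a deliberately simplified Python sketch of an identifier bound to an initial key state, with a hash-chained log of later key states. It is not KERI's event format or algorithm: the event fields, function names, and key labels are all hypothetical, and it omits the signatures that a real system would need to authorize each rotation (as written, anyone could append an event).

```python
import hashlib
import json

def event_digest(event: dict) -> str:
    """Digest of an event, using a stable JSON serialization."""
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()

def incept(public_keys: list) -> tuple:
    """Create the first event; the identifier is the digest of that event."""
    event = {"type": "inception", "keys": public_keys}
    return event_digest(event), [event]

def rotate(log: list, new_public_keys: list) -> list:
    """Append a new key state that commits to the previous event."""
    event = {
        "type": "rotation",
        "prior": event_digest(log[-1]),
        "keys": new_public_keys,
    }
    return log + [event]

def current_keys(identifier: str, log: list):
    """Return the latest key set if the log is bound to `identifier`, else None."""
    if not log or event_digest(log[0]) != identifier:
        return None
    for previous, event in zip(log, log[1:]):
        if event.get("prior") != event_digest(previous):
            return None
    return log[-1]["keys"]

# The identifier is fixed at inception and survives the key refresh.
identifier, log = incept(["hypothetical-public-key-A"])
rotated_log = rotate(log, ["hypothetical-public-key-B"])

keys_before = current_keys(identifier, log)
keys_after = current_keys(identifier, rotated_log)

# A log that starts from a different inception event is rejected.
substituted = [{"type": "inception", "keys": ["attacker-key"]}] + rotated_log[1:]
keys_from_substituted = current_keys(identifier, substituted)
```

The identifier stays the same while the keys it resolves to change, so anything attached to the identifier survives a key refresh, which is the property the trivial public-key-as-identifier overlay lacks.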

trustoverip / TechArch

Definitions of "authenticity" and "integrity" #10