secure distribution of public keys to verify nanopublications

jhpoelen commented 3 months ago

In NanoSession #14, @jhpoelen asked:

How do you imagine the public/private key solution working 20-50 years from now?

@tkuhn answered the question by saying that the public key is embedded in the nanopublications themselves.

jhpoelen commented 3 months ago

Example of a nanopublication with an embedded signature and public key -

as discovered via

https://www.globalbioticinteractions.org/?accordingTo=globi%3Aglobalbioticinteractions%2Fknowledgepixels&interactionType=interactsWith&sourceTaxon=Bison%20bison&targetTaxon=Canis%20lupus

and nanopub url

http://purl.org/np/RAOgLBuvJRusIKPJyhXbx7sMI1aKj_AI0l1oG6XXsO4pU

redirected to

https://nanodash.knowledgepixels.com/explore?id=RAzquSkwsTAZm61nReG6MOjXEXUx8fNVfdWnAzyn6sOhU

in which I clicked on http://np.knowledgepixels.com/RAzquSkwsTAZm61nReG6MOjXEXUx8fNVfdWnAzyn6sOhU.trig.txt

to generate content with the digital signature hash://sha256/1ed6493e4dcd4172d723dfdf7ab2c06946d6ca43147677c4f5be440a255ccd57 on 2024-06-25 as demonstrated in provenance head/anchor hash://sha256/157c14161e02948db88af1c3430d5a444cdccd67aba4f4d867c18a35e8d8b723 -

@prefix this: <http://purl.org/np/RAOgLBuvJRusIKPJyhXbx7sMI1aKj_AI0l1oG6XXsO4pU> .
@prefix sub: <http://purl.org/np/RAOgLBuvJRusIKPJyhXbx7sMI1aKj_AI0l1oG6XXsO4pU#> .
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix nt: <https://w3id.org/np/o/ntemplate/> .
@prefix npx: <http://purl.org/nanopub/x/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix orcid: <https://orcid.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

sub:Head {
  this: np:hasAssertion sub:assertion;
    np:hasProvenance sub:provenance;
    np:hasPublicationInfo sub:pubinfo;
    a np:Nanopublication .
}

sub:assertion {
  sub:association a <https://w3id.org/biolink/vocab/OrganismTaxonToOrganismTaxonAssociation>;
    rdfs:comment "Wolf (Canis lupus) predation and scavenging of reintroduced bison (Bison bison): a hallmark of ecological restoration to boreal food webs";
    <https://w3id.org/biolink/vocab/object> sub:objtaxon;
    <https://w3id.org/biolink/vocab/predicate> <http://purl.obolibrary.org/obo/RO_0002439>;
    <https://w3id.org/biolink/vocab/subject> sub:subjtaxon;
    <https://w3id.org/kpxl/biodiv/terms/hasSubjectLifeCycleStage> <http://purl.obolibrary.org/obo/UBERON_0018241> .

  sub:objtaxon <https://w3id.org/kpxl/biodiv/terms/hasTaxonName> <https://www.checklistbank.org/dataset/2169/taxon/9901> .

  sub:subjtaxon <https://w3id.org/kpxl/biodiv/terms/hasTaxonName> <https://www.checklistbank.org/dataset/9880/taxon/QLXL> .
}

sub:provenance {
  sub:assertion prov:wasDerivedFrom <https://doi.org/10.1007/s10344-023-01676-0> .
}

sub:pubinfo {
  sub:sig npx:hasAlgorithm "RSA";
    npx:hasPublicKey "MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCjDGQCS1S+SRnERDuYDXOugdYUP0efEquHJEEHAbU/uLzBVlga89zqrNPCS7fBE6lArBUWEmT8eLKdMapyqvAzI1J3jUWTMhDJF+XFBkUiuiFfNSc4vJJcmi0yujtnuzXsRIG202jyaP4f5ULoskFwaZOSBZJfiE0dsB3D7DTIAQIDAQAB";
    npx:hasSignature "BmxFhIUMaljf5MGd6ETv/dTclruWtt0knKiiU9wJPKaq1G3ZfTloNVZB6AckOoKivlQDVr8D8qas4enqbRGhWYLV2se7YMsiMg1lTB09WndovUXp9+5lRIy1s0z1nKF4VqBYEfMNuujtYPyQ8X+cDgqpSjZecrX3iqlBcYlGTD0=";
    npx:hasSignatureTarget this: .

  this: dct:created "2023-09-15T08:54:31.806Z"^^xsd:dateTime;
    dct:creator orcid:0000-0002-1267-0234;
    dct:license <https://creativecommons.org/publicdomain/zero/1.0/>;
    npx:hasNanopubType <https://w3id.org/kpxl/biodiv/terms/BiodivNanopub>;
    npx:introduces sub:association;
    rdfs:label "Canis lupus Linnaeus, 1758 (species) - preys on - Bison bison (species)";
    nt:wasCreatedFromProvenanceTemplate <http://purl.org/np/RAcTpoh5Ra0ssqmcpOgWdaZ_YiPE6demO6cpw-2RvSNs8>;
    nt:wasCreatedFromPubinfoTemplate <http://purl.org/np/RAA2MfqdBCzmz9yVWjKLXNbyfBNcwsMmOqcNUxkk1maIM>,
      <http://purl.org/np/RAh1gm83JiG5M6kDxXhaYT1l49nCzyrckMvTzcPn-iv90>;
    nt:wasCreatedFromTemplate <http://purl.org/np/RAh16oLqLJKo8I8R2CebR1n8Dwv95KL_H-azFfGt2FGW0> .

  <https://www.checklistbank.org/dataset/2169/taxon/9901> nt:hasLabelFromApi "Bison bison (species)" .

  <https://www.checklistbank.org/dataset/9880/taxon/QLXL> nt:hasLabelFromApi "Canis lupus Linnaeus, 1758 (species)" .
}

jhpoelen commented 3 months ago

@tkuhn So, my question is: how do you embed a cryptographic signature in the very content that it signs?

In other words, what part of the nanopub is signed by the private key and is cryptographically sealed?

Apologies for my possibly naive question, it's probably some basic thing I can't wrap my head around.

tkuhn commented 3 months ago

Good questions! :)

The signature covers all triples except the one with the signature. And all URIs that eventually contain the hash of the whole nanopublication have a blank space as a placeholder instead. On that content, the signature is calculated, then added as an additional triple, then the resulting triple set is hashed, and this hash is entered into the blank spaces, which then leads to the final signed nanopublication. So, the hash in the identifier covers everything, including the signature.

Does that answer your question?
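
The order of operations described in this comment (sign everything except the signature, add the signature triple, hash the result, fill the hash into the placeholders) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the actual nanopub implementation: triples are plain strings, sorting lines stands in for real RDF normalization, `[HASH]` stands in for the blank-space placeholder, `<hasSignature>` is shorthand for the signature predicate, the `RA` + base64 rendering only mimics the look of the identifiers in this thread, and the RSA key is a deliberately tiny toy key so that the example needs no third-party library.

```python
import base64
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small for real use;
# it only keeps this sketch free of third-party dependencies.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

BLANK = "[HASH]"                  # hypothetical stand-in for the blank placeholder
SIG_PREDICATE = "<hasSignature>"  # hypothetical shorthand for the signature predicate


def canonical_bytes(triples):
    """Serialize triples in a fixed order (a stand-in for real RDF normalization)."""
    return "\n".join(sorted(triples)).encode("utf-8")


def rsa_input(triples):
    """SHA-256 of the triples, reduced to an integer the toy key can handle."""
    digest = hashlib.sha256(canonical_bytes(triples)).digest()
    return int.from_bytes(digest, "big") % N


def artifact_code(triples):
    """Hash of the whole triple set, rendered in URL-safe base64."""
    digest = hashlib.sha256(canonical_bytes(triples)).digest()
    return "RA" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def sign_and_identify(triples):
    """Expects triples that contain BLANK wherever the final hash will appear."""
    signature = pow(rsa_input(triples), D, N)    # 1. sign all triples but the signature
    signed = triples + [f'<np/{BLANK}#sig> {SIG_PREDICATE} "{signature}" .']  # 2. add it
    code = artifact_code(signed)                 # 3. hash the set, blanks still in place
    return code, [t.replace(BLANK, code) for t in signed]  # 4. fill in the blanks


def verify(code, triples):
    """Check the identifier hash first, then the embedded signature."""
    blanked = [t.replace(code, BLANK) for t in triples]
    if artifact_code(blanked) != code:
        return False  # content does not match the identifier
    sig_triples = [t for t in blanked if SIG_PREDICATE in t]
    if len(sig_triples) != 1:
        return False
    unsigned = [t for t in blanked if SIG_PREDICATE not in t]
    signature = int(sig_triples[0].split('"')[1])
    return pow(signature, E, N) == rsa_input(unsigned)


if __name__ == "__main__":
    draft = [
        "<np/[HASH]> <hasAssertion> <np/[HASH]#assertion> .",
        '<np/[HASH]#assertion> <comment> "wolf preys on bison" .',
    ]
    code, final = sign_and_identify(draft)
    print(code, verify(code, final))
```

In this sketch the identifier hash is computed after the signature triple has been added, which is why the hash covers the signature while the signature itself covers only the remaining triples.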

jhpoelen commented 2 months ago

@tkuhn thanks for elaborating.

As far as I can follow, the "signed" nanopub URI (e.g., http://purl.org/np/RAOgLBuvJRusIKPJyhXbx7sMI1aKj_AI0l1oG6XXsO4pU) contains a semantic hash calculated from the nanopub, including its embedded public key and the signature over the provenance/assertion/pubinfo segments.

What guarantees that the nanopub URI itself is to be trusted?

PS Thanks for being patient with me as I am trying to solve this trust puzzle. . .

tkuhn commented 2 months ago

OK, right. Technology can help but is not sufficient, of course, to establish trust.

With nanopublications, we normally have a few nanopub URIs and/or a few pubkeys that we trust, e.g. because a trusted piece of software has them hard-coded, or because it's a link in a paper from people we trust, or we just got it via WhatsApp from a friend etc.

So, if we have a nanopub URI we trust, we can use the decentralized server network to retrieve its content and check the hash. And if it matches we know that it's exactly what we are looking for, and therefore we can trust its content. (The content might still be invalid or wrong, as with everything of course, but we have good reasons to assume it's not spam or noise.)

If the nanopub is signed with a pubkey we trust (and supposedly know who the owner is) then we can also trust that exactly that person published it.

The latter case also applies for any nanopublications we can find in the network, even those whose IDs we didn't trust beforehand.

And now because we have this "anchor" of trust, we can possibly find further information that is relevant for trusting other nanopublications, such as trusted people approving other people's nanopublications. These nanopublications can be introduction nanopubs where the creator declares their pubkeys. So, now we have found another person with pubkey who we can (probably) trust. This user in turn can approve further users, which we can then trust (at an even slightly lower level of trust probably) as well. And so on.

I am not sure that directly answers your question? But in any case, that's roughly the approach on trust we are taking with the nanopublication network.
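
The idea of starting from a few trusted pubkeys and extending trust through approvals, at a lower level with each step, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `decay` factor, the `floor` cutoff, and the key labels are all hypothetical, and the nanopublication network may compute trust quite differently.

```python
from collections import deque


def propagate_trust(anchors, approvals, decay=0.5, floor=0.2):
    """Spread trust from anchor pubkeys along approval links.

    anchors:   {pubkey: trust level} for keys trusted up front
    approvals: {pubkey: [pubkeys that this key has approved]}
    decay:     factor applied at each approval step (assumed, must be < 1)
    floor:     trust below this level is not passed on (assumed)
    """
    trust = dict(anchors)
    queue = deque(anchors)
    while queue:
        approver = queue.popleft()
        passed_on = trust[approver] * decay
        if passed_on < floor:
            continue  # too far from any anchor to vouch for others
        for approved in approvals.get(approver, []):
            # Keep the best trust level found over any approval path.
            if passed_on > trust.get(approved, 0.0):
                trust[approved] = passed_on
                queue.append(approved)
    return trust


if __name__ == "__main__":
    anchors = {"key-A": 1.0}
    approvals = {
        "key-A": ["key-B"],
        "key-B": ["key-C", "key-A"],
        "key-C": ["key-D"],
        "key-X": ["key-X"],  # self-declared, approved by nobody we trust
    }
    print(propagate_trust(anchors, approvals))
```

With these assumed numbers, `key-B` and `key-C` end up with progressively lower trust, `key-D` falls below the cutoff, and `key-X` never appears because no trusted key leads to it.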

jhpoelen commented 2 months ago

> OK, right. Technology can help but is not sufficient, of course, to establish trust.

Thanks for taking the time to reply!

> With nanopublications, we normally have a few nanopub URIs and/or a few pubkeys that we trust, e.g. because a trusted piece of software has them hard-coded, or because it's a link in a paper from people we trust, or we just got it via WhatsApp from a friend etc.

I agree that trust has to start somewhere, and may not be solved by technology alone. And my initial questions revolved around the tracing of this trust chain, especially 20-50 years from now.

> So, if we have a nanopub URI we trust, we can use the decentralized server network to retrieve its content and check the hash. And if it matches we know that it's exactly what we are looking for, and therefore we can trust its content. (The content might still be invalid or wrong, as with everything of course, but we have good reasons to assume it's not spam or noise.)

Yes, and I wonder who would keep this list of nanopub URIs that "we" trust. And who are "who" and "we"? Is "who" the US Library of Congress, or some Pensoft journal? And is "we" the general public, or some select group of researchers that have a trusted relationship or common background?

> If the nanopub is signed with a pubkey we trust (and supposedly know who the owner is) then we can also trust that exactly that person published it.

Yes, but how am I supposed to trace the owner of a pubkey 20-50 years from now? And . . . how would I find evidence to suggest that their private key did not get misplaced?

> The latter case also applies for any nanopublications we can find in the network, even those whose IDs we didn't trust beforehand.
>
> And now because we have this "anchor" of trust, we can possibly find further information that is relevant for trusting other nanopublications, such as trusted people approving other people's nanopublications. These nanopublications can be introduction nanopubs where the creator declares their pubkeys. So, now we have found another person with pubkey who we can (probably) trust. This user in turn can approve further users, which we can then trust (at an even slightly lower level of trust probably) as well. And so on.

I can see how you can scan nanopubs for specific public keys. And . . . I can see how creators can say - hey, I just signed this introduction statement with my private key, so this shows that I have access to it. However, anyone can say whatever they want, including making an introduction statement. Unless there's some third-party ledger that keeps track of these introductions . . . sort of like the role that editors/reviewers play in the academic publishing world.

> I am not sure that directly answers your question? But in any case, that's roughly the approach on trust we are taking with the nanopublication network.

I am still a bit fuzzy on how I'd be independently able to verify trust chains in the nanopub universe, because:

- the verification of nanopub hashes/signatures needs a specialized tool that is not (yet?) generally available, not as generally available as tools like "sha256sum" or "md5sum"
- citing/versioning of all, or a part of, the nanopub-verse - I imagine that the nanopubs come and go as projects spin up and wind down. How do you imagine nanopubs aging and getting carried across different platforms and beyond the internet?
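
To make the tooling gap in the first point concrete, here is a small Python sketch contrasting a byte-level hash (the kind `sha256sum` produces) with a hash over the statements of a document. It assumes that the `hash://sha256/...` identifiers quoted earlier in this thread are plain SHA-256 digests of the retrieved bytes. `statement_hash` is a hypothetical, much simplified stand-in for a semantic hash and is not the algorithm nanopublications actually use.

```python
import hashlib


def byte_hash_uri(content: bytes) -> str:
    """Byte-level content hash, equivalent to running sha256sum on the bytes."""
    return "hash://sha256/" + hashlib.sha256(content).hexdigest()


def statement_hash(serialized: str) -> str:
    """Hash the sorted, non-empty lines so that statement order does not matter."""
    lines = sorted(line.strip() for line in serialized.splitlines() if line.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two serializations of the same two statements, in different order.
    first = "<a> <preysOn> <b> .\n<b> <label> \"bison\" .\n"
    second = "<b> <label> \"bison\" .\n<a> <preysOn> <b> .\n"

    # The byte-level hashes differ because the byte sequences differ.
    print(byte_hash_uri(first.encode("utf-8")))
    print(byte_hash_uri(second.encode("utf-8")))

    # The statement-level hashes agree because the statements are the same.
    print(statement_hash(first) == statement_hash(second))
```

A byte-level hash can be checked with any generic tool but breaks when the same statements are re-serialized, whereas a statement-level hash survives re-serialization but needs a tool that understands the format.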

Apologies for the sprawling questions . . . I am just trying to see how I can help construct long lasting and citable knowledge bases from nanopubs ("knowledge pixels") for existing infrastructures I contribute to (e.g., https://linker.bio , https://globalbioticinteractions.org).

knowledgepixels / nanopub-ecosystem

secure distribution of public keys to verify nanopublications #5