readium / lcp-specs

🔐 Releases, drafts and schema for Readium LCP
https://readium.org/lcp-specs/
BSD 3-Clause "New" or "Revised" License
Link resource hash is base64 encoded? #52

Open danielweck opened 2 years ago

danielweck commented 2 years ago

According to the specification and JSON schema, the answer is yes:

https://github.com/readium/lcp-specs/blob/9337294063680ac79b20006d9f32babd2a17e318/schema/link.schema.json#L42-L46

https://github.com/readium/lcp-specs/blob/master/releases/lcp/lcp-1-0-3.md#link-object

SHA-256 hash of the resource - Base 64 encoded octet sequence

However, the LCP Go server implementation encodes the hash property as the hexadecimal string representation of the SHA-256 digest:

{
  "rel": "publication",
  "href": "https://domain.com/contents/038fe68f-b4b9-4d87-a70f-59e843205359",
  "type": "application/epub+zip",
  "title": "accessible-epub3",
  "length": 4105975,
  "hash": "a9db6e6f45f83e1de50a00c0cbb033ed6c24daa3275537a29e05a46db312ec04"
}

Which is correct?

danielweck commented 2 years ago

Go source code reference: https://github.com/readium/readium-lcp-server/blob/626f49698d5dd10aa80aa012ee9d5fb146969e43/pack/pipeline.go#L164

    encryptedFileInfo.Sha256 = hex.EncodeToString(hasher.Sum(nil))

danielweck commented 2 years ago

Related issue: https://github.com/readium/lcp-specs/issues/51#issuecomment-948424976

...and an important observation: the base64 encoding layer applies to the raw digest bytes (the buffer that the hex string represents), not to the hex string itself!
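To illustrate the distinction, here is a minimal sketch in Go. It assumes the standard base64 alphabet with padding (`base64.StdEncoding`), which the thread does not confirm, and the helper name `encodeDigest` is hypothetical. It produces the hex form the Go server emits today, the base64 of the raw 32-byte digest, and the mistaken base64 of the hex string.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// encodeDigest is a hypothetical helper that returns three encodings of the
// SHA-256 digest of data: the hex string, base64 of the raw digest bytes,
// and (for contrast) base64 of the hex string.
func encodeDigest(data []byte) (hexStr, b64Raw, b64OfHex string) {
	sum := sha256.Sum256(data) // 32 raw bytes
	hexStr = hex.EncodeToString(sum[:])
	// Assumption: standard base64 alphabet with padding.
	b64Raw = base64.StdEncoding.EncodeToString(sum[:])
	// Encoding the 64 ASCII characters of the hex string is the pitfall.
	b64OfHex = base64.StdEncoding.EncodeToString([]byte(hexStr))
	return hexStr, b64Raw, b64OfHex
}

func main() {
	h, raw, wrong := encodeDigest([]byte("hello"))
	fmt.Println(len(h), h)         // 64 characters
	fmt.Println(len(raw), raw)     // 44 characters
	fmt.Println(len(wrong), wrong) // 88 characters
}
```

The lengths make the mistake easy to spot: a base64-encoded SHA-256 digest is always 44 characters, while base64 applied to the hex string yields 88.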

danielweck commented 2 years ago

Hello @mickael-menu, I searched https://github.com/readium/r2-lcp-kotlin and https://github.com/readium/r2-lcp-swift but I could not find any usage of the LCP license hash JSON property, i.e. code that checks resource integrity when downloading the linked publication, based on the SHA-256 checksum expressed as a hex string representation of the digest buffer. I just wanted to double-check with you directly that, if the Go server implementation is updated to generate a Base64 string instead of hex, your client-side consumer APIs will continue to work.
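For reference, a client-side integrity check that tolerates both encodings during a transition could look roughly like the sketch below. This is not taken from any Readium toolkit. The function names `decodeHash` and `verifyResource` are hypothetical, and the standard base64 alphabet with padding is assumed.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
)

// decodeHash (hypothetical) accepts a SHA-256 digest encoded either as
// base64 (44 characters) or as hex (64 characters) and returns the raw bytes.
// The decoded length check disambiguates the two: a 64-character hex string
// is also valid base64, but it decodes to 48 bytes, not 32.
func decodeHash(s string) ([]byte, error) {
	if b, err := base64.StdEncoding.DecodeString(s); err == nil && len(b) == sha256.Size {
		return b, nil
	}
	if b, err := hex.DecodeString(s); err == nil && len(b) == sha256.Size {
		return b, nil
	}
	return nil, errors.New("hash is neither a base64 nor a hex SHA-256 digest")
}

// verifyResource (hypothetical) reports whether data matches the given hash.
func verifyResource(data []byte, hash string) (bool, error) {
	want, err := decodeHash(hash)
	if err != nil {
		return false, err
	}
	got := sha256.Sum256(data)
	return bytes.Equal(got[:], want), nil
}

func main() {
	data := []byte("example resource bytes")
	sum := sha256.Sum256(data)
	for _, h := range []string{
		hex.EncodeToString(sum[:]),                // legacy server output
		base64.StdEncoding.EncodeToString(sum[:]), // spec-conformant output
	} {
		ok, err := verifyResource(data, h)
		fmt.Println(ok, err)
	}
}
```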

mickael-menu commented 2 years ago

I don't think we ever used hash in the Kotlin and Swift implementations, I'll add an issue for this.

By the way the updated repos are https://github.com/readium/kotlin-toolkit and https://github.com/readium/swift-toolkit

mickael-menu commented 2 years ago

What's the purpose of base 64 encoding the hash in the JSON license? cc @llemeurfr @HadrienGardeur

danielweck commented 2 years ago

> What's the purpose of base 64 encoding the hash in the JSON license?

Same reasons as lcp_hashed_passphrase?

https://github.com/readium/lcp-specs/issues/51#issuecomment-947839439

llemeurfr commented 2 years ago

The hex implementation in the Go server is from 2016 and indeed I don't know why it is using hex encoding and not base64. I didn't spot the issue before.

In the LCP spec, the link/hash, encryption/content_key/encrypted_value and signature/certificate + value are base64 encoded. This was a generic choice.

Note that in Go, JSON marshaling of byte slices (`[]byte`) uses base64, which is why encrypted_value, certificate and value are correctly serialized.
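A small sketch of that behaviour follows. The `link` struct here is a hypothetical, trimmed-down type for illustration, not the server's actual type. Declaring the field as `[]byte` is enough for `encoding/json` to emit standard base64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link is a hypothetical, simplified struct used only for this example.
type link struct {
	Rel  string `json:"rel"`
	Hash []byte `json:"hash,omitempty"` // []byte is marshaled as a base64 string
}

// marshalLink serializes a link to its JSON text.
func marshalLink(l link) (string, error) {
	b, err := json.Marshal(l)
	return string(b), err
}

func main() {
	out, err := marshalLink(link{Rel: "publication", Hash: []byte{0x01, 0x02, 0x03}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"rel":"publication","hash":"AQID"}
}
```

By contrast, a field declared as `string` and filled with `hex.EncodeToString(...)` is written out verbatim, which is consistent with the hex value seen in the license above.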