juju / charmstore

The charm store server.
http://gopkg.in/juju/charmstore.v5
GNU Affero General Public License v3.0

Charm resources should be de-duplicated as the charms are. #734

Open frankban opened 7 years ago

frankban commented 7 years ago

Moved from https://github.com/CanonicalLtd/jujucharms.com/issues/420:

I am writing a CI system that will publish both resources and charms to the charm store. Through my testing I have found that charms are de-duplicated but resources are not. Attaching or pushing a resource ALWAYS increments the resource revision, even when the uploaded file is identical to the previous one. Since resources are mostly binary files, can we not de-duplicate them based on the hashsum of the file? Otherwise I fear the CI system will publish many resources that are exactly the same file, wasting space and setting up a skew where the charm is easyrsa-9 and the resource is easyrsa-40960.

$ charm attach cs:~mbruzek/easyrsa easyrsa=./easyrsa-resource-3.0.1.tgz
uploaded revision 12 of easyrsa
$ charm attach cs:~mbruzek/easyrsa easyrsa=./easyrsa-resource-3.0.1.tgz
uploaded revision 13 of easyrsa
$ charm attach cs:~mbruzek/easyrsa easyrsa=./easyrsa-resource-3.0.1.tgz
uploaded revision 14 of easyrsa

More data using the charm push command:

$ charm push ./easyrsa --resource easyrsa=easyrsa-resource-3.0.1.tgz
url: cs:~mbruzek/easyrsa-3
channel: unpublished
Uploaded "/tmp/easyrsa/easyrsa-resource-3.0.1.tgz" as easyrsa-15
$ charm show cs:~mbruzek/easyrsa --channel unpublished id resources
id:
  Id: cs:~mbruzek/easyrsa-3
  Name: easyrsa
  Revision: 3
  User: mbruzek
resources:
- Description: "The release of the EasyRSA software you would like to use to create\ncertificate
    authority (CA) and other Public Key Infrastructure (PKI). \nThis charm was written
    using v3.0.1, so earlier versions of EasyRSA may \nnot work. You can find the
    releases of EasyRSA at \nhttps://github.com/OpenVPN/easy-rsa/releases\n"
  Fingerprint: zR3FHi3mikVQCj6pmwTuJxk3G4oBu2vdsIsQ82ktI3oj5F4eQm9oKO+z2rGlNJuj
  Name: easyrsa
  Path: easyrsa.tgz
  Revision: 15
  Size: 40960
  Type: file
$ charm push ./easyrsa --resource easyrsa=easyrsa-resource-3.0.1.tgz
url: cs:~mbruzek/easyrsa-3
channel: unpublished
Uploaded "/tmp/easyrsa/easyrsa-resource-3.0.1.tgz" as easyrsa-16
$ charm show cs:~mbruzek/easyrsa --channel unpublished id resources
id:
  Id: cs:~mbruzek/easyrsa-3
  Name: easyrsa
  Revision: 3
  User: mbruzek
resources:
- Description: "The release of the EasyRSA software you would like to use to create\ncertificate
    authority (CA) and other Public Key Infrastructure (PKI). \nThis charm was written
    using v3.0.1, so earlier versions of EasyRSA may \nnot work. You can find the
    releases of EasyRSA at \nhttps://github.com/OpenVPN/easy-rsa/releases\n"
  Fingerprint: zR3FHi3mikVQCj6pmwTuJxk3G4oBu2vdsIsQ82ktI3oj5F4eQm9oKO+z2rGlNJuj
  Name: easyrsa
  Path: easyrsa.tgz
  Revision: 16
  Size: 40960
  Type: file

Please notice that the revision of the charm is not incrementing, but the revision of the resource ALWAYS increments, even when the Fingerprint is the same.
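To make the requested behaviour concrete, here is a toy in-memory model in Python. It is not charmstore code: the `ResourceStore` class, its `upload` method, and the policy of comparing only against the latest revision are all assumptions made for illustration.

```python
import hashlib


class ResourceStore:
    """Hypothetical in-memory model of de-duplicated resource revisions."""

    def __init__(self):
        # Maps resource name -> list of SHA-384 hex digests; list index is the revision.
        self._revisions = {}

    def upload(self, name, data):
        """Return the revision for `data`, reusing the latest one if the hash matches."""
        fingerprint = hashlib.sha384(data).hexdigest()
        revisions = self._revisions.setdefault(name, [])
        if revisions and revisions[-1] == fingerprint:
            # Same bytes as the latest revision: do not allocate a new revision.
            return len(revisions) - 1
        revisions.append(fingerprint)
        return len(revisions) - 1
```

Under this model, uploading the same tarball three times in a row would yield the same revision each time, and only a changed file would bump it.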

mbruzek commented 7 years ago

I didn't know that "charmstore" was the right component and opened this bug against the wrong project. Sorry about that! Please let me know if you need any more information from my end. I installed charm from the snap and am using the following versions:

$ juju --version
2.2-alpha1-yakkety-amd64
$ which juju
/snap/bin/juju
$ charm version
charm 2.2.0
charm-tools 2.2.0
$ snap list
Name        Version  Rev   Developer  Notes
charm       2.2      11    charms     classic
conjure-up  2.2-dev  110   canonical  classic
core        16-2     1337  canonical  -
lxd         2.10.1   1463  canonical  -

jrwren commented 7 years ago

@mbruzek AFAIK, this was intentional and by design.

You could work around this by comparing the fingerprint of the uploaded resource to the one you are about to upload and not uploading it if they match.

The fingerprint is the SHA-384 of the resource data. Unfortunately the YAML output shows the SHA-384 base64-encoded instead of in hexadecimal. This conversion can be annoying, but sadly that is how the YAML spec represents binary data. You can convert by piping the value to | base64 -D | xxd -ps -c 50 (base64 -D is the macOS spelling; with GNU coreutils use base64 -d).
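As a rough sketch of that comparison, the Python below computes a local file's SHA-384 in the same base64 form the YAML output uses, and converts a base64 fingerprint to hex. The function names are illustrative and not part of any charm tooling; it assumes the fingerprint is the SHA-384 of the raw file bytes, as stated above.

```python
import base64
import hashlib


def local_fingerprint_b64(path, chunk_size=1 << 20):
    """Return the base64-encoded SHA-384 digest of the file at `path`."""
    digest = hashlib.sha384()
    with open(path, "rb") as f:
        # Read in chunks so large resource files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def fingerprint_b64_to_hex(fingerprint_b64):
    """Convert a base64 fingerprint (as printed by `charm show`) to hexadecimal."""
    return base64.b64decode(fingerprint_b64).hex()
```

A CI job could call `local_fingerprint_b64("./easyrsa-resource-3.0.1.tgz")` and compare the result directly with the `Fingerprint:` field, without converting to hex at all.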

mbruzek commented 7 years ago

@jrwren Can you share the reason here or in PM? It seems that there could potentially be a lot of duplicate binary resources in the charm store this way. Even if the store is not actually storing the file multiple times, this is a usability issue, as the resource number grows with multiple builds per day.

Do we really want to force CI systems to calculate this (undocumented) hashsum and do the work of deduplicating charm store resources? It is technically possible to script something like this up, but it seems like a workaround for something the store should manage. The next person trying to build a CI pipeline will not know how to come up with the hashsum and will just blindly upload resource binaries to the charm store.
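For reference, the kind of script a CI system would need might look roughly like the Python below. It is only a sketch: `remote_fingerprints` and `needs_upload` are hypothetical helpers, the text passed in is assumed to be the output of `charm show ... id resources` in the format shown earlier in this thread, and the fingerprint is assumed to be the base64-encoded SHA-384 of the file contents.

```python
import base64
import hashlib
import re


def remote_fingerprints(show_output):
    """Collect every `Fingerprint:` value found in `charm show` output text."""
    pattern = r"^[ \t]*Fingerprint:[ \t]*(\S+)[ \t]*$"
    return set(re.findall(pattern, show_output, flags=re.MULTILINE))


def needs_upload(resource_bytes, show_output):
    """Return True if no listed resource already has this content's fingerprint."""
    local = base64.b64encode(hashlib.sha384(resource_bytes).digest()).decode("ascii")
    return local not in remote_fingerprints(show_output)
```

The CI job would then run `charm attach` only when `needs_upload` returns True. Note that this only compares against the revisions the `charm show` output lists, so it is a client-side guard rather than real de-duplication in the store.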

arosales commented 7 years ago

fwiw, the main thing I think @mbruzek was trying to call out is that in a CI system we may upload the same binary multiple times each day, e.g. with charm attach cs:~mbruzek/easyrsa easyrsa=./easyrsa-resource-3.0.1.tgz (resource binary is unchanged). Thus, the concern was that this would fill up disk space needlessly with the same binary.