Verifying distributions downloaded from CPAN

tartansandal commented 9 years ago

It's not really safe to use a distribution downloaded from CPAN unless we:

1. download the author's CHECKSUMS file
2. verify that the author's CHECKSUMS file has been clear-signed with the PAUSE Batch Signing Key, using the public key shipped with CPAN.pm
3. verify that the distribution tarball matches the checksums in the author's CHECKSUMS file
4. unpack the tarball and verify the distribution's SIGNATURE file (if it exists)

This is what cpanm does with the '--verify' option.
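The checksum-matching step in the list above can be sketched roughly as follows. This is only an illustration in Python, not code from cpanm or Pinto. It assumes the CHECKSUMS file has already had its signature verified and has been parsed into a mapping of filename to an entry with a `sha256` field; the names `verify_tarball` and `checksums` are hypothetical, and signature checking is out of scope here.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_tarball(filename: str, data: bytes, checksums: dict) -> bool:
    """Check a tarball's bytes against its (already parsed) CHECKSUMS entry.

    `checksums` is a hypothetical mapping of filename -> {"sha256": hexdigest},
    standing in for a CHECKSUMS file whose signature was verified beforehand.
    """
    entry = checksums.get(filename)
    if entry is None or "sha256" not in entry:
        # No entry means we cannot vouch for the file; treat it as a failure.
        return False
    # compare_digest avoids leaking where the two digests first differ.
    return hmac.compare_digest(sha256_hex(data), entry["sha256"].lower())


# Illustrative data only: a fake tarball and a matching fake CHECKSUMS entry.
tarball = b"pretend these are the bytes of Foo-1.00.tar.gz"
checksums = {"Foo-1.00.tar.gz": {"sha256": sha256_hex(tarball)}}

print(verify_tarball("Foo-1.00.tar.gz", tarball, checksums))         # intact file
print(verify_tarball("Foo-1.00.tar.gz", tarball + b"x", checksums))  # modified file
print(verify_tarball("Bar-2.00.tar.gz", tarball, checksums))         # no entry
```

A missing entry is treated as a failure here; whether that should block or merely warn is a policy question.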

Unfortunately we can't run 'cpanm --verify' against a pinto repository, because the CHECKSUMS file gets recreated by pinto and we lose all the signatures.

I'm thinking it would be good if pinto did the above when it downloads the distribution tarball to its temporary location.

I have spent some time scratching around in both the Pinto and cpanminus source and don't mind having a go at this if it seems appropriate.

thaljef commented 9 years ago

So if Pinto does steps 1 through 4 when it fetches libraries, would you then trust whatever comes out of your Pinto repository? Or would you also want Pinto to have its own keys and signatures for its clients?

tartansandal commented 9 years ago

Now you are making me even more paranoid ;-)

Verifying the fetched distributions makes me happier for a repository on my local machine.

Having Pinto manage keys and signing would be interesting (complicated), but then we would have to manage a web of trust. Thinking about how that would impact Stratopan. Arghhh.

Wondering if we can retain and use the signed upstream CHECKSUMS files somehow. I can see how we would need new CHECKSUMS for locally added distributions. Perhaps only signing those with your own signing key.

tartansandal commented 9 years ago

I've scratched together a preliminary implementation at

https://github.com/tartansandal/Pinto/tree/verify_dists

This is very loosely based on the cpanm code (once you untangle it).

No tests or documentation yet. I wanted to get some feedback on whether this approach fits with the Pinto design. (Gorgeous code base, by the way :-) )

Currently it throws an error if something is critically wrong, but quietly skips over verification of missing or unsigned CHECKSUMS files. Maybe it needs a '--force' option incorporated?

Ideally I'd like to have an 'audit' command that traverses a repository or stack, highlighting potentially problematic distributions. Perhaps subcommands to list distributions that:

don't have valid upstream CHECKSUMS files or,
have valid upstream CHECKSUMS that don't verify the distribution (say, if you have overridden upstream with a local copy),
don't contain SIGNATURE files,
have SIGNATURE files that are invalid
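The four categories above could be reported per distribution along these lines. This is a rough Python sketch, not the code in the branch. The `DistStatus` record and its field names are hypothetical, and it assumes the checksum and signature checks have already been run and their results recorded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DistStatus:
    """Hypothetical summary of the checks already run on one distribution."""
    name: str
    checksums_valid: bool            # upstream CHECKSUMS present and correctly signed
    checksum_matches: bool           # tarball digest equals the CHECKSUMS entry
    signature_valid: Optional[bool]  # None means there is no SIGNATURE file at all


def audit_findings(d: DistStatus) -> List[str]:
    """Return the audit categories (from the list above) that apply to `d`."""
    findings = []
    if not d.checksums_valid:
        findings.append("no valid upstream CHECKSUMS")
    elif not d.checksum_matches:
        # Only meaningful when the CHECKSUMS file itself is trustworthy.
        findings.append("CHECKSUMS do not match distribution")
    if d.signature_valid is None:
        findings.append("no SIGNATURE file")
    elif not d.signature_valid:
        findings.append("invalid SIGNATURE")
    return findings


# Made-up example: a locally overridden, unsigned distribution.
print(audit_findings(DistStatus("Foo-1.00", True, False, None)))
```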

Perhaps other subcommands to list the GPG keys required to verify the stack, generate summary CPANTS reports, etc.

Just rough thoughts at the moment on how we could curate better-quality repositories.

thaljef commented 9 years ago

I'm not familiar with the mechanics of module signatures, so I'll have to study that a bit. But it looks like you're on the right track. Keep in mind that the upstream repository is not always a proper CPAN mirror. Pinto could be fetching modules from another pinto repository, or a minicpan, or anything that happens to look like a CPAN. So it needs to have reasonable behavior in those situations too.

thaljef commented 9 years ago

FWIW, I have tried to design Pinto for a world where CPAN is not the center of the universe. I imagined whole networks of private Pinto repositories pulling various distributions from each other. To be honest, I don't know if I have achieved that goal or if my vision is even sane. But for now, I would like to continue entertaining those ideas, which may complicate or contradict your efforts to improve security.

tartansandal commented 9 years ago

I like the idea of CPAN not being the centre of the universe, and I think the various ideals can coexist. We would only be verifying distributions if the upstream repository supported it. You would always be free to choose unsigned upstream repositories if you wanted (say, on a trusted network or over TLS). If you have an upstream that does sign distributions, though, we should verify and block failures by default. Being able to audit a stack and then re-sign the distribution CHECKSUMS files with a corporate or team key would be a cool extension. Being able to say "these distributions have been audited and approved for use" would be cool in some environments.

tartansandal commented 9 years ago

Have a preliminary implementation of an 'audit' command at

https://github.com/tartansandal/Pinto/tree/audit

I've already used this to identify and fix some issues in one of my stacks. I still have some tests to write, and have a number of TODO items noted in the 'audit' command's POD, but any feedback on the implementation and how to make it fit best with the rest of the code (Types, OO, exceptions, names, messages) would be most appreciated.

While enforcing verification is ideal, it does require the user to have some understanding of GPG and a bit of infrastructure in place. This is not always going to be the case, so I'm toying with the idea of a global '--verify' or '--strict' option to trigger and enforce the checks for those who know what they are doing and care. Still have to figure out how to propagate that option.
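To make the trade-off concrete, here is one possible reading of how such options might interact with the three outcomes discussed so far (verified, missing or unsigned CHECKSUMS, definite failure). The flag semantics are an assumption for illustration, not settled behaviour: `verify` turns the checks on, `strict` implies `verify` and turns the quiet skip into a rejection. The `Check` enum and `decide` function are hypothetical names.

```python
from enum import Enum


class Check(Enum):
    OK = "ok"                      # checksums signed and matching
    UNVERIFIABLE = "unverifiable"  # CHECKSUMS file missing or unsigned
    BAD = "bad"                    # signature or checksum mismatch


def decide(result: Check, verify: bool = False, strict: bool = False) -> str:
    """Map a check result to an action under the assumed flag semantics."""
    verify = verify or strict      # assumption: --strict implies --verify
    if not verify:
        return "accept"            # checks are switched off entirely
    if result is Check.BAD:
        return "reject"            # a definite failure always blocks
    if result is Check.UNVERIFIABLE:
        # Lenient mode skips quietly; strict mode refuses.
        return "reject" if strict else "skip"
    return "accept"


for result in Check:
    print(result.name, decide(result, verify=True), decide(result, strict=True))
```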

thaljef commented 9 years ago

I haven't looked at the code yet, but here are some thoughts:

There is a verify command that checks whether each distro named in the database actually exists on the file system. Would it make sense to add the "audit" functionality to that (instead of creating a separate command)?
There are a couple Roles that get applied to certain Action classes. Take a look in Pinto/Role for some ideas on how to propagate global options.
Be aware that configuration properties exist at multiple levels. First, there is the repository config file which sets the default properties for all new stacks. Then each stack can override those with its own properties via the props command. Finally, some properties can be controlled through options on each Action command.
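The three-level lookup described in the last point can be pictured with a small Python sketch. It only illustrates the precedence order (command option, then stack property, then repository default) and is not how Pinto stores or resolves properties; the keys `verify` and `strict` are hypothetical.

```python
from collections import ChainMap

# Hypothetical property values at each level.
repo_defaults = {"verify": False, "strict": False}  # repository config file
stack_props = {"verify": True}                      # set per stack via props
command_opts = {"strict": True}                     # options on this one command

# ChainMap searches its maps left to right, so earlier maps take precedence.
effective = ChainMap(command_opts, stack_props, repo_defaults)

print(effective["verify"])  # found at the stack level
print(effective["strict"])  # found at the command level
```

Only options actually given on the command line should appear in `command_opts`; otherwise an unset option would mask the stack or repository value.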

I'll look at the code in more detail tomorrow. Thanks for taking this on.

tartansandal commented 9 years ago

Thanks for the feedback and for building Pinto in the first place. I've learned a lot working with the code base.

I've been jealously eyeing up the verify command and its namespace for a while. One problem is that the signature verification steps run by the audit command can generate a lot of warnings depending on the state of the keyring in use. This may or may not be a problem depending on current uses of the command, e.g., if Stratopan uses it, then that could quickly swamp the web server log files. If we hide or ignore these warnings, we are imposing a weak trust model on the user and I don't think that is a good idea. The more I think about it, the more I lean towards using a dedicated GPG keyring to let the end user define and manage their trust model.
It seems that the Puller Role might be a good target for this. Perhaps creating a Verifier Role to be consumed by both the Puller Role and the DistributionAudit Class. Or maybe not: need to figure out how to get the functionality into the Repository Class somehow.
Ok. I definitely have to get my head around props.

May not get a chance to look at this again until next week. It's been fun actually writing code instead of management reports :-)

tartansandal commented 9 years ago

I now have a fairly workable implementation at:

https://github.com/tartansandal/Pinto/tree/audit

Some notes on recent developments:

The verify command seems to be more about repository maintenance and ensuring the database is consistent with the file system, so I think we should leave that alone.
I've split out the checksum/signature verification methods into a Verifier class that can be used by both Repository and Auditor as appropriate.
The options --verify and --strict can now be passed to any action that consumes the Puller role and they get propagated to the corresponding Repository instance.
The environment variable PINTO_GNUPGHOME can be used to specify a dedicated keyring and trustdb for use with pinto (but only if you are using GnuPG).
Output from gpg and friends now plays nicely with Pinto::Chrome.
Have added some basic tests for the Verifier class and for integration with Puller actions. More thorough testing would require either adding specially signed distributions to test against or pulling them off CPAN/Stratopan directly -- perhaps something for under ./xt/.
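For readers unfamiliar with dedicated keyrings, one plausible way a wrapper could honour `PINTO_GNUPGHOME` is to map it onto GnuPG's own `GNUPGHOME` variable before invoking gpg. This is a guess at the mechanism, sketched in Python, and not necessarily what the branch does; `gpg_environment` is a hypothetical helper.

```python
from typing import Dict, Mapping


def gpg_environment(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `environ` suitable for running gpg.

    If PINTO_GNUPGHOME is set, it overrides GNUPGHOME so gpg uses the
    dedicated keyring and trustdb rather than the user's default one.
    """
    env = dict(environ)
    dedicated = env.get("PINTO_GNUPGHOME")
    if dedicated:
        env["GNUPGHOME"] = dedicated
    return env


# Illustrative paths only.
print(gpg_environment({"PINTO_GNUPGHOME": "/srv/pinto/gnupg", "GNUPGHOME": "/home/me/.gnupg"}))
print(gpg_environment({"GNUPGHOME": "/home/me/.gnupg"}))
```

The returned mapping could then be passed as the `env` argument of `subprocess.run` when calling gpg, leaving the caller's own environment untouched.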

The current branch forms a relatively complete set of functionality. I can rebase this and submit a single-commit pull request for review if that suits.

What remains is to define and extend my rather vague notion of what an 'audit' is. Currently all it does is verify existing distribution checksums and signatures and generates a summary. Useful, but limited. A couple of things I'd like to do:

Extend the information gathered to include metadata from, say, MetaCPAN. In particular, test results and ratings. This would be good for highlighting distributions (dragged in by dependencies) that may be problematic.
Record an audit so it can be queried. E.g. show me the distributions that fail for more than 10% of CPAN testers; show me the distributions that don't have embedded signatures; show me the distributions that don't have a rating or have a rating less than 3.
Add notes to an audit? Sign off a distribution?
Compare different audits? Audit diff?
Add a peek command to unpack a distribution and shell into the top level.
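The example queries in the list above might look like this once an audit has been recorded. The record layout, field names and all values below are invented purely for illustration; nothing here reflects a real schema or real CPAN Testers data.

```python
# Made-up audit records; every value is illustrative.
records = [
    {"dist": "Foo-1.00", "fail_rate": 0.02, "signed": True, "rating": 4.5},
    {"dist": "Bar-0.10", "fail_rate": 0.25, "signed": False, "rating": None},
    {"dist": "Baz-2.3", "fail_rate": 0.00, "signed": False, "rating": 2.0},
]


def failing(records, threshold=0.10):
    """Distributions failing for more than `threshold` of testers."""
    return [r["dist"] for r in records if r["fail_rate"] > threshold]


def unsigned(records):
    """Distributions without an embedded signature."""
    return [r["dist"] for r in records if not r["signed"]]


def poorly_rated(records, minimum=3):
    """Distributions with no rating, or a rating below `minimum`."""
    return [r["dist"] for r in records
            if r["rating"] is None or r["rating"] < minimum]


print(failing(records))
print(unsigned(records))
print(poorly_rated(records))
```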

tartansandal commented 9 years ago

I think I've just convinced myself that the 'Audit' functionality should be an independent plugin, since it's probably going to involve changes to the DB schema which would impact existing users, but the 'Verifier' functionality would still be core.

Thoughts?

tartansandal commented 9 years ago

I think my ideas concerning ad hoc audits are running orthogonal to what Pinto is doing. I think they can be explored out-of-band:

pinto list --format "%D" | audittool
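The receiving end of that pipe could start out as small as the following. `audittool` does not exist; this is a hypothetical Python sketch that assumes each input line names one distribution and that the listing may repeat a distribution (for example once per package), so duplicates are dropped while order is kept.

```python
import io
from typing import Iterable, List


def read_distributions(lines: Iterable[str]) -> List[str]:
    """Collect unique, non-empty distribution names in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


# Stand-in for piped input; a real tool would pass sys.stdin instead.
sample = io.StringIO(
    "AUTHOR/Foo-1.00.tar.gz\n"
    "AUTHOR/Foo-1.00.tar.gz\n"
    "\n"
    "AUTHOR/Bar-0.10.tar.gz\n"
)
dists = read_distributions(sample)
print(dists)
```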

However, I think the cryptographic verification stuff is still useful in Pinto. Have stripped back my implementation and incorporated it into the existing verify command. Pull request soon.

Cheers,

Kal

thaljef commented 9 years ago

I think my ideas concerning ad hoc audits are running orthogonal to what Pinto is doing.

That's unfortunate. Do you want to get on Skype tomorrow morning (pacific time) and talk it over?

My handle is "thaljef".

thaljef commented 9 years ago

Add a peek command to unpack a distribution and shell into the top level.

+1, although I would call it "look" to match similar features in cpan and cpanm.

tartansandal commented 9 years ago

On 29 January 2015 at 15:38, Jeffrey Ryan Thalhammer < notifications@github.com> wrote:

I think my ideas concerning ad hoc audits are running orthogonal to what Pinto is doing.

That's unfortunate. Do you want to get on Skype tomorrow morning (pacific time) and talk it over?

My handle is "thaljef".

Might be a little early for me :-) 10am in New Jersey is 5am in Melbourne. I see it's past midnight for you now. Ouch!

I think the problem is that an audit is not easily defined, or even explained, since it depends on the security/quality policy that the individual/organization is operating under.

Some things are clear and can probably be performed with the existing Pinto infrastructure:

Each distribution in a 'target' stack needs to be reviewed and signed-off.
Distributions that are acceptable with respect to the policy can simply be registered with a 'good' stack.
Distributions that are problematic with respect to the policy can be registered with a 'bad' stack.
Audit details can be recorded in the commit comment.
We can track un-audited distributions with the diff command.
Distributions that don't verify probably shouldn't be in your repository at all (unless it's a special one dedicated to corrupted distributions).
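The bookkeeping behind the 'target', 'good' and 'bad' stacks above boils down to set arithmetic. The Python sketch below is only an illustration of that idea with made-up stack contents; in Pinto itself the comparison would come from the diff command rather than from code like this.

```python
def unaudited(target, good, bad):
    """Distributions in the target stack not yet registered as good or bad."""
    return sorted(set(target) - set(good) - set(bad))


def conflicting(good, bad):
    """Distributions registered on both stacks, which needs a human decision."""
    return sorted(set(good) & set(bad))


# Made-up stack contents.
target = ["Foo-1.00", "Bar-0.10", "Baz-2.3", "Qux-0.01"]
good = ["Foo-1.00", "Baz-2.3"]
bad = ["Bar-0.10", "Baz-2.3"]

print(unaudited(target, good, bad))
print(conflicting(good, bad))
```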

Gathering information/summaries about a (set of) distribution(s)/package(s)/author(s) from MetaCPAN seemed like the orthogonal task. It would be good to have a tool that took an arbitrary set of distributions and told you something interesting about them, perhaps something that you could use in an audit. I've had a little play with the MetaCPAN API and ElasticSearch queries (brain hurts now) and have kind of come to the conclusion that there are a lot of potentially interesting queries which are going to be quite tricky to code. My thinking is that it might be easier to:

Start with an independent app like https://metacpan.org/pod/App::metacpansearch.
Modify that to explore potentially useful searches.
Once we have a clearer idea of what we want, explore potentially useful reports.
Think about whether those reports make sense to run from Pinto.
Document why they are useful wrt potential policies (really, really hard).
Decide how much of our audit we really need to document/record in the Pinto database.
Decide if we need to maintain trails of audits ...

At a guess, it could take a fair few months to get the above sorted out, and it would be handy (at least for me) to have a useful (developing) tool in the interim.

K

Kahlil (Kal) Hodgson GPG: C9A02289 Head of Technology (m) +61 (0) 4 2573 0382 DealMax Pty Ltd

Suite 1416 401 Docklands Drive Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use a hammer." -- IBM maintenance manual, 1925

tartansandal commented 9 years ago

On 29 January 2015 at 17:06, Kahlil Hodgson kahlil.hodgson@dealmax.com.au wrote:

Might be a little early for me :-) 10am in New Jersey is 5am in Melbourne. I see its past midnight for you now. Ouch!

Meant 10am in San Francisco is 5am in Melbourne.


thaljef commented 9 years ago

Meant 10am in San Francisco is 5am in Melbourne.

We can do it later in the day too. I just won't be able to talk quite as long.

What time would work for you?

-Jeff

tartansandal commented 9 years ago

I can do anything between 10am and 1pm here, which for you is 3pm to 6pm. What's the best time in that range for you?

Cheers,

Kal


thaljef commented 9 years ago

On Wed, Jan 28, 2015 at 11:01 PM, Kal Hodgson notifications@github.com wrote:

I can do anything between 10am and 1pm here, which for you is 3pm to 6pm. Whats the best time in that range for you?

3:00pm (my time) will be fine.

-Jeff

tartansandal commented 9 years ago

Awesome.

I'll give you a call when I get into work. My handle is "Kahlil Hodgson".

Cheers,

Kal


thaljef / Pinto

Verifying distributions downloaded from CPAN #177