Mierdin commented 8 years ago

Proposal: A Comprehensive Continuous Integration Toolchain for NAPALM

Authors:

Created: April 2016 Last updated: April 2016

Abstract

This proposal discusses the contribution of continuous integration tools, scripts, and configuration files to the NAPALM project by the network software team at eBay. It is our desire that the community provides feedback on this proposal prior to these contributions.

As a project, NAPALM is aimed at addressing a very critical problem in network automation: building software to control heterogenous network infrastructure (all with their own unique and "interesting" programmability options) is a complicated matter. A layer of simplification on top of these diverse APIs (or worse in some cases) is very clearly needed to ensure quality in network automation software, and continued network uptime.

Such a critical piece of software needs to have tooling in place that allows any developer to know if the code they're planning to contribute is going to help or hurt the project. eBay's network software team has made use of many of the tools presently used within the OpenStack community to help ensure that their software conforms to quality standards.

Background

eBay's Involvement in OpenStack

It is well-known that eBay is heavily involved with the OpenStack community, and uses the project for numerous internal infrastructure initiatives. As a result, leveraging the available continuous integration tools and configurations present in many of the OpenStack member projects was a no-brainer when it came time to start building network automation software.

Leveraging OpenStack's CI tooling and scripts meant not spending time figuring out how to enforce things like PEP8 compliance, or unit test structure/coverage. These tools were already set up, and at most, some configuration files may need to be changed. This generates immediate value, through enforcement of quality software, without too much effort.

From this process, eBay's network software team has used these tools and configurations for all Python-based software projects (not necessarily related to OpenStack) to ensure a codebase that does not compromise on quality, even with a diverse team of developers.

Contribution of Continuous Integration Toolchain/Configuration to NAPALM

This document proposes that these tools should be leveraged by the NAPALM project, and all code (present and future) is changed where needed to conform to the rules enforced by the CI tools.

The specifics of what rules are ignored/enforced, etc. is beyond the scope of this proposal. Such a discussion can and should be had when a PR is opened, and all can see the configuration files in question.

Central to this CI toolchain is Tox. Tox is used as the central point of logic for all unit testing, tempest/integration testing, code style (including but not limited to PEP8) and more. Tox provides a mechanism to automate continuous integration efforts, all within Python virtualenv sandboxes (which is also automated).

As mentioned before, this tooling is very similar to what you would see in an OpenStack project (see Neutron for an example), but here's a brief summary of what would be added:

tox
pylint
flake8
coverage.py
testr
miscellaneous sanity check shell scripts

It is crucial that these tools are automatically used and enforced when a pull request is opened, in order to ensure they continue to provide value. So, we propose that a build step is added to the build server (TravisCI, etc) that runs the following Tox environments, and requires that they pass before a PR is considered:

tox -epep8 (checks for style, including but not limited to PEP8)
tox -epy27 (runs unit tests)
tox -ecover (runs coverage testing, using coverage.py)

All of these changes would be performed on a single NAPALM driver repository initially, as a proof of concept. Once all of the wrinkles are ironed out, the changes would be replicated across the entire project.

Contribution of Code Style Fixes to NAPALM

In integrating the aforementioned tools into NAPALM, eBay's network software team also made some modifications to the NAPALM code itself, in order to conform to the rules enforced by flake8, pylint, and more. These particular changes were not functional in nature, and affected only the aesthetics and readability of the code. All efforts were made to preserve the existing functionality of NAPALM during these changes.

Since using these CI tools would mean that the existing codebase would fail to run Tox successfully until these changes are replicated on the open source NAPALM project, this document also proposes that these style improvements are also contributed - initially to the single driver being used as a PoC, but then elsewhere, where such improvements were performed.

Summary of Proposed Changes

If this proposal is generally favored, here is a list of changes that we are willing to contribute to this effort.

OpenStack has a well-understood toolchain for CI operations, and this proposal discusses adding these to NAPALM, and enforcing their use through TravisCI (or whatever CI server is used in the future)
All CI operations will be managed by Tox via a "tox.ini" configuration file that will be contributed in the initial PR
Tox becomes the central point for all CI operations, including PEP8, unit testing, and more (it is run by TravisCI automatically) All pull requests must pass these tests before being reviewed and accepted into the project:
- tox -epep8
- tox -epy27
- tox -ecover
Various other scripts and configuration files typically found in an OpenStack project (i.e. Neutron) will also be contributed, for additional functionality in the CI process. These include but are not limited to:
- .coveragerc
- .pylintrc
- .testr.conf
- "tools" directory (additional sanity check scripts, etc)
All changes will be performed in a PR for just one NAPALM device driver as a PoC, then rolled out to the rest of the project once the idea is proven
The CI process will require that the codebase pass Tox tests before showing a successful "build", so minimally, PEP8 fixes will have to be made before a PR is accepted. eBay has some of these PEP8 fixes in place and will be part of the initial contribution where possible.

jzohrab commented 8 years ago

Thanks for the detailed write up and effort. Some questions:

does this PR take into account the recent restructuring of the Napalm code base?
how will devs be impacted, day-to-day? I've written some "quickstart" docs (currently in a feature branch on this repo, ref https://github.com/napalm-automation/napalm/pull/239 for latest thoughts), and can contribute to that effort if needed
is there a way to do dev and testing locally? This testing effort looks like it's offloading to an external system. That's good and necessary, but day-to-day I prefer working locally prior to pushing to the world.
will there be a clear way for style issues to be identified and ticketed? I'd like to contribute to clear those up, but it's sometimes hard to tell who is contributing to what, and it's easy for things to conflict.

Thanks again, and regards.

Mierdin commented 8 years ago

Thanks @jzohrab - replies below:

Yes - All changes will be performed in a PR for just one NAPALM device driver as a PoC, then rolled out to the rest of the project once the idea is proven. This means that we're probably going to start with napalm-junos and once the PR is accepted, we'll replicate the changes in other drivers.
Devs will need to make sure their code is compliant with what the tooling presents. It's really easy to do this - you can run commands like tox -epep8 for PEP8 checks or tox -epy27 for unit testing, etc. These commands can be run on the dev's laptop, and these should be the exact same commands run by the CI build server (Travis CI)
See previous point - devs should absolutely run Tox locally to make sure things pass before opening a PR
Yes, the tooling like Flake8 and Pylint do a good job of describing where the violations are - and my thought is that this work can be done in a branch until the issues are fixed. I'll leave it to @dbarrosop et al to decide how best to communicate those issues to the community.

Thanks for your feedback! Hopefully I answered your questions.

jzohrab commented 8 years ago

"Devs will need to make sure their code is compliant with what the tooling presents. It's really easy to do this" - to simplify onboarding etc, can you make sure that the necessary tools are present in requirements.txt and/or setup.py, and that at least one sentence is added to the developer readme? I think that David wanted to change some things with the development stream ... happy to help work out the kinks on that.

Thanks @Mierdin, will be really interesting to see this in action, a full suite of device tests would be great.

dbarrosop commented 8 years ago

@Mierdin I really like the proposal. Can't wait to see the PR. Although I am not looking forward to the process of cleaning the code so flake8 and pylint are happy ;) I think that it would be easier and less painful if people would be willing to joing for a 'hackathon' dat when we focus on putting all the tooling together and cleaning the code. The branch alternative would work as well but might lead to a lot of conflicts and it might delay putting the tooling in place and I think the sooner we deal with this the better.

@jzohrab don't worry, we will have a dev-requirements.txt or something like that so all the tools are easily installed with pip. It's better to have a separate file as you don't need those development tools in production. Once we have clear understanding of how all the new tools and everything works we can work on a 'guidelines' file for contributing to the project and an 'on-boarding' file for getting started. Something tells me you want to take care of the latter ;)

Mierdin commented 8 years ago

@jzohrab So, since we're using Tox, the dependencies are very easy (since Tox is basically virtualenv automation). We have a requirements.txt file which is automatically installed into each Tox virtualenv, but we also have a test-requirements.txt file that contains all of the test-specific dependencies. As @dbarrosop points out, this is only for dev, so you don't use these dependencies for actually running the software in production. Whenever you do PEP8 checks with tox -epep8 or unit tests with tox -epy27 Tox will automatically install dependencies from both files to ensure it has everything it needs. Since this is all within virtualenvs managed by Tox, it's fairly self-contained.

@dbarrosop We've done as much of those fixes as we were able to do given our other priorities. As much as possible, our PR will contain those fixes. We'll start with the junos driver as our POC for all this stuff and see how it goes. A hackathon might be a great idea for the rest.

dbarrosop commented 8 years ago

Hi! Any update on this?

Mierdin commented 8 years ago

Not yet. Lots of post-Interop catch up. :) Still hoping to get this done soon. Maybe this week if I can find a blank spot.

Will keep you all posted

Mierdin commented 8 years ago

Just a quick update - I am really only fighting a single issue that's standing in the way of me publishing these changes (have been at this stage for a few weeks actually).

In short, something about Tox (not even our specific Tox configuration - as I've tested with a very minimal tox configuration to produce these results as well) is causing one of the nosetests to fail in a really weird way. In this test, pyez is returning the mocked data correctly, but the list of tuples being returned is not in the right order, which really screws with the function being tested. If I run the same test directly with nose, instead of through Tox, the tuples are in the right order, and the unit test passes.

Rather than submit something that doesn't work, I've been spending a lot of time troubleshooting this, without success thus far. I have a lot of other deliverables to worry about, so I've had to withdraw my attention from this from time to time. However, at this point I am considering opening a PR anyways, to get more eyes on the issue. I will let you all know if I get to this point.

Hopefully that makes sense - if you would like to hear more details, let me know.

dbarrosop commented 8 years ago

Could you post the issue? Depending on what it is maybe I can slightly change it to fix it.

napalm-automation / napalm