torch / cutorch

A CUDA backend for Torch7

No autobuild on cutorch branch? #390

Closed hughperkins closed 8 years ago

hughperkins commented 8 years ago

Seems like it'd be good to have an autobuild on the cutorch branch? Having a dedicated titan or ec2 g2 instance seems overkill :-D but you could set up an ec2 instance to start whenever there is a commit, connect to a jenkins that already has a build job queued for it, and shut down the instance after the test, for example.

This would have a couple of advantages:

soumith commented 8 years ago

this is a good idea and on my list forever, just haven't had the time to figure out how to do all of this....

hughperkins commented 8 years ago

I guess a key prerequisite is to have a suitable machine or ec2 iamuser. I have a bunch of scripts around to do things like, start an existing ec2 instance, and so on.

hughperkins commented 8 years ago

... and so, if you provide me an appropriate iamuser account/keys and/or we create some sort of private/shared repo for jenkins scripts etc, I might help with this. Maybe. Possibly. No guarantees. But maybe :-)

hughperkins commented 8 years ago

You know, nimbix instances make doing this really easy. Since:

soumith commented 8 years ago

If you set it up via your personal account, i can send you money for this every month

soumith commented 8 years ago

I just don't know how to set up the amazon thing. Another option is we do a quick screenshare and i can set up the amazon thing with my billing that you can use for this...

hughperkins commented 8 years ago

Ok. I think we basically need three parts:

One conundrum which occurs to me: you and I have both set up our travis tests only on our distro repos, not on the actual cutorch/cunn/cltorch/clnn repos, since the repos can't actually be tested in isolation. Personally I actually update my distro-cl for all changes to the underlying repos https://github.com/hughperkins/distro-cl/commits/distro-cl , but that's not really how it works for cunn currently. So ... thoughts on this?

soumith commented 8 years ago

@hughperkins

The costs look pretty good, no worry there.

I've set up travis tests not just on distro, but on individual nn, torch7, image, nngraph etc.

Shall we follow the same for cutorch/cunn?

szagoruyko commented 8 years ago

this is great. I have travis config for cutorch https://github.com/szagoruyko/cutorch/blob/master/.travis.yml, cunn is trivial to modify and add actual GPU tests

hughperkins commented 8 years ago

First part: create ec2 t2.micro instance: https://github.com/hughperkins/torchunit/blob/master/ec2instance.md

hughperkins commented 8 years ago

nimbix https://github.com/hughperkins/torchunit/blob/master/nimbixsignup.md

But as I wrote it, I realized that I don't really want to have your apikey, on the whole. Too dangerous if it somehow leaks out.

So, as I was writing, I rethought a bit, and think maybe no-one except you should have login to the jenkins box, and the jenkins box will run from scripts in a github repo. You can put your nimbix apikey on that box, and no-one else can see it. We are free to modify the jenkins instance config by pushing pull requests to the repo containing the jenkins instance installation/configuration scripts. Then your api key is safe.

hughperkins commented 8 years ago

The scripts at https://github.com/hughperkins/torchunit are enough to install and start jenkins now, including creating a self-signed https cert :-)

To install:

wget https://raw.githubusercontent.com/hughperkins/torchunit/master/installjenkins.sh
bash installjenkins.sh

To run:

torchunit/runjenkins.sh

To access the (entirely unsecured for now ... ) jenkins, navigate to:

https://52.1.2.3:8443/

.... where 52.1.2.3 is the address of the ec2 box

hughperkins commented 8 years ago

(you'd probably want to fork my repo, so you can control pull requests to it, I would imagine :-) )

hughperkins commented 8 years ago

hmmm, tweaking some stuff...

hughperkins commented 8 years ago

Enabled security now. Just some basic security, so by default people only have read-only access, which sounds pretty good for now. The install procedure has changed:

1. in the instance ssh, paste and run:

wget https://raw.githubusercontent.com/hughperkins/torchunit/master/bootstrap.sh
bash bootstrap.sh

2. copy torchunit/config.yaml.templ to torchunit/config.yaml, and set a jenkins user password in it

3. run:

bash torchunit/installjenkins.sh

Run procedure unchanged, just do, from the instance:

bash torchunit/runjenkins.sh

So far, no jobs or anything, but at least it's secured. There are ways and means of creating jobs, ie/eg http://docs.openstack.org/infra/system-config/jjb.html

Example jjb jobs:

https://github.com/hughperkins/DeepCL/blob/master/jenkins/jobs.yaml

hughperkins commented 8 years ago

Actually... it occurs to me, can we just directly drive nimbix from travis? I guess we can ???

hughperkins commented 8 years ago

Seems like if we put the nimbix apikey in travis repo settings, it would only be available to pull requests submitted by eg soumith. other pull requests wouldnt see the apikey. which makes sense. but limits the usefulness a bunch...
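A build script could at least detect this situation and skip the GPU step instead of failing. The following is a minimal sketch, not something from the thread: `TRAVIS_SECURE_ENV_VARS` is a variable Travis CI sets to "true" or "false", while `NIMBIX_APIKEY` is a hypothetical name for the encrypted setting.

```python
import os


def gpu_tests_allowed(env=None):
    """Return True only when the build can see the encrypted settings.

    TRAVIS_SECURE_ENV_VARS is set by Travis CI to "true" or "false".
    NIMBIX_APIKEY is a hypothetical name for the encrypted variable.
    """
    env = os.environ if env is None else env
    has_secure_vars = env.get("TRAVIS_SECURE_ENV_VARS") == "true"
    has_key = bool(env.get("NIMBIX_APIKEY"))
    return has_secure_vars and has_key


if __name__ == "__main__":
    if gpu_tests_allowed():
        print("API key available: GPU tests can be scheduled")
    else:
        print("no API key (for example a fork pull request): skipping GPU tests")
```

This only makes the limitation visible; it does not remove it, since fork pull requests still never reach the GPU tests.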

https://docs.travis-ci.com/user/environment-variables/

"Encrypted environment variables are not available to pull requests from forks due to the security risk of exposing such information to unknown code."

So, a webservice, on a trusted machine, is perhaps the way to go. Might not need to be a full jenkins thing though. Could just be a service that:

hughperkins commented 8 years ago

Apparently we can detect travis boxes via https://docs.travis-ci.com/user/ip-addresses/ Whether it's worth doing this, since anyone can run any build on travis, is an open question. The real security comes from the fact that the secure webservice won't execute arbitrary code, only pull requests of cutorch. Which is, theoretically, still arbitrary code in itself. Maybe what we want is something like that magical 'test this please' phrase I saw somewhere, where if a nominated repo admin said "Test this please", the build bot would run :-O :-)
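The 'test this please' gate boils down to two checks: who wrote the comment, and whether it is the trigger phrase. A minimal sketch of that rule follows; the admin list and the exact matching rules are assumptions for illustration, not how any particular build bot does it.

```python
import re

# Illustrative list of trusted accounts; the thread does not define one.
ADMINS = frozenset({"soumith"})

# Case-insensitive, tolerant of extra whitespace and trailing punctuation.
TRIGGER = re.compile(r"^\s*test\s+this\s+please[\s.!]*$", re.IGNORECASE)


def should_build(author, comment, admins=ADMINS):
    """Build only when a nominated admin posts the trigger phrase."""
    return author in admins and TRIGGER.match(comment) is not None
```

With this rule, the same comment from an account outside the list is simply ignored, so a pull request from a stranger is never built until an admin has looked at it.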

hughperkins commented 8 years ago

(seems there is a jenkins plugin for the 'test this please' functionality https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin )

hughperkins commented 8 years ago

Seems it is possible to raise a support request to fund a nimbix account via Paypal, prior to using it. This would then presumably limit the upper bound cost of some runaway usage to however much one funded it for. They promise to terminate all instances once such funding has been used up, though I suspect this is kind of a manual process for now, on their part.

hughperkins commented 8 years ago

(I've made a systray icon for nimbix, that shows any instances we have started: https://github.com/hughperkins/nimbix-admin/blob/master/ubuntuindicator.py This means that if someone does somehow get the apikey, and starts a zillion instances, we'd notice pretty quickly. I built this after leaving one of the dual titan instances running overnight :-D )
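For the automated case, where an instance is started for a test and shut down afterwards, one way to avoid a forgotten instance is to tie the shutdown to the test run itself. This is a rough sketch only: `start` and `stop` are hypothetical callables standing in for whatever the provider's API offers.

```python
from contextlib import contextmanager


@contextmanager
def gpu_instance(start, stop):
    """Start an instance and always stop it, even if the tests fail.

    start and stop are caller-supplied callables (hypothetical here);
    in practice they would wrap the cloud provider's API.
    """
    handle = start()
    try:
        yield handle
    finally:
        # Runs on success, on test failure, and on unexpected exceptions.
        stop(handle)
```

A build job would then run its tests inside `with gpu_instance(start, stop) as handle:`. This does not help if the machine running the job itself dies, so an external check such as the indicator above is still useful.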

hughperkins commented 8 years ago

Ooo... I found an example of someone using 'test this please'. It is ... Tensorflow :-P https://github.com/tensorflow/tensorflow/pull/2556#issuecomment-222893778

Edit: hmmmm, I wish github would warn me it's going to add a link, or ask me if I even want a link ... :-/

soumith commented 8 years ago

Okay, off to a good start,

$JENKINS_IP = "50.17.86.9"

soumith commented 8 years ago

@hughperkins installed jenkins, and gave access to the box to your private keys (that you sent to me)

soumith commented 8 years ago

next step, i'm signing up for nimbix.

soumith commented 8 years ago

i've signed up for nimbix, added you to the "Team" (invited your gmail address), and added in my payment details. I tried to follow the instructions at the end of the Nimbix signup document, but you list two possibilities there, so I got confused. What next? What else do I need to do?

hughperkins commented 8 years ago

Cool! :-) Started looking at next steps in https://github.com/hughperkins/torchunit/tree/master/jenkins.md

hughperkins commented 8 years ago

As far as the Nimbix API key, I'm not sure I'd be entirely comfortable with having my own nimbix apikey on a shared server, therefore by extension, I wouldnt be entirely comfortable with putting your own key, or a proxy of such (eg via Nimbix team structure), on a shared server.

I think we should have a wrapper webservice around the nimbix api, on a standalone ec2 instance, which does things like:

I can look at writing such a wrapper service, probably in Flask, and add it in eg into torchunit repo, (or possibly into some generic repo 'nimbix-wrapper')

soumith commented 8 years ago

no need to write a wrapper service and stuff. Just put my API key on the jenkins server, it's fine. Only me and you are sharing it, and nimbix has a cap on my account.

hughperkins commented 8 years ago

Only me and you are sharing it

Well, for now. But would probably be good if other people can admin it too, I would think. I have written a wrapper at https://github.com/hughperkins/nimbix-admin/blob/master/wrapper-service/nimbix-wrapper.py It has all the security detailed above, and in addition is locked to the ip address of the jenkins box (specified in https://github.com/hughperkins/nimbix-admin/blob/master/wrapper-service/config.yaml.templ ) I'll provide some instructions/script for setting this up, soonish.
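The linked nimbix-wrapper.py is the actual implementation. Purely as an illustration of the idea of combining an IP lock with a shared secret, a check could look like the sketch below. The names and values are placeholders, not taken from the wrapper or its config template.

```python
import hmac

# Placeholder values; a real service would read them from its config file.
JENKINS_IP = "203.0.113.10"
SHARED_SECRET = "change-me"


def is_authorized(remote_ip, presented_secret,
                  allowed_ip=JENKINS_IP, secret=SHARED_SECRET):
    """Accept a request only from the expected IP with the right secret."""
    ip_ok = remote_ip == allowed_ip
    # compare_digest avoids leaking how many leading characters matched.
    secret_ok = hmac.compare_digest(presented_secret.encode(), secret.encode())
    return ip_ok and secret_ok
```

Both checks have to pass, so a leaked secret alone is not enough unless the caller can also appear to come from the jenkins address.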

hughperkins commented 8 years ago

(Also, the attack surface of an instance containing a small, lightweight service is much smaller than that of a big jenkins box, I would think)

hughperkins commented 8 years ago

(basically, the situation I want to avoid is:

Easiest way is, I simply dont have the apikey, then I dont have to handle this situation as and when it arises...)

soumith commented 8 years ago

no problem, i just didn't want you to do extra work when it was mostly a hypothetical. I mean, I trust you to not care whether you have an API key or not. will start taking a look at it wednesday-ish, at this day-long conference today.

apaszke commented 8 years ago

@hughperkins btw, if you really want to be secure you probably need to make it a challenge-response authentication, rather than a single request. If someone sniffs the shared key, they can simply spoof packets as coming from a Jenkins box IP. They don't need your response to spin up jobs, so the IP check doesn't do much right now.

Even simpler, you could make it work more like CSRF. First Jenkins requests a temporary security token (+ it has IP checked), and then it submits a work request, that has to contain the exact same token, which is invalidated immediately. No shared secret required.
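A rough sketch of this two-step token exchange, assuming an in-memory store and a short expiry (the expiry is an addition for the sketch, not part of the suggestion above):

```python
import secrets
import time


class TokenIssuer:
    """Hands out single-use tokens to one allowed IP address."""

    def __init__(self, allowed_ip, ttl_seconds=60):
        self.allowed_ip = allowed_ip
        self.ttl_seconds = ttl_seconds  # assumed expiry, not from the thread
        self._pending = {}  # token -> expiry time

    def issue(self, remote_ip):
        """Step 1: Jenkins asks for a token; the IP is checked here."""
        if remote_ip != self.allowed_ip:
            return None
        token = secrets.token_urlsafe(32)
        self._pending[token] = time.monotonic() + self.ttl_seconds
        return token

    def redeem(self, remote_ip, token):
        """Step 2: the work request must carry a token issued in step 1."""
        if remote_ip != self.allowed_ip:
            return False
        # pop() invalidates the token whether or not it is still fresh.
        expiry = self._pending.pop(token, None)
        return expiry is not None and time.monotonic() <= expiry
```

The point of the design is that the token is sent back to the real Jenkins address, so a sender who only forges the source IP never sees it and cannot complete step 2.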

Adding https://letsencrypt.org to the service can minimise the chance someone will see the secret/token in transit. I can help if you need any assistance.

hughperkins commented 8 years ago

they can simply spoof packets as coming from a Jenkins box IP

I think they'd struggle to spoof a tcp/ip connection? They'd get as far as sending 'SYN', the server sends 'SYN-ACK' back to the real Jenkins IP, and then ... ? Unless they control a router in between.

Adding https://letsencrypt.org to the service can minimise the chance someone will see the secret/token in transit

Yes. Actually, I wrote a 'howto' for this, targeted at jenkins usage, for those like me who dont fancy letting lets-encrypt install libraries all over the place https://github.com/hughperkins/howto-jenkins-ssl/blob/master/letsencrypt.md I've never dabbled in enabling it for flask though.

I can help if you need any assistance.

Sure, sounds good :-)

apaszke commented 8 years ago

As far as I remember you don't have to install much. You can obtain the certificates through https://gethttpsforfree.com. I didn't check its code, but it's open source and is listed on let's encrypt website, so it's probably ok.

For the flask part, it seems to be quite easy too: http://flask.pocoo.org/snippets/111/

apaszke commented 8 years ago

On second thought, it's not a public service, so letsencrypt is an overkill. Self-signed certificate should be ok.

hughperkins commented 8 years ago

On second thought, it's not a public service, so letsencrypt is an overkill. Self-signed certificate should be ok.

Ok. Yes, and shared secret just lets them schedule a set of cutorch unit tests anyway. I'm sure that will be fun for them :-D Maybe I'll draw a picture of the architecture I'm imagining.

hughperkins commented 8 years ago

https://docs.google.com/presentation/d/1Jiddfqg3yko-LLE3oURfB-yEOdOFi3mDI4Hp9v5B0sk/edit?usp=sharing

hughperkins commented 8 years ago

[image: cutorch ci]

apaszke commented 8 years ago

Thanks! You did describe it clearly earlier, but that's a nice overview.

I just meant that people who can see the shared secret are also the ones that probably would be able to do some fun IP spoofing, and adding self-signed certificate to flask should be quite easy.

Nevertheless, yeah, it would only let them burn some money on unit-tests (or on arbitrary stuff, if they send a PR that completely changes the tests, and do "Test this please" with their own triggers).

hughperkins commented 8 years ago

Nevertheless, yeah, it would only let them burn some money on unit-tests (or on arbitrary stuff, if they send a PR that completely changes the tests, and do "Test this please" with their own triggers).

On the subject of Jenkins, 'test this' plugin needs 'push' access to the repo. This sounds quite powerful to me, so I might check to what extent it is subject to checks such as, cannot push to 'master', and cannot force push to master.

(Original message, can skip, too long:

Nevertheless, yeah, it would only let them burn some money on unit-tests (or on arbitrary stuff, if they send a PR that completely changes the tests, and do "Test this please" with their own triggers).

You mean, if someone has access to the Jenkins? So, in related news, yesterday, when I started looking at putting in place 'test this please', it turns out you need to grant 'push' access to jenkins, in order to enable this. I dont use this module on any of my own jenkins currently. What I tend to do to date is:

The 'test this please' I think basically automates this second step, as far as I can tell, though I confess I havent looked in detail yet. It does however imply giving 'push' access to Jenkins. Thinking about what is possible with such access, I came up with:

...might not be noticed immediately.

... however... since it's using a separate github account, presumably subject to github standard repo permissions, maybe:

hughperkins commented 8 years ago

Step 1 of checking the limits on a possible future torchbuildbot account:

Next up: need to confirm that said permissions are sufficient for 'test this' to work ok

hughperkins commented 8 years ago

Hmmm, can't figure out how to configure the webhook to connect from github into the jenkins pull request builder plugin. I just get:

[screenshot: webhookfail]

... all the time. It seems that most jenkins-side stuff assumes one has a github account with admin access, that one can put into jenkins, and it will set up the hook for you. So, documentation on setting up the webhook in github is pretty sparse. What I tried (on my own jenkins/github repo), is to put a shared secret in the jenkins system configuration at:

[screenshot: jenkinssystemconfig]

... and use the same secret value in the github webhook configuration
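For background on what the shared secret does on the GitHub side: GitHub signs each webhook payload with an HMAC keyed by that secret and sends the result in a header (`X-Hub-Signature-256`, with an older SHA-1 variant in `X-Hub-Signature`). The sketch below shows that general scheme only; how the Jenkins plugin verifies it internally is not something this thread establishes.

```python
import hashlib
import hmac


def sign(secret, body):
    """Compute the value GitHub would send in X-Hub-Signature-256."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(secret, body, header_value):
    """Check a received signature header against the raw request body."""
    return hmac.compare_digest(sign(secret, body), header_value)
```

The signature covers the raw request bytes, so a receiver has to verify before parsing or re-serialising the JSON.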

I've been poking at this for an hour or so, and can't seem to figure out how to get this working.

It's possible to set the jenkins build to poll periodically, eg every 20 minutes, but that'd be a bit annoying to use probably?

hughperkins commented 8 years ago

Raised an issue at https://github.com/jenkinsci/ghprb-plugin/issues/378

hughperkins commented 8 years ago

Ah, figured out the webhook bit, thanks to http://stackoverflow.com/questions/7427557/jenkins-and-github-webhook-http-403/7431548#7431548

[screenshot: successfulhook]

hughperkins commented 8 years ago

Whee, I got test this please working :-) (on a test repo/jenkins). Bunch of hoops to jump through, I'll document them soonish...

hughperkins commented 8 years ago

(And note that the 'test this please' plugin also handles setting status on the pull request, and so on; seems quite fun :-) )

hughperkins commented 8 years ago

(Hmmm, wow, that's scary: I was dabbling in providing anonymous access to the build logs of my own jenkins, and noticed that all my environment variables are exposed :-P Seems like launching jenkins with env -i might be a good idea... https://github.com/hughperkins/torchunit/commit/4d7561a04f8a3a69f7ea3ebb92945f988b0d3718 ; also, EnvInject plugin is plausibly not ideal on internet-facing jenkins... )
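The same idea as `env -i`, expressed as a small launcher: start the child process with an explicit, short list of variables instead of inheriting everything. This is a sketch under assumptions; which variables to keep is a guess, and `NIMBIX_APIKEY` is a hypothetical name for a secret.

```python
import os
import subprocess
import sys


def run_with_clean_env(command, keep=("PATH", "HOME")):
    """Run command with only the variables named in keep.

    Similar in spirit to `env -i`, except that a short allow-list is
    passed through; the choice of PATH and HOME is an assumption.
    """
    clean = {name: os.environ[name] for name in keep if name in os.environ}
    return subprocess.run(command, env=clean, capture_output=True,
                          text=True, check=True)


if __name__ == "__main__":
    os.environ["NIMBIX_APIKEY"] = "not-a-real-key"  # hypothetical secret
    child = [sys.executable, "-c",
             "import os; print('NIMBIX_APIKEY' in os.environ)"]
    print(run_with_clean_env(child).stdout.strip())
```

Anything the launched process later prints or exposes from its environment can then only contain what was deliberately passed in.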

hughperkins commented 8 years ago

So, I experimented with removing 'push' access to the bot (on my test repo/bot/jenkins)

Without push access:

[screenshot: withoutpush2]

With push access:

[screenshot: withpush3]

I think that having the PR build status updated is a pretty fundamental, really useful aspect of having autobuild, so I think the bot should have push access to the repository; we just need to make sure that at least 'master' is protected, and cannot be force pushed to.

Thoughts?
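For context on why push access matters here: the build status shown on a pull request is set through GitHub's commit status API, which GitHub documents as requiring push access to the repository. That is consistent with the behaviour observed above. The sketch below only builds such a request, without sending it; the `torchbuildbot` context label and the token handling are placeholders.

```python
import json
import urllib.request

VALID_STATES = {"pending", "success", "failure", "error"}


def build_status_request(owner, repo, sha, state, token, description=""):
    """Build (but do not send) a GitHub commit-status request."""
    if state not in VALID_STATES:
        raise ValueError("unknown state: " + state)
    url = "https://api.github.com/repos/{}/{}/statuses/{}".format(
        owner, repo, sha)
    payload = {"state": state, "description": description,
               "context": "torchbuildbot"}  # hypothetical context label
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "token " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. With branch protection on 'master', the bot account can hold this permission without being able to rewrite the protected branch.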