kindredgroup / puppet-forge-server

Private Puppet forge server supports local files and both v1 and v3 API proxies

Proxy fixes to prevent doubling of URL paths #24

Closed hickey closed 9 years ago

hickey commented 9 years ago

Encountered an issue with the proxy code where part of the download path was getting duplicated, causing downloads from Puppet Forge to fail. If the file was already in the cache, everything worked as expected; the failure occurred only when a new module was being pulled. I believe the problem only occurred when a specific version was being requested. If no version was specified (i.e. pulling the latest version), the code worked fine.

The original code would read the metadata from Puppet Forge and use the file_uri attribute, which already has /v3/files prepended to the path. Then in api/v3/release.rb the path would receive another /v3/files, duplicating that segment. As a result, when the file was requested from Puppet Forge, one of the /v3/files segments was left on the filename, the file could not be found, and a 404 was returned.

With this new code, the v1 and v3 proxy code strips the version-specific path prefix from the filename and only adds it back when the file is actually downloaded.
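To make the idea concrete, here is a minimal sketch (not the actual puppet-forge-server code; the helper names are hypothetical) of normalizing the metadata's file_uri and re-attaching the prefix only when building the final download URL:

```ruby
# Hypothetical sketch of the fix described in this PR: store file paths
# without the /v3/files prefix and add it back exactly once at download
# time, so the prefix can never double up.

V3_FILES_PREFIX = '/v3/files'.freeze

# Accept a file_uri with or without the /v3/files prefix and keep only
# the bare relative filename.
def normalize_file_uri(file_uri)
  file_uri.sub(%r{\A#{Regexp.escape(V3_FILES_PREFIX)}/?}, '')
end

# Re-attach the prefix when the file is actually fetched upstream.
def download_url(base, file_uri)
  "#{base}#{V3_FILES_PREFIX}/#{normalize_file_uri(file_uri)}"
end

# Both input forms now produce the same URL:
download_url('https://forgeapi.puppetlabs.com', '/v3/files/puppetlabs-stdlib-4.6.0.tar.gz')
# => "https://forgeapi.puppetlabs.com/v3/files/puppetlabs-stdlib-4.6.0.tar.gz"
download_url('https://forgeapi.puppetlabs.com', 'puppetlabs-stdlib-4.6.0.tar.gz')
# => "https://forgeapi.puppetlabs.com/v3/files/puppetlabs-stdlib-4.6.0.tar.gz"
```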

i11 commented 9 years ago

Hi,

Thanks a lot for the pull request! Interesting find. Let me try to reproduce the problem you've encountered. Do I understand correctly that, while the forge server is running, a new module is added at the proxied location, and as soon as one tries to retrieve it (through puppet module install author-modulename -v given_version or librarian) it fails? Is that correct?

i11 commented 9 years ago

https://github.com/unibet/puppet-forge-server/commit/759504ba7ab95f0931079a5e7594e1e3c3eb13cd addresses possible 404 issues. Please let me know if it helps.

hickey commented 9 years ago

I was just looking to see if my tmux session still had any of the diagnostic output I generated last night while tracking this down. Unfortunately it is all gone. What was being requested from Puppet Forge was literally https://forgeapi.puppetlabs.com/v3/files/v3/files/puppetlabs-stdlib-x.x.x.tar.gz.

I cannot see how 759504b could be related, as this was not a case-sensitivity problem, but one of adding the HTTP route onto a value read from a JSON response that already had the route prepended.

i11 commented 9 years ago

The thing is that the double /v3/files you see in the URLs is expected behavior. The first /v3/files part of the URL is the server's own API termination, and the next /v3/files is the proxied API endpoint. The first part is dropped in the app classes (e.g. https://github.com/unibet/puppet-forge-server/blob/master/lib/puppet_forge_server/app/version3.rb) and only the rest of the URL is used to make the call. Naturally it shouldn't send /v3/files/v3/files towards forgeapi, so it would be great if I could reproduce this and dig into it.
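The prefix-dropping step described above can be sketched roughly as follows (a simplified stand-in for the route handling in version3.rb; the method name is illustrative, not the project's actual API):

```ruby
# Illustrative sketch of how the app class terminates the server's own
# /v3/files prefix: the route captures everything after /v3/files/ and
# only that remainder is handed to the backends.

def relative_path_for(request_path)
  # The first /v3/files belongs to this server's API; drop it.
  m = request_path.match(%r{\A/v3/files/(.+)\z})
  m && m[1]
end

# A proxied request may legitimately still contain a second /v3/files,
# because that copy belongs to the upstream endpoint:
relative_path_for('/v3/files/v3/files/puppetlabs-stdlib-4.6.0.tar.gz')
# => "v3/files/puppetlabs-stdlib-4.6.0.tar.gz"
relative_path_for('/v3/files/my-module-0.0.1.tar.gz')
# => "my-module-0.0.1.tar.gz"
```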

The reason I referred to https://github.com/unibet/puppet-forge-server/commit/759504ba7ab95f0931079a5e7594e1e3c3eb13cd is that it addresses 404s caused by capitals in module names, and given the uncertainty about the circumstances under which the problem occurred, I thought it might help here as well.

Could you please provide a bit more detail about your setup? Server version, the command you start the server with, OS, Ruby version, etc. I could try to catch it.

i11 commented 9 years ago

Just thought of a scenario where you could see a double /v3/files in the call towards forgeapi: when you have a forge server proxying forgeapi.puppetlabs.com together with another forge server that in turn also proxies forgeapi.

forge-server -> forgeapi.puppetlabs.com
            |
             -> forge-server -> forgeapi.puppetlabs.com

Here the first forge server, receiving the double /v3/files in the JSON payload from the second forge server, will attempt to request it from the forgeapi it proxies. That request will certainly fail, and forgeapi will simply be skipped in that particular sequence; the second forge-server will be used to retrieve the module file instead.

You probably don't do this, but it's worth mentioning just in case.

hickey commented 9 years ago

No, nothing that complex. I have just a single puppet-forge-server instance running that then goes through a squid proxy to forgeapi.puppetlabs.com.

I have puppet-forge-server running in a Docker container, so I can share that if necessary. Here is the Dockerfile from which you can build your own image.

# Puppet forge server
#
# VERSION               1.0.1
# DOCKER-VERSION        1.6.0

FROM       ubuntu:14.04
MAINTAINER Gerard Hickey <gerard.hickey@audiencescience.com>

VOLUME ["/puppet-modules"]
ENV MODULE_DIR /puppet-modules

# make sure the package repository is up to date
RUN echo "deb http://archive.ubuntu.com/ubuntu trusty-backports universe" >> /etc/apt/sources.list
RUN apt-get update

# Install ruby and puppet-forge-server gem
RUN apt-get -y install ruby2.0 ruby-dev ruby2.0-dev build-essential
RUN gem install puppet-forge-server

# Add supervisord support
RUN apt-get install -y supervisor
RUN mkdir -p /var/log/supervisor
RUN update-rc.d supervisor enable

# Add the supervisord component file
ADD . /src
RUN cp /src/puppet-forge.conf /etc/supervisor/conf.d

EXPOSE 8080
WORKDIR /puppet-modules
CMD /usr/bin/supervisord

I just realized that I need to regenerate this Dockerfile. I found that ruby2.0-dev did not include the mkmf support needed to build one of the supporting modules. Currently everything is running on Ruby 1.9.3, which appears to be pulled in by the ruby-dev package, so the ruby 2.0 packages can probably go.

hickey commented 9 years ago

Today I had to restart the Docker container running puppet-forge-server, and I had not committed the changes I proposed here, so the container is running the original code from the repo.

Everything seems to be working right now. I will try to recreate the issue and then document the process that produces the failures. Ideally the issue will pop up again in the next couple of days, and I will be able to provide more information and the steps to recreate it.

i11 commented 9 years ago

I haven't had time to try your docker setup yet. I'll try to run some testing during the weekend as well. Thanks for your help!

hickey commented 9 years ago

ARG... now I have the opposite problem. With your existing code, there are spots where no directory information is put on the download request. Specifically, I call your attention to line 40 of backends/proxy.rb. Here is some of the diagnostic output I have generated.

[2015-07-06 20:34:25] INFO  captures = "evenup-kibana-2.1.0.tar.gz"
[2015-07-06 20:34:25] INFO  download(/evenup-kibana-2.1.0.tar.gz)
[2015-07-06 20:34:25] ERROR  PuppetForgeServer::Backends::ProxyV3 failed downloading file 'evenup-kibana-2.1.0.tar.gz'
[2015-07-06 20:34:25] ERROR  Error: 404 Not Found
[2015-07-06 20:34:25] INFO  buffer = nil
10.13.8.43 - - [06/Jul/2015 20:34:25] "GET /v3/files/evenup-kibana-2.1.0.tar.gz HTTP/1.1" 404 28 0.1956
10.13.8.43 - - [06/Jul/2015:20:34:25 UTC] "GET /v3/files/evenup-kibana-2.1.0.tar.gz HTTP/1.1" 404 28
- -> /v3/files/evenup-kibana-2.1.0.tar.gz

It seems that in some places your code expects a full path and in other places just the filename. These are mutually exclusive and should not be mixed. Effectively everything should be normalized down to just the filename.

Give me a couple of days to go through the code and canonicalize all the path references. This will probably make most of this PR moot.
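The normalization proposed here could be as simple as collapsing every path variant to its basename before it reaches a backend (a sketch of the idea, not code from the project; the helper name is hypothetical):

```ruby
# Hypothetical normalization helper: whatever the caller passes -- a
# full route, a relative path, or a bare name -- reduce it to just the
# filename. File.basename strips any leading directory components.

def canonical_filename(path)
  File.basename(path)
end

canonical_filename('/v3/files/evenup-kibana-2.1.0.tar.gz') # => "evenup-kibana-2.1.0.tar.gz"
canonical_filename('v3/files/evenup-kibana-2.1.0.tar.gz')  # => "evenup-kibana-2.1.0.tar.gz"
canonical_filename('evenup-kibana-2.1.0.tar.gz')           # => "evenup-kibana-2.1.0.tar.gz"
```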

i11 commented 9 years ago

Well, it really depends on the context. There are some assumptions about the relative path content, but I don't think it's contradictory in any way. I should probably improve the debug/error messages produced by the code to clarify certain behavioral aspects.

It's important to note that the URL value that comes after /v3/files/ is passed to the backend as-is, and it's up to the backend to interpret it; this ensures logical abstraction and isolation in the code. The confusing part here is probably the fault tolerance. Instead of complex parsing or verification logic, a backend is simply allowed to fail, basically saying "nope, I don't have it", and the app classes (or any other higher abstraction class) just move on to the next backend.

Putting it in a practical scenario, say the server was started proxying a remote v3 location and serving a local directory path. A call for the evenup-kibana-2.1.0.tar.gz module file located on the remote v3 proxy will look like /v3/files/v3/files/evenup-kibana-2.1.0.tar.gz; after passing through the Version3 app class, the relative path v3/files/evenup-kibana-2.1.0.tar.gz will be sent to the backends. Obviously the Directory backend will fail, because it won't find such a file even if the module was cached: the path simply doesn't correspond to the file-path structure the Directory backend operates on. But, assuming the proxied remote location didn't go down, it will successfully retrieve and pass on the module file. This will actually change a bit after 1.6.0, as the file URL in the releases JSON payload will contain cached files instead of remote ones, making the v3 proxy backend possibly fail while files are served directly from the cache. Possibly, because it won't fail most of the time: backends are sorted according to their priority, and the Directory backend has the highest one, meaning the file buffer will be acquired and returned to the app classes from it. But let's leave that aside, as it's less important for this discussion.

Now let's call for a my-module-0.0.1.tar.gz module file that is stored locally. The request URL will look like /v3/files/my-module-0.0.1.tar.gz and the relative path passed to the backends will be my-module-0.0.1.tar.gz. It will surely fail with the remote v3 proxy, getting a 404 from it, but assuming the file didn't disappear from the local file system, it will be recursively searched for, successfully found, and passed on.
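The fall-through behavior described in the two scenarios above can be sketched like this (the backend classes are simplified stand-ins, not the project's real implementations):

```ruby
# Sketch of the fault-tolerance scheme: each backend is asked for the
# file and allowed to answer nil ("nope, I don't have it"); the caller
# walks the backends in priority order and takes the first hit.

class DirectoryBackend
  def initialize(files)
    @files = files # stand-in for a local directory tree
  end

  # Returns the file buffer, or nil when the relative path does not
  # map onto this backend's file layout.
  def get_file_buffer(rel_path)
    @files[rel_path]
  end
end

class ProxyV3Backend
  def initialize(files)
    @files = files # stand-in for files reachable via the remote proxy
  end

  def get_file_buffer(rel_path)
    @files[rel_path]
  end
end

def fetch(backends, rel_path)
  backends.each do |b|
    buffer = b.get_file_buffer(rel_path)
    return buffer if buffer
  end
  nil # every backend failed -> the app answers 404
end

local  = DirectoryBackend.new('my-module-0.0.1.tar.gz' => 'local bytes')
remote = ProxyV3Backend.new('v3/files/evenup-kibana-2.1.0.tar.gz' => 'remote bytes')
backends = [local, remote] # Directory backend has the highest priority

fetch(backends, 'my-module-0.0.1.tar.gz')              # => "local bytes"
fetch(backends, 'v3/files/evenup-kibana-2.1.0.tar.gz') # => "remote bytes"
```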

Now, for me to look closer at the 404 you got there, could you please provide the arguments that were used to start the server and how the module was being retrieved (puppet, librarian, versions, etc.)? It worked fine for me with

bundle exec bin/puppet-forge-server -x https://forgeapi.puppetlabs.com

and

puppet module install evenup-kibana --module_repository http://localhost:8080 --debug
thlapin commented 9 years ago

I tripped over this issue today using the latest r10k. r10k no longer uses puppet module install underneath, but the URLs are hard-coded instead of using the file_uri value.

i11 commented 9 years ago

Thanks for letting me know. It's really unfortunate that projects don't try to use the provided API, instead assuming and hardcoding things... I'll have a look at it as soon as I can. I'll also create a separate issue for this, so as not to hijack a closed pull request.