openshift / origin-server

OpenShift 2 (deprecated)
889 stars 516 forks source link

deployments.rb: Changes :err to be visible to user #6317

Closed thrasher-redhat closed 8 years ago

thrasher-redhat commented 8 years ago

Using the REST api for binary deployments could result in errors with the node using the curl command, however the user would not be able to see why or how the curl command failed.

Changes the :out and :err streams sent to the node to be nil, so that the error output is sent back to the user as part of the text variable.

Bug 1126836 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1126836

thrasher-redhat commented 8 years ago

[test]

thrasher-redhat commented 8 years ago

[extended:node]

tiwillia commented 8 years ago

@thrasher-redhat can you explain what you mean by the text variable? Its odd that setting the output to nil will cause the output to be returned to the user properly. Can you explain this a bit further?

thrasher-redhat commented 8 years ago

@tiwillia From the bug report, this is the current state of things:

[guoleis@localhost php2]$ curl -k -H "Content-Type: application/json" -u gusun+1@redhat.com:*   https://$server/broker/rest/application/53db7f5e6cec0ef884001e72/deployments -d '{"artifact_url":"https://php2-gusun1.*.rhcloud.com/php2.tar.gz"}' -X POST | json_reformat
{
  "api_version": 1.7,
  "data": null,
  "messages": [
    {
      "exit_code": 2,
      "field": null,
      "index": null,
      "severity": "debug",
      "text": "Unable to extract deployment archive using command: /usr/bin/curl https://php2-gusun1.*.rhcloud.com/php2.tar.gz | /bin/tar -xz"
    },
    {
      "exit_code": 2,
      "field": null,
      "index": null,
      "severity": "error",
      "text": "Unable to complete the requested operation due to: An invalid exit code (2) was returned from the server $server.  This indicates an unexpected problem during the execution of your request.\nReference ID: 23794d15b1795fd237ef6abb4fad1dc7"
    }
  ],
  "status": "internal_server_error",
  "supported_api_versions": [
    1.0,
    1.1,
    1.2,
    1.3,
    1.4,
    1.5,
    1.6,
    1.7
  ],
  "type": null,
  "version": "1.7"
}

That response is unhelpful, but changing :err to nil instead changes the first "text" entry under messages to something more like this:

"Unable to extract deployment archive using command: /usr/bin/curl https://php2-thrasher.dev.rhcloud.com/php2.tar.gz | /bin/tar -xz\n\n\ncurl: (60) Peer certificate cannot be authenticated with known CA certificates\nMore details here: http://curl.haxx.se/docs/sslcerts.html\n\ncurl performs SSL certificate verification by default, using a \"bundle\"\n of Certificate Authority (CA) public keys (CA certs). If the default\n bundle file isn't adequate, you can specify an alternate file\n using the --cacert option.\nIf this HTTPS server uses a certificate signed by a CA represented in\n the bundle, the certificate verification probably failed due to a\n problem with the certificate (it might be expired, or the name might\n not match the domain name in the URL).\nIf you'd like to turn off curl's verification of the certificate, use\n the -k (or --insecure) option.\n\ngzip: stdin: unexpected end of file\n/bin/tar: Child returned status 1\n/bin/tar: Error is not recoverable: exiting now\n", :exitcode=>2, :addtl_params=>nil}

It isn't pretty (but then again, many users wouldn't be using something to pretty print their json, so it wouldnt be pretty either way...), but its much more informative about what went wrong. In this case, its that the certificate cannot be authenticated because its using a self-signed cert.

tiwillia commented 8 years ago

That makes much more sense, thank you for explaining @thrasher-redhat

LGTM, although we have some relevant test failures:

  1) Skipped:
test_from_ssh_url(ApplicationRepositoryFuncTest) [/root/openshift-test/node/test/gear_functional/application_repository_func_test.rb:248]:
Restore this test using webmock

  2) Failure:
test_extract_deployment_archive_valid_file(DeploymentsTest) [/root/openshift-test/node/lib/openshift-origin-node/model/application_container_ext/deployments.rb:411]:
unexpected invocation: #<OpenShift::Runtime::ApplicationContainer:0x4389948>.run_in_container_context('extract', {:in => nil, :out => nil, :err => nil, :env => {'a' => 'b'}, :chdir => '/bar'})
unsatisfied expectations:
- expected exactly once, not yet invoked: #<OpenShift::Runtime::ApplicationContainer:0x4389948>.run_in_container_context('extract', {:in => nil, :out => #<IO:0x2450220>, :err => #<IO:0x2450220>, :env => {'a' => 'b'}, :chdir => '/bar'})
satisfied expectations:
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Config>.get_bool('DISABLE_PASSWORD_AGING', true)
- allowed any number of times, invoked once: #<Mock:OpenShift::Config>.get_bool('TRAFFIC_CONTROL_ENABLED', 'true')
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Config>.get_bool('no_overcommit_active', false)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Config>.get('CONTAINERIZATION_PLUGIN')
- allowed any number of times, invoked 6 times: #<Mock:OpenShift::Config>.get(any_parameters)
- allowed any number of times, invoked twice: OpenShift::Config.new(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.templates(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.processes(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.thaw(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.freeze(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.boost(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.delete(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::Cgroups>.create(any_parameters)
- allowed any number of times, not yet invoked: OpenShift::Runtime::Utils::Cgroups.new(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::TC>.deluser(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::TC>.stopuser(any_parameters)
- allowed any number of times, not yet invoked: #<Mock:OpenShift::Runtime::Utils::TC>.startuser(any_parameters)
- allowed any number of times, not yet invoked: OpenShift::Runtime::Utils::TC.new(any_parameters)
- allowed any number of times, invoked once: Etc.getpwnam(any_parameters)
- expected exactly once, invoked once: File.exist?('/tmp/foo.tar.gz')
- expected exactly once, invoked once: #<OpenShift::Runtime::ApplicationContainer:0x4389948>.determine_extract_command(has_entries({:file => '/tmp/foo.tar.gz'}))
thrasher-redhat commented 8 years ago

Retesting [extended:node]

@pmorie - Want to take a look and see if this change looks like it will break anything? It doesn't look like it as far as I can see, but you'll have a much better idea of how this change might affect things.

openshift-bot commented 8 years ago

Evaluated for online test up to 0110818ab94c1e3e0ff10901db3d868b75e12ad4

openshift-bot commented 8 years ago

Online Test Results: SUCCESS (https://ci.dev.openshift.redhat.com/jenkins/job/test_pull_requests/9059/) (Extended Tests: node)

pmorie commented 8 years ago

@jwhonce Do you remember any of this stuff to give me a second set of eyes?

thrasher-redhat commented 8 years ago

@pmorie @jwhonce Any updates? Don't want to merge without it being reviewed by someone who knows the code better than I do.

thrasher-redhat commented 8 years ago

@pmorie @jwhonce So I tested for memory usage by prepending the tar command with "perl -e 'print q(x)x1000000'; ". The resulting 'text' field printed with all the x's and then the tar error message. With that much additional data - mcollective memory usage had a small jump. I doubt that the results of the tar command will reach that amount of data, so as far as I can tell, using the memory buffer shouldn't be a problem with mcollective memory usage.

jwhonce commented 8 years ago

@thrasher-redhat Given that data point, I'll give a tentative LGTM. I would warn OPS when this code goes live so they can monitor for any impact in Production.

tiwillia commented 8 years ago

Lets go ahead and [merge].

@a13m and @dak1n1 please be aware of this change. There is a bit of worry that the logging now using the memory buffer could cause some memory usage issues.

openshift-bot commented 8 years ago

Evaluated for online merge up to 0110818ab94c1e3e0ff10901db3d868b75e12ad4

tiwillia commented 8 years ago

Test failures appear to be quite unrelated.

re-[merge] please openshift-bot!

openshift-bot commented 8 years ago

Online Merge Results: SUCCESS (https://ci.dev.openshift.redhat.com/jenkins/job/merge_pull_requests/6678/) (Image: devenv_5742)