vmware-archive / photon-controller


unable to install Photon Agent #103

Open vChrisR opened 7 years ago

vChrisR commented 7 years ago

Photon Platform seems to install fine. VIB installation on the ESXi host works OK, but when the agent starts it just logs errors. This is the log:

INFO [2017-04-01 12:52:13,667] [47537:47536:Thread-1] [agent.py:start:68] __main__: Startup config: {'datastores': None, 'host_service_threads': 20, 'logging_file_backup_count': 10, 'stats_store_port': 0, 'bootstrap_poll_frequency': 5, 'port': 8835, 'log_level': u'debug', 'config_path': '/etc/vmware/photon/controller', 'workers': 32, 'hostname': None, 'utilization_transfer_ratio': 9, 'no_syslog': False, 'thrift_timeout_sec': 3, 'memory_overcommit': 1.0, 'logging_file_size': 10485760, 'image_datastores': None, 'heartbeat_interval_sec': 10, 'auth_enabled': False, 'cpu_overcommit': 16.0, 'stats_store_endpoint': None, 'stats_enabled': False, 'control_service_threads': 1, 'heartbeat_timeout_factor': 6, 'wait_timeout': 10, 'host_id': None, 'logging_file': '/scratch/log/photon-controller-agent.log', 'console_log': False, 'deployment_id': None, 'management_only': False, 'stats_host_tags': None}
INFO [2017-04-01 12:52:13,795] [47537:47536:Thread-1] [plugin.py:load_plugins:112] common.plugin: Plugins found: [<agent.plugin.AgentControlPlugin object at 0x7e2f9e8c>, <host.plugin.HostPlugin object at 0x7e44a8ec>, <stats.plugin.StatsPlugin object at 0x7e46f4cc>]
INFO [2017-04-01 12:52:13,796] [47537:47536:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin AgentControl initialized
INFO [2017-04-01 12:52:13,984] [47537:47536:Thread-1] [attache_client.py:__init__:105] host.hypervisor.esx.attache_client: AttacheClient init
INFO [2017-04-01 12:52:13,989] [47537:47536:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.connect_local
INFO [2017-04-01 12:52:14,005] [47537:47536:Thread-1] [attache_client.py:_start_syncing_cache:135] host.hypervisor.esx.attache_client: Start attache sync vm cache thread
INFO [2017-04-01 12:52:14,016] [47537:47536:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.connect_local
INFO [2017-04-01 12:52:14,016] [47537:47536:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores
INFO [2017-04-01 12:52:14,026] [47537:47536:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores
CRITICAL [2017-04-01 12:52:14,026] [47537:47536:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58df7c60-b42f5c4b-3e21-0050568a0915', name='datastore1'), Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58df8e54-31abb344-e93c-0050568a0915', name='local02')])
INFO [2017-04-01 12:52:14,026] [47537:47536:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores
INFO [2017-04-01 12:52:14,038] [47537:47536:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores
CRITICAL [2017-04-01 12:52:14,038] [47537:47536:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58df7c60-b42f5c4b-3e21-0050568a0915', name='datastore1'), Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58df8e54-31abb344-e93c-0050568a0915', name='local02')])
INFO [2017-04-01 12:52:14,039] [47537:47536:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58df7c60-b42f5c4b-3e21-0050568a0915
INFO [2017-04-01 12:52:14,040] [47537:47536:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58df8e54-31abb344-e93c-0050568a0915
INFO [2017-04-01 12:52:14,041] [47537:47536:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Host initialized
INFO [2017-04-01 12:52:14,041] [47537:47536:Thread-1] [stats.py:__init__:38] stats.stats: Stats not configured, Stats plugin will be in silent mode
INFO [2017-04-01 12:52:14,042] [47537:47536:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Stats initialized
INFO [2017-04-01 12:52:14,042] [47537:47536:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin AgentControl started
INFO [2017-04-01 12:52:14,042] [47537:47536:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Host started
INFO [2017-04-01 12:52:14,042] [47537:47536:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Stats started
INFO [2017-04-01 12:52:14,043] [47537:47536:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services StatsService (num_threads: 2)
INFO [2017-04-01 12:52:14,043] [47537:47536:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services Host (num_threads: 20)
INFO [2017-04-01 12:52:14,043] [47537:47536:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services AgentControl (num_threads: 1)
INFO [2017-04-01 12:52:14,044] [47537:47536:Thread-1] [agent.py:_initialize_thrift_service:142] __main__: Initialize SSLSocket using certfile=/etc/vmware/ssl/rui.crt, keyfile=/etc/vmware/ssl/rui.key, capath=/etc/vmware/ssl/
INFO [2017-04-01 12:52:14,044] [47537:47536:Thread-1] [agent.py:start:80] __main__: Starting the bootstrap config poll thread
INFO [2017-04-01 12:52:14,045] [47537:47536:Thread-1] [agent.py:_start_thrift_service:155] __main__: Listening on port 8835...
ERROR [2017-04-01 12:52:45,231] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: error while accepting
ERROR [2017-04-01 12:52:45,232] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: Traceback (most recent call last):
ERROR [2017-04-01 12:52:45,232] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmware/photon/controller/2.7/site-packages/thrift/server/TNonblockingServer.py", line 314, in handle
ERROR [2017-04-01 12:52:45,232] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'
ERROR [2017-04-01 12:52:46,254] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: error while accepting
ERROR [2017-04-01 12:52:46,254] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: Traceback (most recent call last):
ERROR [2017-04-01 12:52:46,254] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmware/photon/controller/2.7/site-packages/thrift/server/TNonblockingServer.py", line 314, in handle
ERROR [2017-04-01 12:52:46,255] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'
ERROR [2017-04-01 12:52:47,283] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: error while accepting
ERROR [2017-04-01 12:52:47,284] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: Traceback (most recent call last):
ERROR [2017-04-01 12:52:47,284] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmware/photon/controller/2.7/site-packages/thrift/server/TNonblockingServer.py", line 314, in handle
ERROR [2017-04-01 12:52:47,284] [47537:47536:Thread-1] [__init__.py:exception:1193] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'

The last three log entries are repeated many times.

What stands out in the log are the two CRITICALs. Somehow the agent is trying to use 'datastore1' as the image store, but 'local02' was configured in the photon yaml.
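For what it's worth, the `[]` in those CRITICAL lines is the agent's configured image-datastore list, which lines up with `'image_datastores': None` in the startup config, so the check can never succeed no matter which datastores the host sees. A minimal sketch of that kind of name matching (function and variable names here are illustrative, not the agent's actual code):

```python
def image_datastores_present(configured_names, discovered_names):
    """Hypothetical sketch of the check behind the CRITICAL log line:
    the agent needs at least one configured image datastore that the
    host can actually see; an empty configured list can never match."""
    found = set(configured_names) & set(discovered_names)
    if not found:
        print("CRITICAL: Image datastore(s) %s not found in %s"
              % (sorted(configured_names), sorted(discovered_names)))
    return found

discovered = ["datastore1", "local02"]

# With the yaml's "local02" actually delivered to the agent, this passes:
assert image_datastores_present(["local02"], discovered) == {"local02"}

# With an empty list (the 'image_datastores': None seen in the startup
# config), it reproduces the "Image datastore(s) [] not found" message:
assert image_datastores_present([], discovered) == set()
```

So the question is less "why datastore1?" and more why the configured image-store name never reached the agent's provisioning step.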

Below is my photon yaml:

compute:
  hypervisors:
    vesxi60:
      hostname: "vesxi60"
      ipaddress: "192.168.192.6"
      dns: "192.168.192.78"
      credential:
        username: "root"
        password: "password"
    vesxi60c01:
      hostname: "vesxi60c01"
      ipaddress: "192.168.192.23"
      dns: "192.168.192.78"
      credential:
        username: "root"
        password: "password"

lightwave:
  domain: "photon.lab"
  credential:
    username: "administrator"
    password: "Passw0rd123!"
  controllers:
    lightwave:
      site: "homelab"
      appliance:
        hostref: "vesxi60"
        datastore: "local02"
        memoryMb: 2048
        cpus: 2
        credential:
          username: "root"
          password: "password"
        network-config:
          network: "NAT=VM Network"
          type: "static"
          hostname: "lightwave.photon.lab"
          ipaddress: "192.168.192.78"
          dns: "192.168.192.21"
          ntp: "nl.pool.ntp.org"
          netmask: "255.255.255.0"
          gateway: "192.168.192.1"
photon:
  imagestore:
    img-store-1:
      datastore: "local02"
      enableimagestoreforvms: "true"
  cloud:
    hostref1: "vesxi60c01"
  controllers:
      pc:
        appliance:
          hostref: "vesxi60"
          datastore: "local02"
          memoryMb: 2048
          cpus: 2
          credential:
            username: "root"
            password: "password"
          network-config:
            network: "NAT=VM Network"
            type: "static"
            hostname: "pc.photon.lab"
            ipaddress: "192.168.192.77"
            netmask: "255.255.255.0"
            dns: "192.168.192.78"
            ntp: "95.211.160.148"
            gateway: "192.168.192.1"
loadBalancer:
  plb:
    appliance:
      hostref: "vesxi60"
      datastore: "local02"
      credential:
        username: "root"
        password: "password"
      network-config:
        network: "NAT=VM Network"
        type: "static"
        hostname: "plb.photon.lab"
        ipaddress: "192.168.192.76"
        netmask: "255.255.255.0"
        dns: "192.168.192.78"
        ntp: "nl.pool.ntp.org"
        gateway: "192.168.192.1"

and here is the photon-installer.log:

2017-04-01 12:53:27 DEBUG headers:124 - http-outgoing-17 << HTTP/1.1 200 OK
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Server: Apache-Coyote/1.1
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Cache-Control: no-store
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Pragma: no-cache
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Content-Type: application/json;charset=UTF-8
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Content-Length: 3800
2017-04-01 12:53:27 DEBUG headers:127 - http-outgoing-17 << Date: Sat, 01 Apr 2017 12:53:27 GMT
2017-04-01 12:53:27 DEBUG MainClientExec:284 - Connection can be kept alive indefinitely
2017-04-01 12:53:27 DEBUG PoolingHttpClientConnectionManager:314 - Connection [id: 17][route: {s}->https://192.168.192.78:443] can be kept alive indefinitely
2017-04-01 12:53:27 DEBUG PoolingHttpClientConnectionManager:320 - Connection released: [id: 17][route: {s}->https://192.168.192.78:443][total kept alive: 1; route allocated: 1 of 2; total allocated: 1 of 20]
2017-04-01 12:53:27 DEBUG PoolingHttpClientConnectionManager:388 - Connection manager is shutting down
2017-04-01 12:53:27 DEBUG DefaultManagedHttpClientConnection:81 - http-outgoing-17: Close connection
2017-04-01 12:53:27 DEBUG PoolingHttpClientConnectionManager:394 - Connection manager shut down
2017-04-01 12:53:27 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:28 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:30 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:31 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:32 DEBUG PoolingHttpClientConnectionManager:388 - Connection manager is shutting down
2017-04-01 12:53:32 DEBUG PoolingHttpClientConnectionManager:394 - Connection manager shut down
2017-04-01 12:53:32 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:33 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:34 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:53:35 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557

The last lines are repeated many times, and then:

2017-04-01 12:54:32 DEBUG ControllerInstaller:736 - https://192.168.192.77:9000/tasks/434a3171-178c-4e3c-a72a-b40b72394557
2017-04-01 12:54:33 ERROR ControllerInstaller:540 - Failed in provisioning host: vesxi60c01

AlainRoy commented 7 years ago

I'm sorry you're having problems!

What version of Photon Platform, exactly? What version of ESXi are you using, exactly?

mwest44 commented 7 years ago

What I'm seeing is that it is unable to find any image datastore in the list it knows about. It has found datastore1 on the host, but it is not tagged as an image datastore. You have two hypervisors, and the image datastore must either be connected to both of them, or you need one for each of them.

vChrisR commented 7 years ago

Both hosts are nested ESXi servers running version 6.0.0, build 3380124. I'm using Photon 1.1.1.

The first host is used to deploy the appliances to; the second host is supposed to be a cloud host. They both have a "local02" datastore connected. As you can see in the yml, I configured "local02" as the image store, but somehow it doesn't use it.

mwest44 commented 7 years ago

You will need VMware ESXi 6.0.0 Patch 201611001 (ESXi600-201611001), build number 4600944; I'm actually surprised the agent installed on this older build. To be clear, is local02 a shared datastore connected to both hypervisors, or two local datastores, each connected to one hypervisor? If local, you will need to give them different names and reference them separately in the image datastore list of the yaml. We will change this in an upcoming release.
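To illustrate that last point, a sketch of what the imagestore section might look like once the two local datastores have distinct names (the names and the second entry key are made up for illustration; adapt them to your own setup):

```yaml
photon:
  imagestore:
    img-store-1:
      datastore: "local02-host1"        # local datastore on host 1
      enableimagestoreforvms: "true"
    img-store-2:
      datastore: "local02-host2"        # local datastore on host 2
      enableimagestoreforvms: "true"
```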

vChrisR commented 7 years ago

Ah, my bad. I had actually installed that patch, or so I thought. I had to follow KB article 2144595 to make it actually work. Works fine now, thanks!

tactical-drone commented 7 years ago

@vChrisR , @mwest44

Hi, I am getting exactly the same error, but I am running 4600944. My ESXi box has two datastores called "sas" and "datastore2". This is my config:

compute:
  hypervisors:
    ulysses-1:
      hostname: "ulysses-1.ctlab.local"
      ipaddress: "10.0.0.64"
      dns: "10.0.0.40"
      credential:
        username: "root"
        password: "fancypassword"
photon:
  imagestore:
    img-store-1:
      datastore: "datastore2"
      enableimagestoreforvms: "true"
  cloud:
    hostref1: "ulysses-1"
  administrator-group: "ctlab.local\\Administrators"
  controllers:
    pc-1:
      appliance:
        hostref: "ulysses-1"
        datastore: "sas"
        memoryMb: 4096
        cpus: 4
        credential:
          username: "root"
          password: "fancypassword"
        network-config:

Does it have to do with the fact that my image store is "datastore2", but my nodes are installed on "sas"?

Please help, this part of the config is not clear to me at all.

This is my photon-controller-agent log:

INFO     [2017-04-03 11:18:48,395] [254556:254555:Thread-1] [agent.py:start:68] __main__: Startup config: {'datastores': None, 'host_service_threads': 20, 'logging_file_backup_count': 10, 'stats_store_port': 0, 'bootstrap_poll_frequency': 
INFO     [2017-04-03 11:18:48,523] [254556:254555:Thread-1] [plugin.py:load_plugins:112] common.plugin: Plugins found: [<agent.plugin.AgentControlPlugin object at 0x4269cd6c>, <host.plugin.HostPlugin object at 0x429107cc>, <stats.plugin.St
INFO     [2017-04-03 11:18:48,523] [254556:254555:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin AgentControl initialized                                                                                                        
INFO     [2017-04-03 11:18:48,723] [254556:254555:Thread-1] [attache_client.py:__init__:105] host.hypervisor.esx.attache_client: AttacheClient init                                                                                            
INFO     [2017-04-03 11:18:48,727] [254556:254555:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.connect_local                                                                                
INFO     [2017-04-03 11:18:48,738] [254556:254555:Thread-1] [attache_client.py:_start_syncing_cache:135] host.hypervisor.esx.attache_client: Start attache sync vm cache thread                                                                
INFO     [2017-04-03 11:18:48,771] [254556:254555:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.connect_local                                                                                
INFO     [2017-04-03 11:18:48,771] [254556:254555:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores                                                                           
INFO     [2017-04-03 11:18:48,780] [254556:254555:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores                                                                           
CRITICAL [2017-04-03 11:05:56,592] [251852:251851:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager:                                                                                                
Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1e46-a27215a0-3244-0024e83e0daf', name='sas'),                                                                                               
Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1ac9-d62bb235-3760-0024e83e0daf', name='datastore2')])                                                    
INFO     [2017-04-03 11:05:56,592] [251852:251851:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores                                                                           
INFO     [2017-04-03 11:05:56,600] [251852:251851:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores                                                                           
CRITICAL [2017-04-03 11:05:56,600] [251852:251851:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager:                                                                                                
Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1e46-a27215a0-3244-0024e83e0daf', name='sas'),                                                                                               
Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1ac9-d62bb235-3760-0024e83e0daf', name='datastore2')])                                         
INFO     [2017-04-03 11:18:48,788] [254556:254555:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dd1e46-a27215a0-3244-0024e83e0daf                                                      
INFO     [2017-04-03 11:18:48,789] [254556:254555:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dd1ac9-d62bb235-3760-0024e83e0daf                                                      
INFO     [2017-04-03 11:18:48,789] [254556:254555:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Host initialized                                                                                                                
INFO     [2017-04-03 11:18:48,789] [254556:254555:Thread-1] [stats.py:__init__:38] stats.stats: Stats not configured, Stats plugin will be in silent mode                                                                                      
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Stats initialized                                                                                                               
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin AgentControl started                                                                                                            
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Host started                                                                                                                    
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Stats started                                                                                                                   
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services Host (num_threads: 20)                                                                                    
INFO     [2017-04-03 11:18:48,790] [254556:254555:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services AgentControl (num_threads: 1)                                                                             
INFO     [2017-04-03 11:18:48,791] [254556:254555:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services StatsService (num_threads: 2)                                                                             
INFO     [2017-04-03 11:18:48,791] [254556:254555:Thread-1] [agent.py:_initialize_thrift_service:142] __main__: Initialize SSLSocket using certfile=/etc/vmware/ssl/rui.crt, keyfile=/etc/vmware/ssl/rui.key, capath=/etc/vmware/ssl/          
INFO     [2017-04-03 11:18:48,791] [254556:254555:Thread-1] [agent.py:start:80] __main__: Starting the bootstrap config poll thread                                                                                                            
INFO     [2017-04-03 11:18:48,792] [254556:254555:Thread-1] [agent.py:_start_thrift_service:155] __main__: Listening on port 8835...                                                                                                           
ERROR    [2017-04-03 11:18:50,473] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting                                                                                         
ERROR    [2017-04-03 11:18:50,473] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):                                                                            
ERROR    [2017-04-03 11:18:50,474] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmwa
ERROR    [2017-04-03 11:18:50,474] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'                                                   
ERROR    [2017-04-03 11:20:06,703] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting                                                                                         
ERROR    [2017-04-03 11:20:06,704] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):                                                                            
ERROR    [2017-04-03 11:20:06,704] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmwa
ERROR    [2017-04-03 11:20:06,704] [254556:254555:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'                                                   

AlainRoy commented 7 years ago

@pompomJuice--is that your complete configuration? I don't see a Lightwave section. The "network-config" for Photon also appears empty. Can you share your complete YAML configuration?

tactical-drone commented 7 years ago

Hi @AlainRoy. Thanks for helping!

I have updated the agent log that I posted before, I noticed that it chopped some important info on the CRITICAL lines.

This is my complete config:

compute:
  hypervisors:
    ulysses-1:
      hostname: "ulysses-1.ctlab.local"
      ipaddress: "10.0.0.64"
      dns: "10.0.0.40"
      credential:
        username: "root"
        password: "photontorpedo"
lightwave:
  domain: "ctlab.local"
  credential:
    username: "administrator"
    password: "lightspeed"
  controllers:
    lw-1:
      site: "technopark"
      appliance:
        hostref: "ulysses-1"
        datastore: "sas"
        memoryMb: 2048
        cpus: 2
        credential:
          username: "root"
          password: "photontorpedo"
        network-config:
          type: "static"
          hostname: "lw-1.ctlab.local"
          ipaddress: "10.0.0.40"
          network: "NAT=VM Network"
          dns: "10.0.0.10"
          ntp: "197.82.150.123"
          netmask: "255.255.255.0"
          gateway: "10.0.0.1"
photon:
  imagestore:
    img-store-1:
      datastore: "datastore2"
      enableimagestoreforvms: "true"
  cloud:
    hostref1: "ulysses-1"
  administrator-group: "ctlab.local\\Administrators"
  controllers:
    pc-1:
      appliance:
        hostref: "ulysses-1"
        datastore: "sas"
        memoryMb: 4096
        cpus: 4
        credential:
          username: "root"
          password: "photontorpedo"
        network-config:
          type: "static"
          hostname: "pc-1.ctlab.local"
          ipaddress: "10.0.0.45"
          network: "NAT=VM Network"
          netmask: "255.255.255.0"
          dns: "10.0.0.40"
          ntp: "197.82.150.123"
          gateway: "10.0.0.1"
loadBalancer:
  lb-1:
    appliance:
      hostref: "ulysses-1"
      datastore: "sas"
      memoryMb: 2048
      cpus: 4
      credential:
        username: "root"
        password: "photontorpedo"
      network-config:
        type: "static"
        hostname: "lb-1.ctlab.local"
        ipaddress: "10.0.0.42"
        network: "NAT=VM Network"
        netmask: "255.255.255.0"
        dns: "10.0.0.40"
        ntp: "197.82.150.123"
        gateway: "10.0.0.1"

AlainRoy commented 7 years ago

Ah, I see the problem. My apologies--this is something we should have caught up front in the installer because the error message surfaced to you is... obscure, at best. In fact, I personally found this annoying enough that I fixed the Photon Controller installer in the upcoming 1.2 release to catch this.

You need to change the password for Lightwave so that it's at least 8 characters long and has at least one uppercase letter, one lowercase letter, one digit, and one punctuation character. "lightspeed" doesn't meet those requirements.

When the password doesn't meet those requirements, Lightwave doesn't start up correctly.
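For reference, the stated policy can be checked mechanically before running the installer; this is just a sketch of the rule as described above, not Lightwave's actual validation code:

```python
import string

def meets_stated_policy(password):
    """Sketch of the stated Lightwave policy: at least 8 characters,
    with at least one uppercase letter, one lowercase letter, one
    digit, and one punctuation character."""
    return (len(password) >= 8
            and any(c.isupper() for c in password)
            and any(c.islower() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password))

print(meets_stated_policy("lightspeed"))    # False: lowercase letters only
print(meets_stated_policy("Passw0rd123!"))  # True
```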

tactical-drone commented 7 years ago

Hi.

That one cost me a couple of hours to debug. My actual password passes the installer's Java password check; I just put those ones in for effect. My Lightwave works; to get to this step, Lightwave has to work. It happens after the photon-controller-agent is deployed on ESXi, right at the end of the installation.

Any other ideas?

tactical-drone commented 7 years ago

@AlainRoy,

I found that the only other error message the entire installation process generates comes from the photon-controller node's service, which invokes keytool with empty password variables. I think keytool recently became intolerant of those.

Apr 03 19:15:21 photon-127d0f3a4fdbmkLv45RdBP run.sh[1149]: unable to write 'random state'
Apr 03 19:15:21 photon-127d0f3a4fdbmkLv45RdBP run.sh[1149]: Illegal option:  __MACHINE_CERT
Apr 03 19:15:21 photon-127d0f3a4fdbmkLv45RdBP run.sh[1149]: keytool -importkeystore [OPTION]...
Apr 03 19:15:21 photon-127d0f3a4fdbmkLv45RdBP run.sh[1149]: Imports one or all entries from another keystore
Apr 03 19:15:21 photon-127d0f3a4fdbmkLv45RdBP run.sh[1149]: Options:

I fixed those, but then I saw the latest code has them fixed as well.

I am more interested in this error:

On the ESXi photon-controller-agent

INFO     [2017-04-03 21:04:30,549] [364759:364758:Thread-1] [agent.py:start:68] __main__: Startup config: {'datastores': None, 'host_service_threads': 20, 'logging_file_backup_count': 10, 'stats_store_port': 0, 'bootstrap_poll_frequency': 
INFO     [2017-04-03 21:04:30,674] [364759:364758:Thread-1] [plugin.py:load_plugins:112] common.plugin: Plugins found: [<agent.plugin.AgentControlPlugin object at 0xfff91d6c>, <host.plugin.HostPlugin object at 0x517d07cc>, <stats.plugin.St
INFO     [2017-04-03 21:04:30,675] [364759:364758:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin AgentControl initialized                                                                                                        
INFO     [2017-04-03 21:04:30,883] [364759:364758:Thread-1] [attache_client.py:__init__:105] host.hypervisor.esx.attache_client: AttacheClient init                                                                                            
INFO     [2017-04-03 21:04:30,887] [364759:364758:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.connect_local                                                                                
INFO     [2017-04-03 21:04:30,898] [364759:364758:Thread-1] [attache_client.py:_start_syncing_cache:135] host.hypervisor.esx.attache_client: Start attache sync vm cache thread                                                                
INFO     [2017-04-03 21:04:30,940] [364759:364758:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.connect_local                                                                                
INFO     [2017-04-03 21:04:30,940] [364759:364758:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores                                                                           
INFO     [2017-04-03 21:04:30,949] [364759:364758:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores                                                                           
CRITICAL [2017-04-03 21:04:30,949] [364759:364758:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='5
INFO     [2017-04-03 21:04:30,949] [364759:364758:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores                                                                           
INFO     [2017-04-03 21:04:30,956] [364759:364758:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores                                                                           
CRITICAL [2017-04-03 21:04:30,957] [364759:364758:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='5
INFO     [2017-04-03 21:04:30,957] [364759:364758:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dcb3cc-ca7fab80-086c-b8ac6f7efbf6                                                      
INFO     [2017-04-03 21:04:30,958] [364759:364758:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dc020a-8263054c-4590-b8ac6f7efbf6                                                      
INFO     [2017-04-03 21:04:30,958] [364759:364758:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Host initialized                                                                                                                
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [stats.py:__init__:38] stats.stats: Stats not configured, Stats plugin will be in silent mode                                                                                      
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Stats initialized                                                                                                               
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin AgentControl started                                                                                                            
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Host started                                                                                                                    
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Stats started                                                                                                                   
INFO     [2017-04-03 21:04:30,959] [364759:364758:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services Host (num_threads: 20)                                                                                    
INFO     [2017-04-03 21:04:30,960] [364759:364758:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services AgentControl (num_threads: 1)                                                                             
INFO     [2017-04-03 21:04:30,960] [364759:364758:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services StatsService (num_threads: 2)                                                                             
INFO     [2017-04-03 21:04:30,960] [364759:364758:Thread-1] [agent.py:_initialize_thrift_service:142] __main__: Initialize SSLSocket using certfile=/etc/vmware/ssl/rui.crt, keyfile=/etc/vmware/ssl/rui.key, capath=/etc/vmware/ssl/          
INFO     [2017-04-03 21:04:30,960] [364759:364758:Thread-1] [agent.py:start:80] __main__: Starting the bootstrap config poll thread                                                                                                            
INFO     [2017-04-03 21:04:30,961] [364759:364758:Thread-1] [agent.py:_start_thrift_service:155] __main__: Listening on port 8835...                                                                                                           
ERROR    [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting                                                                                         
ERROR    [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):                                                                            
ERROR    [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmwa
ERROR    [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'                                                   
ERROR    [2017-04-03 21:05:43,661] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting                                                                                         
ERROR    [2017-04-03 21:05:43,663] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):                                                                            
ERROR    [2017-04-03 21:05:43,664] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmwa
ERROR    [2017-04-03 21:05:43,664] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'                                                   

On the photon-controller:

Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doWrap(TNonBlockingSSLSocket.java:420)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doHandShake(TNonBlockingSSLSocket.java:329)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.transport.TNonBlockingSSLSocket.startConnection(TNonBlockingSSLSocket.java:298)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.async.TAsyncSSLMethodCall.start(TAsyncSSLMethodCall.java:144)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.async.TAsyncSSLClientManager$SelectThread.startPendingMethods(TAsyncSSLClientManager.java:177)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.async.TAsyncSSLClientManager$SelectThread.run(TAsyncSSLClientManager.java:116)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at java.security.AccessController.doPrivileged(Native Method)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doTask(TNonBlockingSSLSocket.java:366)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doHandShake(TNonBlockingSSLSocket.java:335)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: ... 4 more
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification pat
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.validator.Validator.validate(Validator.java:260)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:281)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: ... 12 more
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: ... 18 more

Notice the correlation:

ERROR [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting Apr 03 21:04:37 photon-127d0f3a4fdbmkLv45RdBP run.sh[1187]: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
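The two logs use different timestamp formats, but they can be compared directly. A small sketch (both formats copied from the excerpts above; journald omits the year, so 2017 is filled in as an assumption) confirming the agent-side accept error and the controller-side handshake exception are sub-second apart, i.e. the same failed TLS handshake seen from both sides:

```python
from datetime import datetime

# Agent log format: "2017-04-03 21:04:37,866"
agent_ts = datetime.strptime("2017-04-03 21:04:37,866", "%Y-%m-%d %H:%M:%S,%f")

# Controller (journald) format: "Apr 03 21:04:37" -- no year, so assume 2017
ctrl_ts = datetime.strptime("Apr 03 21:04:37", "%b %d %H:%M:%S").replace(year=2017)

delta = abs((agent_ts - ctrl_ts).total_seconds())
print(delta)  # 0.866 -- sub-second gap between the two log entries
```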

I am very interested in @vChrisR's final configuration.

But all of this is immaterial if you can answer me one question: how do you make installer-ova-nv-1.1.1-cfb7512.ova?

If I can build that image I can report issues more usefully. So far I have built photon-controller.ova and photon, but how do you produce the installer from those? I cannot find any scripts anywhere. Do you know?

I am assuming it comes from cloud-images, so I am trying to build those, but I am stuck at the vixDiskLib compile step. It is not being maintained and was built against an older version of numpy. The error I get is:

vixDiskLib/vixDiskBase.c:1411:71: error: unknown type name ‘VIXDISKLIB_OPEN_FLAGS’

Any help would be awesome, or just the latest drop with all those fixes.

AlainRoy commented 7 years ago

Hi @pompomJuice,

You've asked several questions, and I'm getting a little confused about the underlying problem. I'm not sure which error messages in the logs are linked to the actual symptoms you're experiencing. What symptoms are you seeing?

In your original log fragment I see:

CRITICAL [2017-04-03 11:05:56,592] [251852:251851:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager:                                                                                                
Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1e46-a27215a0-3244-0024e83e0daf', name='sas'),                                                                                               
Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dd1ac9-d62bb235-3760-0024e83e0daf', name='datastore2')])                                                    

Each ESXi host is supposed to be attached to one of the image datastores. In your case, you've selected datastore2 as your image datastore. Do you have exactly one datastore named datastore2, and it's mounted on all of your ESXi hosts?
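For what it's worth, the CRITICAL line appears to be the agent comparing its configured `image_datastores` (logged as `None` in the startup config, hence `[]`) against the datastore names it discovers on the host. A minimal sketch of that membership check, with names taken from the quoted log fragment; the function is illustrative, not the agent's actual code:

```python
# Illustrative sketch of the check datastore_manager.py seems to log.
# Names mirror the log output above; this is not the agent's real code.

def missing_image_datastores(image_datastores, discovered_names):
    """Return configured image datastores that are not present on this host."""
    configured = image_datastores or []  # startup config shows 'image_datastores': None
    return [ds for ds in configured if ds not in discovered_names]

discovered = {"sas", "datastore2"}  # datastore names from the CRITICAL log line

print(missing_image_datastores(None, discovered))             # []
print(missing_image_datastores(["datastore2"], discovered))   # []
print(missing_image_datastores(["datastore3"], discovered))   # ['datastore3']
```

With `image_datastores` unset nothing can be reported missing, which suggests the CRITICAL message here reflects an empty configured list rather than a genuinely absent datastore.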

tactical-drone commented 7 years ago

Thanks @AlainRoy.

Basically, what I am saying is that I think the datastore CRITICAL error is just a decoy. It says CRITICAL, but then it says:

INFO     [2017-04-03 11:18:48,788] [254556:254555:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dd1e46-a27215a0-3244-0024e83e0daf                                                      
INFO     [2017-04-03 11:18:48,789] [254556:254555:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dd1ac9-d62bb235-3760-0024e83e0daf                                                      

So it looks like it adds them because they were missing, which they will be the first time it starts. I don't know. As you can see in my config, I only have one ESXi host, and that host has a datastore named "datastore1". So my setup is correct as far as I understand.

But further below it bugs out on another error that is unrelated to datastore errors:

ERROR [2017-04-03 21:04:37,866] [364759:364758:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting

This is because I think there are some shenanigans going on surrounding the Thrift implementation. The documented thrift-0.9.3 requirement, for example, is false: it needs the patch https://github.com/apache/thrift/pull/1114/commits/70bdc70fe7c8f9a68f712a29d3ff304e43184c2c to work. How do you guys manage to build this thing? I have spent 5 days now and I can only get parts of it to build. It does look, though, like these patches might be involved in the problem. I have managed to build photon-controller.ova with the patched Thrift, but I don't know how to make the giant installer or tell the system that photon-controller.ova needs to be the 600MB version. The default is the 300MB one without Java or any of the other stuff it needs to be useful.

Regardless, this morning I realized I can switch strategy. Instead of trying to build the sources, I will use that prepared Docker instance instead. I have manually bumped its Photon to 1.1 before and it worked. So I am going to use the Docker instance to provision Kubernetes and then just upload that Docker instance into Kubernetes. According to my calculations, everything should work then.

vChrisR commented 7 years ago

Yeah, the datastore errors were not related to the later errors. Those were related to the ESXi version I was running. I'm a bit disappointed that I have to run a very specific version of ESXi; I'd rather run 6.5. Now that I'm thinking about it: since you can't use vCenter and Update Manager when running Photon Controller, how do you upgrade your hosts when needed? Photon environments potentially consist of a large number of hosts. It would be a pain to update them manually or to produce your own scripts to do so.

tactical-drone commented 7 years ago

@vChrisR But your setup works? You can get to the UI and the UI shows that your ESXi host is READY?

Mine shows the ESXi host as NOT READY or NOT PROVISIONED when I remove it and re-add it.

I am definitely running the correct versions:

[screenshot showing the installed component versions]

vChrisR commented 7 years ago

@pompomJuice Yes, it's working, but only on my second ESXi node. Somehow I can't get the node where the Photon appliances are running into the READY state.

tactical-drone commented 7 years ago

@vChrisR Same issue, it does not work.

I think there is a better way to provision the cluster. It works more like a tectonic bootstrap.

I am piloting that, will let you know how it goes.

AlainRoy commented 7 years ago

It's not clear to me what the problem is. @mwest44: Do you have any ideas? @dthaluru, @snambakam: Does this look familiar to you?

About ESXi: there were changes in 6.5 that required us to make large changes to our agent that runs on ESXi. We'll support 6.5 in Photon Platform 1.2, but we'll drop support for 6.0. Given that we don't have significant backwards-compatibility concerns (yet), we prefer to focus on features with a reduced testing matrix.

About Thrift: you shouldn't need to build it. Our build downloads our version of Thrift for you. I'm pretty sure that Thrift is not the problem here--it's more likely an issue with the SSL certificates.

One more thought: For the ESXi host in question, what DNS servers are configured for the host? ESXi supports a limited number and we require that one of them is Lightwave. If you have too many, we might have failed to add Lightwave as a DNS server and it could cause problems. This isn't an issue for most people, but it can happen.
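A quick sanity check along those lines could be run against the host's resolver list (e.g. the output of `esxcli network ip dns server list`). This is only a sketch: the limit of 3 servers and the IP addresses are assumptions for illustration, to be substituted with your environment's values:

```python
# Sketch: sanity-check an ESXi host's DNS server list against the two
# conditions mentioned above. The max of 3 and the IPs are assumptions.

def check_dns(servers, lightwave_ip, max_servers=3):
    """Return a list of human-readable problems with the DNS configuration."""
    problems = []
    if len(servers) > max_servers:
        problems.append("too many DNS servers (%d > %d)" % (len(servers), max_servers))
    if lightwave_ip not in servers:
        problems.append("Lightwave (%s) is not in the DNS list" % lightwave_ip)
    return problems

# Company DNS plus Lightwave, under the limit: no problems reported
print(check_dns(["10.10.1.1", "10.0.0.44"], lightwave_ip="10.0.0.44"))  # []

# Four servers, none of them Lightwave: both problems reported
print(check_dns(["10.10.1.1", "10.10.1.2", "10.10.1.3", "8.8.8.8"],
                lightwave_ip="10.0.0.44"))
```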

tactical-drone commented 7 years ago

@AlainRoy I am investigating your DNS claim.

I used to look at the vSphere client to see if there were 3 DNS servers (but I have now found out it only shows a max of 2), and I never added more than 1. But when I checked manually with the CLI there were 3: the company DNS, the Lightwave DNS, and another random IP.

Let me clear those out and see what happens.

Ugh, ESXi 6.5 does not run on our servers... or so my IT guy claims. And they are 2010 Dell R710 servers. This is not cool. I will have to investigate.

tactical-drone commented 7 years ago

@AlainRoy No, sorry, that had no effect. That extra DNS entry came in after I abandoned the OVA installer. I just reset the ESXi box, removed all DNS entries, and tried again.

photon-setup logs:

Using configuration at ulysses.yaml
INFO: Parsing Lightwave Configuration
INFO: Parsing Credentials
INFO: Lightwave Credentials parsed successfully
INFO: Parsing Lightwave Controller Config
INFO: Parsing appliance
INFO: Parsing Credentials
INFO: Appliance Credentials parsed successfully
INFO: Parsing Network Config
INFO: Appliance network config parsed successfully
INFO: Appliance config parsed successfully
INFO: Lightwave Controller parsed successfully
INFO: Lightwave Controller config parsed successfully
INFO: Lightwave Section parsed successfully
INFO: Parsing Photon Controller Configuration
INFO: Parsing Photon Controller Image Store
INFO: Image Store parsed successfully
INFO: Managed hosts parsed successfully
INFO: Parsing Photon Controller Config
INFO: Parsing appliance
INFO: Parsing Credentials
INFO: Appliance Credentials parsed successfully
INFO: Parsing Network Config
INFO: Appliance network config parsed successfully
INFO: Photon Controllers parsed successfully
INFO: Photon section parsed successfully
INFO: Parsing Compute Configuration
INFO: Parsing Compute Config
INFO: Parsing Credentials
INFO: Compute Config parsed successfully
INFO: Parsing LoadBalancer Configuration
INFO: Parsing LoadBalancer Config
INFO: Parsing appliance
INFO: Parsing Credentials
INFO: Appliance Credentials parsed successfully
INFO: Parsing Network Config
INFO: Appliance network config parsed successfully
INFO: LoadBalancer Config parsed successfully
Validating configuration
Validating compute configuration
Validating identity configuration
2017-04-04 15:06:00 INFO  Installing Lightwave
2017-04-04 15:06:00 INFO  Install Lightwave Controller at lw-1
2017-04-04 15:06:03 INFO  Info: Lightwave was not existing at the specified at the given IP address.Deploying new Lightwave OVA
2017-04-04 15:06:03 INFO  Start [Task: Lightwave Installation]
2017-04-04 15:06:03 INFO  Info [Task: Lightwave Installation] : Deploying and Powering on the Lightwave VM on ESXi host: 10.0.0.78
2017-04-04 15:06:03 INFO  Info: Deploying and Powering on the Lightwave VM on ESXi host: 10.0.0.78
2017-04-04 15:06:03 INFO  Info [Task: Lightwave Installation] : Starting appliance deployment
2017-04-04 15:06:11 INFO  Progress [Task: Lightwave Installation]: 20%
2017-04-04 15:06:13 INFO  Progress [Task: Lightwave Installation]: 40%
2017-04-04 15:06:15 INFO  Progress [Task: Lightwave Installation]: 60%
2017-04-04 15:06:17 INFO  Progress [Task: Lightwave Installation]: 80%
2017-04-04 15:07:06 INFO  Progress [Task: Lightwave Installation]: 0%
2017-04-04 15:07:06 INFO  Stop [Task: Lightwave Installation]
2017-04-04 15:08:52 INFO  Info: Lightwave already exists. Skipping deployment of lightwave.
2017-04-04 15:08:53 INFO  COMPLETE: Install Lightwave Controller
2017-04-04 15:08:53 INFO  Installing Photon Controller Cluster
2017-04-04 15:08:53 INFO  Info: Installing the Photon Controller Cluster
2017-04-04 15:08:53 INFO  Info: Photon Controller peer node at IP address [10.0.0.45]
2017-04-04 15:08:53 INFO  Info: The number of Photon Controller specified in the config file are - 1
2017-04-04 15:08:53 INFO  Start [Task: Photon Controller Installation]
2017-04-04 15:08:53 INFO  Info [Task: Photon Controller Installation] : Deploying and Powering on the Photon Controller VM on ESXi host: 10.0.0.78
2017-04-04 15:08:53 INFO  Info: Deploying and Powering on the Photon Controller VM on ESXi host: 10.0.0.78
2017-04-04 15:08:53 INFO  Info [Task: Photon Controller Installation] : Starting appliance deployment
2017-04-04 15:09:02 INFO  Progress [Task: Photon Controller Installation]: 20%
2017-04-04 15:09:04 INFO  Progress [Task: Photon Controller Installation]: 40%
2017-04-04 15:09:07 INFO  Progress [Task: Photon Controller Installation]: 60%
2017-04-04 15:09:10 INFO  Progress [Task: Photon Controller Installation]: 80%
2017-04-04 15:09:58 INFO  Progress [Task: Photon Controller Installation]: 0%
2017-04-04 15:09:58 INFO  Stop [Task: Photon Controller Installation]
2017-04-04 15:09:58 INFO  Info: Getting OIDCTokens from Lightwave to make APi Calls
2017-04-04 15:10:33 INFO  Info: Using Image Store - datastore1
2017-04-04 15:10:34 INFO  Info: Setting new security group.
2017-04-04 15:10:36 INFO  COMPLETE: Install Photon Controller Cluster
2017-04-04 15:10:36 INFO  Installing Load Balancer
2017-04-04 15:10:36 INFO  Start [Task: Load Balancer Installation]
2017-04-04 15:10:36 INFO  Info [Task: Load Balancer Installation] : Deploying and Powering on the HaProxy VM on ESXi host: 10.0.0.78
2017-04-04 15:10:36 INFO  Info: Deploying and Powering on the HaProxy VM on ESXi host: 10.0.0.78
2017-04-04 15:10:36 INFO  Info [Task: Load Balancer Installation] : Starting appliance deployment
2017-04-04 15:10:44 INFO  Progress [Task: Load Balancer Installation]: 20%
2017-04-04 15:10:45 INFO  Progress [Task: Load Balancer Installation]: 40%
2017-04-04 15:10:45 INFO  Progress [Task: Load Balancer Installation]: 60%
2017-04-04 15:11:33 INFO  Progress [Task: Load Balancer Installation]: 0%
2017-04-04 15:11:33 INFO  Stop [Task: Load Balancer Installation]
2017-04-04 15:11:33 INFO  COMPLETE: Install Load Balancer
2017-04-04 15:11:33 INFO  Preparing Managed Host ulysses-1 to be managed by Photon Controller 
2017-04-04 15:11:33 INFO  Registering Managed Host ulysses-1 with Photon Controller
2017-04-04 15:11:40 INFO  COMPLETE: Registration of Managed Host
2017-04-04 15:11:40 INFO  Installing Photon Agent on Managed Host ulysses-1
2017-04-04 15:11:40 INFO  Start [Task: Hypervisor preparation]
2017-04-04 15:11:40 INFO  Info: Found Lightwave VIB at /var/opt/vmware/photon/agent/vibs/VMware-lightwave-esx-1.0.0-5075989.vib
2017-04-04 15:11:40 INFO  Info: Found Photon Agent VIB at /var/opt/vmware/photon/agent/vibs/photon-controller-agent-v1.1.1-319facd.vib
2017-04-04 15:11:40 INFO  Info: Found Envoy VIB at /var/opt/vmware/photon/agent/vibs/vmware-envoy-latest.vib
2017-04-04 15:11:40 INFO  Info [Task: Hypervisor preparation] : Establishing SCP session to host 10.0.0.78
2017-04-04 15:11:41 INFO  Info [Task: Hypervisor preparation] : Skipping Syslog configuration on host 10.0.0.78
2017-04-04 15:11:41 INFO  Info [Task: Hypervisor preparation] : Copying VIBs to host 10.0.0.78
2017-04-04 15:11:41 INFO  Info: Copying file /var/opt/vmware/photon/agent/vibs/photon-controller-agent-v1.1.1-319facd.vib to remote location /tmp/photon-controller-agent-v1.1.1-319facd.vib
2017-04-04 15:11:41 INFO  Info: Copying file /var/opt/vmware/photon/agent/vibs/vmware-envoy-latest.vib to remote location /tmp/vmware-envoy-latest.vib
2017-04-04 15:11:41 INFO  Info: Copying file /var/opt/vmware/photon/agent/vibs/VMware-lightwave-esx-1.0.0-5075989.vib to remote location /tmp/VMware-lightwave-esx-1.0.0-5075989.vib
2017-04-04 15:11:41 INFO  Info [Task: Hypervisor preparation] : Installing Photon Agent on host 10.0.0.78
2017-04-04 15:11:41 INFO  Info: Leaving the domain in case the ESX host was already added
2017-04-04 15:11:42 INFO  Info: Unconfiguring Lightwave on the ESX host
2017-04-04 15:11:48 INFO  Info: Uninstalling old Photon VIBS from remote system
2017-04-04 15:13:31 INFO  Info: Installing Photon VIBS on remote system
2017-04-04 15:15:30 INFO  Info [Task: Hypervisor preparation] : Joining host 10.0.0.78 to Lightwave domain
2017-04-04 15:15:30 INFO  Info: Attempting to join the ESX host to Lightwave
2017-04-04 15:15:47 INFO  Info [Task: Hypervisor preparation] : Removing VIBs from host 10.0.0.78
2017-04-04 15:15:47 INFO  Info: Removing Photon VIBS from remote system
2017-04-04 15:15:48 INFO  Stop [Task: Hypervisor preparation]
2017-04-04 15:15:48 INFO  COMPLETE: Install Photon Agent
2017-04-04 15:15:48 INFO  Provisioning the host to change its state to READY
2017-04-04 15:16:56 ERROR Failed in provisioning host: ulysses-1.ctlab.local
java.lang.RuntimeException: Task failed to provide correct state ERROR

ESXi photon-controller-agent logs:

INFO     [2017-04-04 15:14:01,578] [54334:54333:Thread-1] [agent.py:start:68] __main__: Startup config: {'datastores': None, 'host_service_threads': 20, 'logging_file_backup_count': 10, 'stats_store_port': 0, 'bootstrap_poll_frequency': 5, 'port': 8835, 'log_level': u'debug', 'config_path': '/etc/vmware/photon/controller', 'workers': 32, 'hostname': None, 'utilization_transfer_ratio': 9, 'no_syslog': False, 'thrift_timeout_sec': 3, 'memory_overcommit': 1.0, 'logging_file_size': 10485760, 'image_datastores': None, 'heartbeat_interval_sec': 10, 'auth_enabled': False, 'cpu_overcommit': 16.0, 'stats_store_endpoint': None, 'stats_enabled': False, 'control_service_threads': 1, 'heartbeat_timeout_factor': 6, 'wait_timeout': 10, 'host_id': None, 'logging_file': '/scratch/log/photon-controller-agent.log', 'console_log': False, 'deployment_id': None, 'management_only': False, 'stats_host_tags': None}
INFO     [2017-04-04 15:14:01,729] [54334:54333:Thread-1] [plugin.py:load_plugins:112] common.plugin: Plugins found: [<agent.plugin.AgentControlPlugin object at 0xffdcbd6c>, <host.plugin.HostPlugin object at 0x6ec107cc>, <stats.plugin.StatsPlugin object at 0x6ec3740c>] 
INFO     [2017-04-04 15:14:01,730] [54334:54333:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin AgentControl initialized
INFO     [2017-04-04 15:14:01,931] [54334:54333:Thread-1] [attache_client.py:__init__:105] host.hypervisor.esx.attache_client: AttacheClient init
INFO     [2017-04-04 15:14:01,935] [54334:54333:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.connect_local
INFO     [2017-04-04 15:14:01,946] [54334:54333:Thread-1] [attache_client.py:_start_syncing_cache:135] host.hypervisor.esx.attache_client: Start attache sync vm cache thread
INFO     [2017-04-04 15:14:01,983] [54334:54333:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.connect_local
INFO     [2017-04-04 15:14:01,984] [54334:54333:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores
INFO     [2017-04-04 15:14:01,992] [54334:54333:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores
CRITICAL [2017-04-04 15:14:01,992] [54334:54333:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58d
cb3cc-ca7fab80-086c-b8ac6f7efbf6', name='datastore1'), Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dc020a-8263054c-4590-b8ac6f7efbf6', name='sas')])
INFO     [2017-04-04 15:14:01,992] [54334:54333:Thread-1] [attache_client.py:nested:79] host.hypervisor.esx.attache_client: Enter AttacheClient.get_all_datastores
INFO     [2017-04-04 15:14:01,999] [54334:54333:Thread-1] [attache_client.py:nested:98] host.hypervisor.esx.attache_client: Leave AttacheClient.get_all_datastores
CRITICAL [2017-04-04 15:14:02,000] [54334:54333:Thread-1] [datastore_manager.py:_initialize_datastores:93] host.hypervisor.datastore_manager: Image datastore(s) [] not found in set([Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58d
cb3cc-ca7fab80-086c-b8ac6f7efbf6', name='datastore1'), Datastore(tags=frozenset(['LOCAL_VMFS']), type=0, id='58dc020a-8263054c-4590-b8ac6f7efbf6', name='sas')])
INFO     [2017-04-04 15:14:02,000] [54334:54333:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dcb3cc-ca7fab80-086c-b8ac6f7efbf6
INFO     [2017-04-04 15:14:02,000] [54334:54333:Thread-1] [image_monitor.py:__init__:36] host.image.image_monitor: IMAGE SCANNER: adding datastore: 58dc020a-8263054c-4590-b8ac6f7efbf6
INFO     [2017-04-04 15:14:02,001] [54334:54333:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Host initialized
INFO     [2017-04-04 15:14:02,001] [54334:54333:Thread-1] [stats.py:__init__:38] stats.stats: Stats not configured, Stats plugin will be in silent mode
INFO     [2017-04-04 15:14:02,001] [54334:54333:Thread-1] [plugin.py:load_plugins:118] common.plugin: Plugin Stats initialized
INFO     [2017-04-04 15:14:02,002] [54334:54333:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin AgentControl started
INFO     [2017-04-04 15:14:02,002] [54334:54333:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Host started
INFO     [2017-04-04 15:14:02,002] [54334:54333:Thread-1] [plugin.py:load_plugins:128] common.plugin: Plugin Stats started
INFO     [2017-04-04 15:14:02,002] [54334:54333:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services Host (num_threads: 20)
INFO     [2017-04-04 15:14:02,002] [54334:54333:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services AgentControl (num_threads: 1)
INFO     [2017-04-04 15:14:02,003] [54334:54333:Thread-1] [agent.py:_initialize_thrift_service:136] __main__: Load thrift services StatsService (num_threads: 2)
INFO     [2017-04-04 15:14:02,003] [54334:54333:Thread-1] [agent.py:_initialize_thrift_service:142] __main__: Initialize SSLSocket using certfile=/etc/vmware/ssl/rui.crt, keyfile=/etc/vmware/ssl/rui.key, capath=/etc/vmware/ssl/
INFO     [2017-04-04 15:14:02,003] [54334:54333:Thread-1] [agent.py:start:80] __main__: Starting the bootstrap config poll thread
INFO     [2017-04-04 15:14:02,003] [54334:54333:Thread-1] [agent.py:_start_thrift_service:155] __main__: Listening on port 8835...
ERROR    [2017-04-04 15:14:13,727] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting
ERROR    [2017-04-04 15:14:13,728] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):
ERROR    [2017-04-04 15:14:13,728] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmware/photon/controller/2.7/site-packages/thrift/server/TNonblockingServer.py", line 314, in handle
ERROR    [2017-04-04 15:14:13,728] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: AttributeError: 'NoneType' object has no attribute 'handle'
ERROR    [2017-04-04 15:15:51,725] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: error while accepting
ERROR    [2017-04-04 15:15:51,726] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer: Traceback (most recent call last):
ERROR    [2017-04-04 15:15:51,726] [54334:54333:Thread-1] [TNonblockingServer.py:handle:318] thrift.server.TNonblockingServer:   File "/var/lib/jenkins/workspace/photon-controller-python/agent/create_vib.yFQcg/vib/payloads/agent/opt/vmware/photon/controller/2.7/site-packages/thrift/server/TNonblockingServer.py", line 314, in handle
.
.
.
(the agent spams this forever)
AlainRoy commented 7 years ago

I'm sorry, I'm out of ideas. Hopefully one of the people I cc:ed will be able to see something I missed.

I wasn't aware that ESXi 6.5 dropped hardware support for anything. Is this going to be a widespread problem for people? Where can I learn more about the dropped support?

tactical-drone commented 7 years ago

@AlainRoy Out of ideas, but not out of options.

I am currently building a photon iso. (It takes days)

I can currently build a skeleton photon-controller.ova. I am not sure how to bake in Java.

I know how to deploy this photon-controller.ova, so I can therefore debug this error.

All I need is more technical people to help with some niggles. The documentation is not clear on how to produce the artifacts that you guys do.

I had been running my Linux inside VirtualBox before, so Vagrant was not working for me. Then I realized that I need VirtualBox to build photon-controller anyway, so I got a new setup up and running that can do VirtualBox. I will give Vagrant another go and build the images that way. All I need is clear instructions (or a build script) that combine artifacts from the photon & photon-controller projects into the artifacts we see on the release page.

AlainRoy commented 7 years ago

The Photon Controller OVA is built by a script in the appliances directory. It takes me about 10-20 minutes to build it--not sure why it takes you days. What's going on?

tactical-drone commented 7 years ago

@AlainRoy I think it is because it is downloading stuff and building every RPM from source, one at a time, on one CPU.

AlainRoy commented 7 years ago

It should build just one or two RPMs, not the entire OS. Perhaps the downloads are slower for you than for us.

tactical-drone commented 7 years ago

Most likely; I am on a 20 Mbit connection only.

The build took 14 hours on a 24-CPU 3 GHz Xeon with 32 GB RAM.

tactical-drone commented 7 years ago

@AlainRoy I would love it if you would investigate this Thrift issue. The VMware Photon source does not build against stock Thrift 0.9.3. I implore you to go look at the code again and get those guys to commit the parts we need to build this thing. Please.

I am starting to burn a huge number of hours trying to fix something that should have worked. I have managed to get it to build myself, but it is a major hack job: I hacked parts of that patch into 0.9.3, and my Thrift now fails all kinds of unit tests after the async hack. I think a professional should just fix the code. There is no excuse for this. For other contributors to be able to participate here and help, we cannot have them each drop a week on these basic issues.

Considering the nature of this Thrift async implementation (it looks like Thrift has been hacked) and the reported failure log (which squarely pins the error on the Thrift socket layer), I will be very surprised if the bug is not contained within this async patch.

AlainRoy commented 7 years ago

Hi @pompomJuice: I'm heading out on vacation for a few days, but I'm looking for someone else to give a hand since I've run out of ideas. I think the Thrift code is fine--this is usually indicative of other errors. Hopefully we'll find someone to investigate those for you.

We've been planning to upstream our work to Thrift. I'm not sure why it's stalled. I'll look into that. We extended Thrift so that it supports asynchronous library calls when using SSL.

schadr commented 7 years ago

@pompomJuice regarding building the code, you can find the mac version of the thrift binary we use to generate our java code here https://s3.amazonaws.com/photon-platform/artifacts/thrift/non-blocking-ssl/mac/thrift, if you need them for a different OS let me know.

I'll follow up with more questions once I read through the issue.

tactical-drone commented 7 years ago

We've been planning to upstream our work to Thrift. I'm not sure why it's stalled. I'll look into that. We extended Thrift so that it supports asynchronous library calls when using SSL.

Yes! That is what I am saying--asking, even. Can I have those extensions, please? v1.1.1 is built against them.

@schadr I need it for Ubuntu 16.04. My current Thrift hack patch (my source patch apache/thrift@70bdc70 was on a 0.10.0 branch, not 0.9.3) builds and installs now after I disabled Thrift Ruby. I hope the system does not use Thrift Ruby.

I will have a photon-controller.ova soon. All I need to know is how to build the photon-controller.ova that is found inside the installer OVA, or how to build the installer OVA at all.

By tomorrow I will have answered most of these questions anyway, so don't stress.

schadr commented 7 years ago

My first suspicion would be that the joining of the ESXi host to lightwave failed.

Looking at your yml file, one thing that differs from the files we use for our internal testing is that we don't add the domain to the hostnames.

from your yml file above:

compute:
  hypervisors:
    ulysses-1:
      hostname: "ulysses-1.ctlab.local"

To get some more detail, can you set the installer log level to debug? The log4j settings are here: /opt/vmware/photon/controller/share/log4j.properties

That should then log the output of the vib install on the host and tell us if it is a lightwave joining issue.
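For reference, bumping the level to debug in a log4j 1.x properties file usually amounts to a single line like the one below. The appender name is a placeholder; keep whatever appender the shipped file already declares.

```properties
# Switch the root logger from INFO to DEBUG.
# "file" is an assumed appender name -- reuse the one already configured.
log4j.rootLogger=DEBUG, file
```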

schadr commented 7 years ago

re: hostnames--for the VMs it needs to be the fully qualified name, but for the ESXi hosts it should not.

schadr commented 7 years ago

here are the thrift binaries for lin64 https://s3.amazonaws.com/photon-platform/artifacts/thrift/non-blocking-ssl/lin64/thrift

tactical-drone commented 7 years ago

@schadr It is definitely not a Lightwave issue. The logs show that the communication between the ESXi photon-controller-agent and the photon-controller is failing. If you check the log, you can see that there is no way the load balancer or controller would have been deployed if Lightwave had failed. I debugged many a night only to find that Lightwave spat out my password, so I know it works now.

Using FQDNs had no effect. Same error.

schadr commented 7 years ago

Yes, the reason I mention Lightwave is that Lightwave issues the certificates.

The Photon-Controller node is able to reach the ESXi node, hence the thrift ssl errors.

But in the past, when we saw these SSL errors, the root cause was that we weren't able to join the ESXi host to Lightwave, subsequently weren't able to retrieve certs from Lightwave, and fell back to using the default certs available on the ESXi host. Since we rely on mutual authentication between the Photon-Controller node and the Photon-Controller agent, this fails if the ESXi host was not able to get its certs from Lightwave.

schadr commented 7 years ago

Did you try running the installer with debug level logs?

schadr commented 7 years ago

You can check ls -l /etc/vmware/ssl to see whether a new cert was generated when you ran your deployment.

tactical-drone commented 7 years ago

@schadr There is a bug in the current deployer where the photon-controller template does not implement keyStorePassword. It makes one keygen fail. I have hacked the photon-controller boot process to such a degree that I could fix that problem, and maybe that was the problem. But alas, same error.

Higher log levels only show success everywhere; just that last task fails. Just look at the logs I posted above (I ran TRACE logs). It clearly shows the problem: TLS breaks between the ESXi photon-controller-agent and the photon-controller node. That's the issue. It just bugs out with:

Apr 05 17:40:31 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: Illegal option:  __MACHINE_CERT
Apr 05 17:40:31 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: keytool -importkeystore [OPTION]...
Apr 05 17:40:31 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: Imports one or all entries from another keystore

and

Apr 05 17:40:56 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:40:56 +0000] "POST /deployments/default/hosts HTTP/1.1" 201 604 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 53
Apr 05 17:40:57 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:40:57 +0000] "GET /tasks/738d735c-d524-424b-88c5-e8f7e1d44f4a HTTP/1.1" 200 707 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 18
Apr 05 17:40:58 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:40:58 +0000] "GET /tasks/738d735c-d524-424b-88c5-e8f7e1d44f4a HTTP/1.1" 200 707 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 6
Apr 05 17:41:00 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:41:00 +0000] "GET /tasks/738d735c-d524-424b-88c5-e8f7e1d44f4a HTTP/1.1" 200 707 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 7
Apr 05 17:41:01 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:41:01 +0000] "GET /tasks/738d735c-d524-424b-88c5-e8f7e1d44f4a HTTP/1.1" 200 707 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 7
Apr 05 17:41:02 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: 10.0.0.88 - - [05/Apr/2017:17:41:02 +0000] "GET /tasks/738d735c-d524-424b-88c5-e8f7e1d44f4a HTTP/1.1" 200 744 "-" "Jersey/2.12 (HttpUrlConnection 1.8.0_92-BLFS)" 7
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doWrap(TNonBlockingSSLSocket.java:420)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doHandShake(TNonBlockingSSLSocket.java:329)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.transport.TNonBlockingSSLSocket.startConnection(TNonBlockingSSLSocket.java:298)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.async.TAsyncSSLMethodCall.start(TAsyncSSLMethodCall.java:144)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.async.TAsyncSSLClientManager$SelectThread.startPendingMethods(TAsyncSSLClientManager.java:177)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.async.TAsyncSSLClientManager$SelectThread.run(TAsyncSSLClientManager.java:116)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:304)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at java.security.AccessController.doPrivileged(Native Method)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doTask(TNonBlockingSSLSocket.java:366)
Apr 05 17:42:34 photon-127d0f3a4fdbmkLv45RdBP run.sh[1214]: at org.apache.thrift.transport.TNonBlockingSSLSocket.doHandShake(TNonBlockingSSLSocket.java:335)
tactical-drone commented 7 years ago

@schadr There are no keys in /etc/vmware/ssl anywhere, except on the ESXi host.

root@photon-127d0f3a4fdbmkLv45RdBP [ /etc/vmware ]# ls
java  photon
root@photon-127d0f3a4fdbmkLv45RdBP [ /etc/vmware ]# ls photon/
keystore.p12
root@photon-127d0f3a4fdbmkLv45RdBP [ /etc/vmware ]# 

This is because of the keyStorePassword issue. I fixed it once, though it had no effect.

schadr commented 7 years ago

If there are no certs there, then that explains the Thrift issues: the agent is not able to perform the handshake, which results in the Python-style NPE.

Another reason the domain join on the ESXi host might fail is too much time skew between ESXi and the Lightwave VM.

Oh, and I just found out the install will never fail during VIB install, only when trying to add the host to the control plane.
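As a rough sketch of the skew check (the 5-minute threshold is an assumption borrowed from the common Kerberos default, and the timestamps are made up; in practice you would read them from date -u on each machine):

```python
from datetime import datetime, timezone

# Hypothetical clock readings from the ESXi host and the Lightwave VM.
esxi_time = datetime(2017, 4, 8, 10, 3, 29, tzinfo=timezone.utc)
lightwave_time = datetime(2017, 4, 8, 10, 1, 24, tzinfo=timezone.utc)

# Absolute skew in seconds between the two clocks.
skew = abs((esxi_time - lightwave_time).total_seconds())
MAX_SKEW_SECONDS = 300  # 5 minutes, the usual Kerberos tolerance

if skew > MAX_SKEW_SECONDS:
    print("clock skew %.0fs exceeds %ds; domain join may fail" % (skew, MAX_SKEW_SECONDS))
else:
    print("clock skew %.0fs is within tolerance" % skew)
```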

tactical-drone commented 7 years ago

@schadr Can you point me to the expectation of these certs?

schadr commented 7 years ago

Sorry, I just realized /etc/vmware/ssl should be on the ESXi host; on the Photon-Controller node the certs are in a different place.

schadr commented 7 years ago

Can you elaborate on what you mean by expectations on these certs?

tactical-drone commented 7 years ago

@schadr What I mean is: where in the system does it say it wants those certs? Where is that config? And what part of the installer puts them down? The only keygen I could find was happening in /etc/vmware/photon, and that keygen is broken because of the keytool empty-passphrase keyStorePassword issue.

tactical-drone commented 7 years ago

@schadr Have you even tried to build v1.1.1? I see so many errors and issues and just plain cheating. Some of the image build scripts just pull images off the web instead of taking locally built ones. What is the point of that? What is going on here? That is not what you would expect a source build to do. Is VMware Photon v1.1.1 a charade? How do I build installer-ova-nv-1.1.1-cfb7512.ova? I have been trying for a week and nothing. Just huge numbers of broken scripts.

I am starting to believe that there is absolutely no way anyone can build v1.1.1, including you guys. I guarantee it. There are problems & bugs everywhere. In fact, I would go as far as to say VMware has probably lost complete control over this Photon project (or its developers must have left). It's a giant mess. There is no way an outsider can work on this thing. There are too many random broken scripts everywhere, and when they do work, the artifacts they produce make no sense or look nothing like the artifacts I get on the release page.

tactical-drone commented 7 years ago

@AlainRoy look at this:

noob@photon-dev:~/src/photon$ make photon-build-machine
Building photon-build-machine with Packer...
photon-build-machine output will be in this color.

==> photon-build-machine: Downloading or copying ISO
    photon-build-machine: Downloading or copying: http://ubuntu.bhs.mirrors.ovh.net/ftp.ubuntu.com/releases/trusty/ubuntu-14.04.3-server-amd64.iso
    photon-build-machine: Error downloading: checksums didn't match expected: 0501c446929f713eb162ae2088d8dc8b6426224a
==> photon-build-machine: ISO download failed.
Build 'photon-build-machine' errored: ISO download failed.

==> Some builds didn't complete successfully and had errors:
--> photon-build-machine: ISO download failed.

==> Builds finished but no artifacts were created.
Makefile:413: recipe for target 'photon-build-machine' failed
make: *** [photon-build-machine] Error 1

https://github.com/vmware/photon/blob/master/support/packer-templates/photon-build-machine.json#L11

Basic issues like this. It does not even work. I mean, how hard can it be to detect that bug? That URL was probably discontinued 5 years ago and no one notices, because this project is not really open source and is not being tested at all.

Why is this software open source? You guys care nothing for outside contributors. Here I was, thinking yeah, fuck this CoreOS Tectonic stuff, I will switch to VMware instead. I mean, this is VMware, right? A respectable, high-profile company with unlimited resources and great products. First out-of-the-box command: bam, FAILED, a total waste of my time. This is not what I expected from VMware. This is so poor.

AlainRoy commented 7 years ago

I have personally built 1.1.1 many times, and I have seen it built in our automated build and test environment many times. Let's please keep our discussions civil.

To address your questions:

  1. The certificates are created as part of joining the domain. Lightwave is analogous to Active Directory. When a machine joins the domain, quite a few things happen, including the creation of the certificates. In most cases we don't explicitly "configure a request of the certificates"--it's a side effect of joining the domain.

  2. The link you share to downloading the image: You're right, we do not build the entire OS from scratch, but we use a pre-existing ISO image from the Photon OS project. If it's considered cheating to not build the OS from scratch, I confess: we cheat. If you want to build it from scratch, I haven't done it personally, but I believe it's 100% in GitHub and should be buildable.

  3. The installer image currently cannot be built from our open-source project. For various internal reasons, it was not released as open source. I'm hoping that this will change in the near future.

Overall, Photon Controller is mostly open source. There are a few pieces (the installer, the envoy VIB, the UI) that are closed source for various reasons. That's why you see them being downloaded. This may change in the future, but I do not have a timeline to share with you right now.

At this point, I'm not sure what you'd like help with. I think there are two issues:

a) Your installation fails. It looks like joining the domain has failed, so you do not have certificates, and this manifests as a Thrift error. It sounds like we need to improve our error reporting to aid in debugging--no disagreement there. I'll see what I can do about that. We're happy to help debug the underlying issue here if you're interested in continuing.

b) You are failing to build the product from scratch. It's unclear to me if the build is outright failing, or if you are simply unable to access 100% of the source code. If it's the former, please let us know the errors and we'll address them. If it's the latter, we're aware of it.

tactical-drone commented 7 years ago

This is the diff between the configure-guest.sh that comes from your secret installer and the configure-guest.sh you get if you build photon-controller.ova manually from the same version's source code. Not to mention that the size differs by about 300 MB.

So I cannot even substitute the secret installer's /var/opt/vmware/photon/controller/appliances/photon-controller.ova-dir with the one built from source, because they are miles apart. Only the secret installer has the configuration that actually works. The one built from source backfires immediately when it tries to extract the OVF environment. The only thing that happens before this step is booting the kernel.

tactical-drone commented 7 years ago

The issue seems to have been upgrading my ESXi to 4600944. The update for the Photon fix bits must not have taken properly. Since it's Dell's 4600944 image for their servers, I won't be surprised. Reinstalling ESXi instead of upgrading fixed the agent issues.

It works now, but there are still some errors in the installer I would look at:

lightwave:

root@lw-1 [ ~ ]# journalctl | grep vma
Apr 08 10:01:24 lw-1 lwsmd[916]: Starting service: vmafd
Apr 08 10:01:26 lw-1 vmdird[1072]: t@139755394942848: dlopen /opt/vmware/sbin/vmafd/lib64/libvmafdclient.so library failed, error msg ((null))
Apr 08 10:01:29 lw-1 lwsm[926]: Starting service: vmafd
Apr 08 10:01:32 lw-1 vmafdd[1004]: t@139725117568768: ERROR! [VecsIpcCreateCertStore] is returning  [183]
Apr 08 10:01:32 lw-1 vmafdd[1004]: t@139725117568768: VecsSrvFlushRootCertificate Failed to flush trusted root to download directory, 2
Apr 08 10:01:32 lw-1 vmafdd[1004]: t@139725117568768: ERROR! [VecsIpcCreateCertStore] is returning  [183]
Apr 08 10:01:32 lw-1 vmafdd[1004]: t@139725117568768: ERROR! [VecsIpcGetEntryByAlias] is returning  [4312]
Apr 08 10:01:32 lw-1 vmafdd[1004]: t@139725117568768: VecsSrvFlushMachineSslCertificate returning 2
Apr 08 10:01:32 lw-1 vmdird[1245]: t@140147282241408: dlopen /opt/vmware/sbin/vmafd/lib64/libvmafdclient.so library failed, error msg ((null))
Apr 08 10:02:31 lw-1 vmafdd[1004]: t@139725125961472: VecsSrvFlushRootCertificate Failed to flush trusted root to download directory, 2
Apr 08 10:02:31 lw-1 vmafdd[1004]: t@139725125961472: VecsSrvFlushCrl Failed to flush CRL to download directory, 2
root@lw-1 [ ~ ]# ^C
root@lw-1 [ ~ ]# /opt/vmware/sbin/vmafdd -s -l5; journalctl -f
-- Logs begin at Wed 2017-01-25 22:17:48 UTC. --
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/authservice.c,36]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: VMware afd Service registered successfully.
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/rpc.c,576]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/service.c,207]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/service.c,144]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/service.c,46]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/init.c,118]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/main.c,82]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/rpc.c,468]
Apr 08 10:38:48 lw-1 vmafdd[8701]: t@140127790397312: [../../../server/vmafd/service.c,99]

photon-controller:


Apr 08 10:03:29 photon-127d0f3a4fdbmkLv45RdBP configure-guest.sh[396]: 20170408100329:INFO:Starting service [vmafd]
Apr 08 10:03:29 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629044606720: VecsSrvFlushRootCertificate Failed to flush trusted root to download directory, 2
Apr 08 10:03:29 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629044606720: VecsSrvFlushCrl Failed to flush CRL to download directory, 2
Apr 08 10:03:29 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629328684800: ERROR! [VecsIpcCreateCertStore] is returning  [183]
Apr 08 10:03:29 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629328684800: VecsSrvFlushRootCertificate Failed to flush trusted root to download directory, 2
Apr 08 10:03:30 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629328684800: ERROR! [VecsIpcCreateCertStore] is returning  [183]
Apr 08 10:03:30 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629328684800: ERROR! [VecsIpcGetEntryByAlias] is returning  [4312]
Apr 08 10:03:30 photon-127d0f3a4fdbmkLv45RdBP vmafdd[772]: t@139629328684800: VecsSrvFlushMachineSslCertificate returning 2

root@photon-127d0f3a4fdbmkLv45RdBP [ ~ ]# /opt/vmware/sbin/vmafdd -s -l5
root@photon-127d0f3a4fdbmkLv45RdBP [ ~ ]# journalctl -f
-- Logs begin at Tue 2017-03-14 08:01:41 UTC. --
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/authservice.c,36]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: VMware afd Service registered successfully.
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/rpc.c,576]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/service.c,207]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/service.c,144]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/service.c,46]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/init.c,118]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/main.c,82]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/rpc.c,468]
Apr 08 10:36:05 photon-127d0f3a4fdbmkLv45RdBP vmafdd[1817]: t@140158652147584: [../../../server/vmafd/service.c,99]

Apr 08 10:03:39 photon-127d0f3a4fdbmkLv45RdBP systemd[1]: Started Photon Controller Configuration Service.
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: unable to write 'random state'
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: Illegal option:  __MACHINE_CERT
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: keytool -importkeystore [OPTION]...
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: Imports one or all entries from another keystore
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: Options:
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srckeystore <srckeystore>            source keystore name
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -destkeystore <destkeystore>          destination keystore name
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srcstoretype <srcstoretype>          source keystore type
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -deststoretype <deststoretype>        destination keystore type
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srcstorepass <arg>                   source keystore password
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -deststorepass <arg>                  destination keystore password
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srcprotected                         source keystore password protected
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srcprovidername <srcprovidername>    source keystore provider name
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -destprovidername <destprovidername>  destination keystore provider name
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srcalias <srcalias>                  source alias
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -destalias <destalias>                destination alias
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -srckeypass <arg>                     source key password
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -destkeypass <arg>                    destination key password
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -noprompt                             do not prompt
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -providerclass <providerclass>        provider class name
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -providerarg <arg>                    provider argument
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -providerpath <pathlist>              provider classpath
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: -v                                    verbose output
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: Use "keytool -help" for all available commands
Apr 08 10:03:40 photon-127d0f3a4fdbmkLv45RdBP run.sh[1220]: chmod: cannot access '/etc/vmware/photon/keystore.jks': No such file or directory
vChrisR commented 7 years ago

I wrote a blog post on how I installed Photon: http://www.automate-it.today/getting-started-vmware-photon-platform/