infiniband-radar / infiniband-radar-daemon

GNU General Public License v3.0

204 No Content transmitted #5

Open henkela opened 5 years ago

henkela commented 5 years ago

Hi, I finally managed to set up both the web and the daemon parts. But now I see that no content is transmitted. The daemon logs:

HTTP Request: '/v2/topologies/ib-test' (j/p/total)30/291/322 ms

and the gateway logs:

gateway_1 | 172.18.0.1 - - [02/Apr/2019:12:20:52 +0000] "PUT /api/v2/topologies/ib-test HTTP/1.1" 204 25 "-" "-" "-"

After that, the daemon also logs:

HTTP Request: '/v2/metrics/ib-test' (j/p/total)12/159/172 ms

ibnetdiscover, iblinkinfo, etc. are working. The daemon is running as root. Any hints are welcome. Best, Andreas

carstenpatzke commented 5 years ago

Hey Andreas, thanks for using the InfiniBand-Radar.

The API server does not send any response payload when the request was successful. If there were an error, the server would respond with an HTTP 500 status code.

... so in your case everything is fine.
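To make the point about status codes concrete, here is a minimal sketch of how a client could interpret the response to one of these PUT requests. The function name `classifyPutStatus` and the outcome labels are made up for illustration; this is not code from the daemon or the web server.

```typescript
// Hypothetical helper, not part of infiniband-radar: it only illustrates
// that 204 means "success, no response body" rather than "nothing was sent".
type PutOutcome = "ok-no-body" | "ok" | "client-error" | "server-error" | "other";

function classifyPutStatus(status: number): PutOutcome {
  if (status === 204) {
    return "ok-no-body"; // request accepted, response intentionally empty
  }
  if (status >= 200 && status < 300) {
    return "ok";
  }
  if (status >= 400 && status < 500) {
    return "client-error";
  }
  if (status >= 500 && status < 600) {
    return "server-error"; // e.g. the 500 mentioned above
  }
  return "other";
}
```

With this reading, the `204` in the gateway log line is the success case: the request body was accepted and there is simply nothing to send back.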

henkela commented 5 years ago

Hi Carsten, I'm not sure, because the web app doesn't show any data. The topology is not shown and no metrics are reported. I enabled verbose output for curl in ApiClient.cpp. It seems like the client cannot retrieve information from InfiniBand, or the data is just null.

carstenpatzke commented 5 years ago

Oh ok, that's interesting... Can you add some debug output in InfiniBandRadar.cpp, in update_fabric_topology at line 113? For example: std::cout << "Processing node: " << node->nodedesc << std::endl;

Maybe you are right, and the tool cannot detect any Topology :/

henkela commented 5 years ago

Finally, I added that line and saw a lot of output. However, the first line after starting infiniband_radar_daemon was

src/query_smp.c:197; umad (DR path slid 0; dlid 0; 0,1,1,16,24 Attr 0x11:0) bad status 110; Connection timed out

followed by a long list of

processing node: mlx4_0

and

processing node: Infiniscale-IV Mellanox Technologies

After that there are the following three lines before everything starts over.

ibwarn: [11129] _do_madrpc: recv failed: Connection timed out
ibwarn: [11129] mad_rpc: _do_madrpc failed; dport (DR path slid 0; dlid 0; 0)
Sending initial topology

In between there are also the metrics requests, like

HTTP Request: '/v2/topologies/ib-test' (j/p/total)21/144/165 ms
Send topology: 4821ms
HTTP Request: '/v2/metrics/ib-test' (j/p/total)18/143/161 ms
Send port stats: 653ms

Unfortunately, the web interface only shows the button for Fabric 1, and if I click on it, it shows Hosts: 0

I had a look at the logs of the container infiniband-radar-web_api which showed

TypeError: Cannot read property 'topologyRoot' of null
at TopologiesController. (/home/node/server/src/api/v2/controllers/TopologiesController.ts:44:89)
at step (/home/node/server/src/api/v2/controllers/TopologiesController.ts:57:23)
at Object.next (/home/node/server/src/api/v2/controllers/TopologiesController.ts:38:53)
at fulfilled (/home/node/server/src/api/v2/controllers/TopologiesController.ts:29:58)
at process._tickCallback (internal/process/next_tick.js:68:7)

The logs of the influxdb container look like

[httpd] 172.18.0.6 - root [08/May/2019:11:52:15 +0000] "POST /write?db=infiniband_radar&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 204 0 "-" "-" b941e705-7187-11e9-8b06-000000000000 89457
[httpd] 172.18.0.6 - root [08/May/2019:11:52:15 +0000] "POST /write?db=infiniband_radar&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 204 0 "-" "-" b950d7fc-7187-11e9-8b07-000000000000 7678

Any other debug available?

carstenpatzke commented 5 years ago

Thanks for all your effort and support.

The current errors (TypeError: Cannot read property 'topologyRoot' of null and the influx error) are all caused by a missing topology root, which should be created when the daemon starts.
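For illustration, this kind of TypeError appears whenever code reads `.topologyRoot` from a snapshot that turned out to be `null`. The sketch below shows a guard that would avoid the crash. The `TopologySnapshot` shape and the function `getTopologyRoot` are assumptions made up for this example; the real types in TopologiesController.ts are not shown in this thread.

```typescript
// Hypothetical shape; the actual snapshot type in the web server may differ.
interface TopologySnapshot {
  topologyRoot: { name: string };
}

// Returns the topology root, or null when no snapshot has been stored yet,
// instead of throwing "Cannot read property 'topologyRoot' of null".
function getTopologyRoot(
  snapshot: TopologySnapshot | null
): TopologySnapshot["topologyRoot"] | null {
  if (snapshot === null) {
    return null; // nothing stored yet, e.g. the daemon never delivered a topology
  }
  return snapshot.topologyRoot;
}
```

Note that a guard like this would only turn the exception into an empty result. It would not explain why no topology was stored in the first place.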

My guess would be that something between the daemon API request and the database fails. You already showed me that the nodes are detected and that at least something is sent to the right address.

HTTP Request: '/v2/topologies/ib-test' (j/p/total)21/144/165 ms
Send topology: 4821ms

The web server should write the topology to the database... so if you want, you can try to open the MongoDB file in the data directory and check whether there is a TopologySnapshot (DB: infiniband_radar) that looks like this.

When you restart the web server, there should also be a warning, "'ib-test' has never provided a topology! Is the daemon running?", when the server cannot find any stored topology.

(PS: I've deleted the duplicated comment)

carstenpatzke commented 4 years ago

Did you manage to solve this issue?

kcgthb commented 4 years ago

Hi Carsten, I'm actually seeing exactly the same problem: nothing is displayed in the web interface. When it first starts, the web server logs this:

Log level is: [Debug]
(node:16) ExperimentalWarning: The fs.promises API is experimental
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][MetricDatabase] Created database 'infiniband_radar'
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][MetricDatabase] Creating retention policy 'rp_14d' for database 'infiniband_radar'
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][MetricDatabase] Created retention policy 'rp_14d' for database 'infiniband_radar'
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][MetricDatabase] [edr] Updating global metric
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][TopologyDatabase] Setup 'mongodb://mongodb:27017/infiniband_radar'
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][TopologyDatabase] Update default topologies cache
[Wed, 22 Apr 2020 17:42:17 GMT][WARN][TopologyDatabase] No default timestamps are available! Starting the server for the first time?
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][TopologyDatabase] Fetch last snapshot for 'edr'
[Wed, 22 Apr 2020 17:42:17 GMT][WARN][TopologyDatabase] 'edr' has never provided a topology! Is the daemon running?
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][UserDatabase] Setup 'mongodb://mongodb:27017/infiniband_radar'
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][Server] Database setup complete
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][Server] Server startup complete
[Wed, 22 Apr 2020 17:42:17 GMT][INFO][Server] API Server is listening on http://0.0.0.0:4201/
[Wed, 22 Apr 2020 17:42:32 GMT][INFO][TopologiesController] [edr] Got first topology version
[Wed, 22 Apr 2020 17:42:32 GMT][Debug][ApiServer] Took 71 ms to process (PUT) '/api/v2/topologies/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:42:32 GMT][INFO][TopologyOptimizerService] [edr] Start optimization
[Wed, 22 Apr 2020 17:42:38 GMT][Debug][ApiServer] Took 74 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:42:43 GMT][Debug][ApiServer] Took 59 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:42:48 GMT][Debug][ApiServer] Took 30 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:42:53 GMT][Debug][ApiServer] Took 37 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:42:58 GMT][Debug][ApiServer] Took 34 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:43:03 GMT][Debug][ApiServer] Took 32 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[Wed, 22 Apr 2020 17:43:08 GMT][Debug][ApiServer] Took 37 ms to process (PUT) '/api/v2/metrics/edr' StatusCode: 204
[...]

One clue may be that when accessing Grafana, it shows the following warning:

Templating init failed
InfluxDB Error: error parsing query: found \/, expected identifier, string, number, bool at line 1, char 105

Maybe there's an issue with the parsing of some strings?
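To narrow this down, one could look at what sits at the reported position in the templating query. Below is a minimal sketch that extracts the position from such an error message and returns the surrounding text of a query. The names `parsePosition` and `contextAround` are made up for this example, and it assumes the reported line and character positions are 1-based, which is not confirmed anywhere in this thread.

```typescript
interface ParsePosition {
  line: number;
  char: number;
}

// Extracts "line N, char M" from an error message; returns null if absent.
function parsePosition(message: string): ParsePosition | null {
  const match = /at line (\d+), char (\d+)/.exec(message);
  if (match === null) {
    return null;
  }
  return { line: Number(match[1]), char: Number(match[2]) };
}

// Returns up to `radius` characters on each side of the reported position.
// Assumption: positions are 1-based; adjust if the parser counts from 0.
function contextAround(query: string, pos: ParsePosition, radius: number): string {
  const lines = query.split("\n");
  const text = lines[pos.line - 1];
  if (text === undefined) {
    return ""; // reported line is outside the given query
  }
  const index = pos.char - 1;
  const start = Math.max(0, index - radius);
  return text.slice(start, index + radius + 1);
}
```

Feeding the Grafana error message into `parsePosition` and the dashboard's templating query into `contextAround` would show the characters around position 105, which is where the parser reports the unexpected slash.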