ibm-research / swifthlm

SwiftHLM - a middleware for using OpenStack Swift with tape and other high latency media storage backends
Apache License 2.0

'STATUS' request causes an error #6

Closed tommyJin closed 7 years ago

tommyJin commented 7 years ago

Hi @slavisasarafijanovic @hseipp,

I have encountered an issue with the 'status' request. When I execute a 'status' request an error occurs, while the other requests work fine.

I'll post the reproduction output and logs below; please take a look. Thanks in advance.

Basic info: Python 2.7, Swift 2.5.0 (Liberty), SwiftHLM 0.2.2, master branch (before the 2017/06/20 commit)

create container

[osddev@localhost testresource]$ curl -v -X PUT -H "X-Auth-Token: AUTH_tk0a1dadf766ad48ebb577325e4a305342" http://127.0.0.1:8080/v1/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > PUT /v1/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk0a1dadf766ad48ebb577325e4a305342

    < HTTP/1.1 201 Created
    < Content-Length: 0
    < Content-Type: text/html; charset=UTF-8
    < X-Trans-Id: tx5baef1a05329466695776-00596360f0
    < Date: Mon, 10 Jul 2017 11:11:45 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact

upload file

[osddev@localhost testresource]$ curl -v -X PUT -H "X-Auth-Token: AUTH_tk0a1dadf766ad48ebb577325e4a305342" http://127.0.0.1:8080/v1/AUTH_test/testcontainer2/obj1 -T /home/osddev/testfile.txt

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > PUT /v1/AUTH_test/testcontainer2/obj1 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk0a1dadf766ad48ebb577325e4a305342
    > Content-Length: 12
    > Expect: 100-continue

    < HTTP/1.1 100 Continue

  • We are completely uploaded and fine

    < HTTP/1.1 201 Created
    < Last-Modified: Mon, 10 Jul 2017 11:29:42 GMT
    < Content-Length: 0
    < Etag: 6f5902ac237024bdd0c176cb93063dc4
    < Content-Type: text/html; charset=UTF-8
    < X-Trans-Id: tx0fc2be5e036f440dba157-0059636525
    < Date: Mon, 10 Jul 2017 11:29:41 GMT
    <
  • Connection #0 to host 127.0.0.1 left intact

container and object check

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/v1/AUTH_test

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /v1/AUTH_test HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 200 OK
    < X-Account-Storage-Policy-Gold-Bytes-Used: 34671496
    < Content-Length: 95
    < X-Account-Storage-Policy-Gold-Object-Count: 18
    < Accept-Ranges: bytes
    < X-Account-Object-Count: 21
    < X-Account-Storage-Policy-Swiftonfile-Container-Count: 1
    < X-Timestamp: 1480498333.02657
    < X-Account-Storage-Policy-Swiftonfile-Bytes-Used: 33587224
    < X-Account-Storage-Policy-Gold-Container-Count: 6
    < X-Account-Storage-Policy-Swiftonfile-Object-Count: 3
    < X-Account-Bytes-Used: 68258720
    < X-Account-Container-Count: 7
    < Content-Type: text/plain; charset=utf-8
    < X-Account-Meta-Owner: Hello World
    < X-Trans-Id: tx7507a7ef3bb74229ae54f-0059636864
    < Date: Mon, 10 Jul 2017 11:43:32 GMT
    <
    container1
    container_test
    contaniner1
    sofContainer
    testcontainer
    testcontainer1
    testcontainer2

  • Connection #0 to host 127.0.0.1 left intact

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/v1/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /v1/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 200 OK
    < Content-Length: 5
    < X-Container-Object-Count: 1
    < Accept-Ranges: bytes
    < X-Storage-Policy: gold
    < X-Container-Bytes-Used: 12
    < X-Timestamp: 1499685104.99063
    < Content-Type: text/plain; charset=utf-8
    < X-Trans-Id: txf7cb8caccd0044b39ddc3-005963686d
    < Date: Mon, 10 Jul 2017 11:43:41 GMT
    <
    obj1

  • Connection #0 to host 127.0.0.1 left intact

run 'requests' and 'status' in order

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/requests/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 200 OK
    < Content-Length: 53
    < Content-Type: text/plain
    < X-Trans-Id: tx1bc1563b3580435a974a2-0059636ac4
    < Date: Mon, 10 Jul 2017 11:53:40 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    ["There are no pending or failed SwiftHLM requests."]

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/status/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 500 Internal Error
    < Content-Length: 17
    < Content-Type: text/plain
    < X-Trans-Id: tx133184cb7e5b4cc8b5c19-0059636acd
    < Date: Mon, 10 Jul 2017 11:53:50 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    An error occurred

[osddev@localhost system]$

proxy.error

Jul 10 19:53:40 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58452)
Jul 10 19:53:40 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 11:53:40] "GET /hlm/v1/requests/AUTH_test/testcontainer2 HTTP/1.1" 200 203 0.094634 (txn: txac2948548f38475db200e-0059636ac4)
Jul 10 19:53:49 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58463)
Jul 10 19:53:50 localhost proxy-server: Error: An error occurred: #012Traceback (most recent call last):#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/catch_errors.py", line 41, in handle_request#012 resp = self._app_call(env)#012 File "/home/osddev/swift-2.5.0/swift/common/wsgi.py", line 1033, in _app_call#012 resp = self.app(env, self._start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/gatekeeper.py", line 90, in __call__#012 return self.app(env, gatekeeper_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/healthcheck.py", line 57, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/proxy_logging.py", line 337, in __call__#012 iterable = self.app(env, my_start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/memcache.py", line 109, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempurl.py", line 340, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/ratelimit.py", line 301, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/crossdomain.py", line 82, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempauth.py", line 289, in __call__#012 return self.app(env, start_response)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 440, in __call__#012 self.merge_responses_from_storage_nodes(hlm_req)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 628, in merge_responses_from_storage_nodes#012 resp_in = (json.loads(self.response_in[ip_addr]))['objects']#012 File "/usr/lib64/python2.7/site-packages/simplejson/__init__.py", line 516, in loads#012 return _default_decoder.decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 374, in decode#012 obj, end = self.raw_decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 404, in raw_decode#012 return self.scan_once(s, idx=_w(s, idx).end())#012JSONDecodeError: Expecting value: line 1 column 1 (char 0) (txn: tx40515458898743c6a9587-0059636acd)
Jul 10 19:53:50 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 11:53:50] "GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1" 500 179 0.837648 (txn: tx40515458898743c6a9587-0059636acd)
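
For context, the traceback ends in merge_responses_from_storage_nodes(), where the middleware calls json.loads(self.response_in[ip_addr]). If the storage-node side returned an empty body, json.loads('') raises exactly this JSONDecodeError. A minimal illustration of the failure mode (not the actual SwiftHLM code):

    import json

    # self.response_in maps a storage node IP to the raw response body;
    # an empty body reproduces the logged failure.
    response_in = {'127.0.0.1': ''}

    for ip_addr, body in response_in.items():
        try:
            objects = json.loads(body)['objects']
        except ValueError as err:
            # simplejson raises JSONDecodeError('Expecting value: line 1
            # column 1 (char 0)'), which is a ValueError subclass
            print('No parseable response from %s: %s' % (ip_addr, err))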

proxy.log

Jul 10 19:53:40 localhost proxy-server: - - 10/Jul/2017/11/53/40 HEAD /v1/v1 HTTP/1.0 204 - Swift - - - - tx1bc1563b3580435a974a2-0059636ac4 - 0.0086 RL - 1499687620.094608068 1499687620.103219986 -
Jul 10 19:53:40 localhost swift: - - 10/Jul/2017/11/53/40 HEAD /v1/AUTH_test/testcontainer2 HTTP/1.0 204 - SwiftHLM%20Middleware - - - - txc590cc1a642d46f1b5547-0059636ac4 - 0.0222 - - 1499687620.104687929 1499687620.126879930 0
Jul 10 19:53:40 localhost swift: - - 10/Jul/2017/11/53/40 GET /v1/.swifthlm/pending-hlm-requests%3Fformat%3Djson%26marker%3D%26end_marker%3D HTTP/1.0 200 - SwiftHLM%20Middleware - - 2 - tx5ea42dc97f4c44b688578-0059636ac4 - 0.0048 - - 1499687620.127410889 1499687620.132189989 0
Jul 10 19:53:40 localhost swift: - - 10/Jul/2017/11/53/40 GET /v1/.swifthlm/failed-hlm-requests%3Fformat%3Djson%26marker%3D%26end_marker%3D HTTP/1.0 200 - SwiftHLM%20Middleware - - 5933 - txbfa85a74649f4892aa26e-0059636ac4 - 0.0231 - - 1499687620.132649899 1499687620.155777931 0
Jul 10 19:53:40 localhost swift: - - 10/Jul/2017/11/53/40 GET /v1/.swifthlm/failed-hlm-requests%3Fformat%3Djson%26marker%3D20170710104143.069--migrate--AUTH_test--testcontainer1--0%26end_marker%3D HTTP/1.0 200 - SwiftHLM%20Middleware - - 2 - txac2948548f38475db200e-0059636ac4 - 0.0313 - - 1499687620.156425953 1499687620.187700033 0
Jul 10 19:53:40 localhost proxy-server: 127.0.0.1 127.0.0.1 10/Jul/2017/11/53/40 GET /hlm/v1/requests/AUTH_test/testcontainer2 HTTP/1.0 200 - curl/7.29.0 AUTH_tk35e7dc30a... - 53 - tx1bc1563b3580435a974a2-0059636ac4 - 0.0942 - - 1499687620.093960047 1499687620.188185930 -
Jul 10 19:53:49 localhost proxy-server: - - 10/Jul/2017/11/53/49 HEAD /v1/v1 HTTP/1.0 204 - Swift - - - - tx133184cb7e5b4cc8b5c19-0059636acd - 0.0086 RL - 1499687629.855142117 1499687629.863786936 -
Jul 10 19:53:49 localhost swift: - - 10/Jul/2017/11/53/49 HEAD /v1/AUTH_test/testcontainer2 HTTP/1.0 204 - SwiftHLM%20Middleware - - - - tx2beb93ed4e4e484b8e2bf-0059636acd - 0.0081 - - 1499687629.864785910 1499687629.872857094 0
Jul 10 19:53:49 localhost swift: - - 10/Jul/2017/11/53/49 GET /v1/AUTH_test/testcontainer2%3Fformat%3Djson%26marker%3D%26end_marker%3D HTTP/1.0 200 - SwiftHLM%20Middleware - - 166 - tx468160506238491cb3481-0059636acd - 0.0053 - - 1499687629.873548985 1499687629.878866911 0
Jul 10 19:53:49 localhost swift: - - 10/Jul/2017/11/53/49 GET /v1/AUTH_test/testcontainer2%3Fformat%3Djson%26marker%3Dobj1%26end_marker%3D HTTP/1.0 200 - SwiftHLM%20Middleware - - 2 - txb653a24e8b8c402aad8f7-0059636acd - 0.0050 - - 1499687629.879329920 1499687629.884301901 0
Jul 10 19:53:49 localhost proxy-server: - - 10/Jul/2017/11/53/49 HEAD /v1/AUTH_test HTTP/1.0 204 - Swift - - - - tx5b201be7c5a343e4bacc6-0059636acd - 0.0034 GET_INFO - 1499687629.884717941 1499687629.888124943 -
Jul 10 19:53:49 localhost proxy-server: - - 10/Jul/2017/11/53/49 HEAD /v1/AUTH_test/testcontainer2 HTTP/1.0 204 - Swift - - - - tx40515458898743c6a9587-0059636acd - 0.0046 LE - 1499687629.888813019 1499687629.893404961 0
Jul 10 19:53:50 localhost proxy-server: 127.0.0.1 127.0.0.1 10/Jul/2017/11/53/50 GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.0 500 - curl/7.29.0 AUTH_tk35e7dc30a... - - - tx133184cb7e5b4cc8b5c19-0059636acd - 0.7790 - - 1499687629.854521990 1499687630.633524895 -

hlm.log (set log_level = INFO)

Jul 10 19:37:17 localhost hlm-dispatcher: 425 [middleware.py: __init__():244] info: Initialized SwiftHLM Middleware
Jul 10 19:37:17 localhost hlm-dispatcher: 665 [middleware.py: __init__():244] info: Initialized SwiftHLM Middleware
Jul 10 19:38:00 localhost hlm-dispatcher: 474 [middleware.py: __init__():244] info: Initialized SwiftHLM Middleware
Jul 10 19:38:00 localhost hlm-dispatcher: 474 [dispatcher.py: __init__():101] info: Initialized Dispatcher

run 'migrate' then 'status'

[osddev@localhost system]$ curl -v -X POST -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/migrate/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > POST /hlm/v1/migrate/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 200 OK
    < Content-Length: 26
    < Content-Type: text/plain
    < X-Trans-Id: tx2c6eaee3c61f442d9f73e-0059636bcb
    < Date: Mon, 10 Jul 2017 11:58:03 GMT
    <
    Accepted migrate request.

  • Connection #0 to host 127.0.0.1 left intact

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/status/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 500 Internal Error
    < Content-Length: 17
    < Content-Type: text/plain
    < X-Trans-Id: tx2e82d0f7672c4c998d48b-0059636bd5
    < Date: Mon, 10 Jul 2017 11:58:13 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    An error occurred

[osddev@localhost system]$

There are no changes in hlm.log, and the same entries appear in proxy.log. But proxy.error changes; below are the newly appended proxy.error entries.

Jul 10 19:58:03 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58488)
Jul 10 19:58:03 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 11:58:03] "POST /hlm/v1/migrate/AUTH_test/testcontainer2 HTTP/1.1" 200 176 0.051441 (txn: txe15a41c6cddf48cd9a0ed-0059636bcb)
Jul 10 19:58:13 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58500)
Jul 10 19:58:13 localhost proxy-server: Error: An error occurred: #012Traceback (most recent call last):#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/catch_errors.py", line 41, in handle_request#012 resp = self._app_call(env)#012 File "/home/osddev/swift-2.5.0/swift/common/wsgi.py", line 1033, in _app_call#012 resp = self.app(env, self._start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/gatekeeper.py", line 90, in __call__#012 return self.app(env, gatekeeper_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/healthcheck.py", line 57, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/proxy_logging.py", line 337, in __call__#012 iterable = self.app(env, my_start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/memcache.py", line 109, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempurl.py", line 340, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/ratelimit.py", line 301, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/crossdomain.py", line 82, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempauth.py", line 289, in __call__#012 return self.app(env, start_response)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 440, in __call__#012 self.merge_responses_from_storage_nodes(hlm_req)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 628, in merge_responses_from_storage_nodes#012 resp_in = (json.loads(self.response_in[ip_addr]))['objects']#012 File "/usr/lib64/python2.7/site-packages/simplejson/__init__.py", line 516, in loads#012 return _default_decoder.decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 374, in decode#012 obj, end = self.raw_decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 404, in raw_decode#012 return self.scan_once(s, idx=_w(s, idx).end())#012JSONDecodeError: Expecting value: line 1 column 1 (char 0) (txn: txcf931ace737c4956baaed-0059636bd5)
Jul 10 19:58:13 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 11:58:13] "GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1" 500 179 0.692756 (txn: txcf931ace737c4956baaed-0059636bd5)

Run 'requests' and 'status' in order

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/requests/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 200 OK
    < Content-Length: 69
    < Content-Type: text/plain
    < X-Trans-Id: tx5b62ec9fd15940439201b-0059636c6b
    < Date: Mon, 10 Jul 2017 12:00:43 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    ["20170710115803.822--migrate--AUTH_test--testcontainer2--0--failed"]

[osddev@localhost system]$ curl -v -X GET -H "X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006" http://127.0.0.1:8080/hlm/v1/status/AUTH_test/testcontainer2

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk35e7dc30a01f43ef8d9839f552be9006

    < HTTP/1.1 500 Internal Error
    < Content-Length: 17
    < Content-Type: text/plain
    < X-Trans-Id: txa7ca1fa5506c41a3973a3-0059636c70
    < Date: Mon, 10 Jul 2017 12:00:49 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    An error occurred

[osddev@localhost system]$

And here are the new entries in proxy.error; hlm.log remains the same.

Jul 10 20:00:43 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58539)
Jul 10 20:00:43 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 12:00:43] "GET /hlm/v1/requests/AUTH_test/testcontainer2 HTTP/1.1" 200 219 0.065690 (txn: tx66c96638200e4850baf95-0059636c6b)
Jul 10 20:00:48 localhost proxy-server: STDERR: (4074) accepted ('127.0.0.1', 58549)
Jul 10 20:00:49 localhost proxy-server: Error: An error occurred: #012Traceback (most recent call last):#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/catch_errors.py", line 41, in handle_request#012 resp = self._app_call(env)#012 File "/home/osddev/swift-2.5.0/swift/common/wsgi.py", line 1033, in _app_call#012 resp = self.app(env, self._start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/gatekeeper.py", line 90, in __call__#012 return self.app(env, gatekeeper_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/healthcheck.py", line 57, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/proxy_logging.py", line 337, in __call__#012 iterable = self.app(env, my_start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/memcache.py", line 109, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempurl.py", line 340, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/ratelimit.py", line 301, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/crossdomain.py", line 82, in __call__#012 return self.app(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/swob.py", line 1418, in _wsgify_self#012 return func(self, Request(env))(env, start_response)#012 File "/home/osddev/swift-2.5.0/swift/common/middleware/tempauth.py", line 289, in __call__#012 return self.app(env, start_response)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 440, in __call__#012 self.merge_responses_from_storage_nodes(hlm_req)#012 File "build/bdist.linux-x86_64/egg/swifthlm/middleware.py", line 628, in merge_responses_from_storage_nodes#012 resp_in = (json.loads(self.response_in[ip_addr]))['objects']#012 File "/usr/lib64/python2.7/site-packages/simplejson/__init__.py", line 516, in loads#012 return _default_decoder.decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 374, in decode#012 obj, end = self.raw_decode(s)#012 File "/usr/lib64/python2.7/site-packages/simplejson/decoder.py", line 404, in raw_decode#012 return self.scan_once(s, idx=_w(s, idx).end())#012JSONDecodeError: Expecting value: line 1 column 1 (char 0) (txn: txfc68275e1bcb46d3b0e22-0059636c70)
Jul 10 20:00:49 localhost proxy-server: STDERR: 127.0.0.1 - - [10/Jul/2017 12:00:49] "GET /hlm/v1/status/AUTH_test/testcontainer2 HTTP/1.1" 500 179 0.691834 (txn: txfc68275e1bcb46d3b0e22-0059636c70)

I have reproduced the issue in two different environments. What other logs or information shall I provide about this issue? Thanks.

Best regards, Tommy

tommyJin commented 7 years ago

Hi @slavisasarafijanovic @hseipp

I traced and printed out some messages and found something.

This is how I recreate the error:

1. upload a file

[root@localhost testresource]# curl -v -X PUT -H 'X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8' http://127.0.0.1:8080/v1/AUTH_test/con3/obj0 -T /home/osddev/testfile.txt

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > PUT /v1/AUTH_test/con3/obj0 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8
    > Content-Length: 12
    > Expect: 100-continue

    < HTTP/1.1 100 Continue

  • We are completely uploaded and fine

    < HTTP/1.1 201 Created
    < Last-Modified: Mon, 17 Jul 2017 11:37:11 GMT
    < Content-Length: 0
    < Etag: 6f5902ac237024bdd0c176cb93063dc4
    < Content-Type: text/html; charset=UTF-8
    < X-Trans-Id: txd9dc06584bfb455fa75f4-00596ca166
    < Date: Mon, 17 Jul 2017 11:37:10 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact

2. list the container objects

[root@localhost testresource]# curl -v -X GET -H 'X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8' http://127.0.0.1:8080/v1/AUTH_test/con3

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /v1/AUTH_test/con3 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8

    < HTTP/1.1 200 OK
    < Content-Length: 5
    < X-Container-Object-Count: 1
    < Accept-Ranges: bytes
    < X-Storage-Policy: gold
    < X-Container-Bytes-Used: 12
    < X-Timestamp: 1500291340.89569
    < Content-Type: text/plain; charset=utf-8
    < X-Trans-Id: tx9292c7186cac45a1a0e21-00596ca190
    < Date: Mon, 17 Jul 2017 11:37:52 GMT
    <
    obj0

  • Connection #0 to host 127.0.0.1 left intact

3. the object row in the container database

ROWID name created_at size content_type etag deleted storage_policy_index
"1","obj0","1500291430.66738","12","application/octet-stream","6f5902ac237024bdd0c176cb93063dc4","0","0"

4. status of the new object 'obj0'

[root@localhost testresource]# curl -v -X GET -H 'X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8' http://127.0.0.1:8080/hlm/v1/status/AUTH_test/con3/obj0

  • About to connect() to 127.0.0.1 port 8080 (#0)
  • Trying 127.0.0.1...
  • Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)

    > GET /hlm/v1/status/AUTH_test/con3/obj0 HTTP/1.1
    > User-Agent: curl/7.29.0
    > Host: 127.0.0.1:8080
    > Accept: */*
    > X-Auth-Token: AUTH_tk4e8b1e76e0f44050bc8e5ce5ee4fb1f8

    < HTTP/1.1 500 Internal Error
    < Content-Length: 17
    < Content-Type: text/plain
    < X-Trans-Id: tx173bef469efe4deb9d38c-00596ca1f1
    < Date: Mon, 17 Jul 2017 11:39:30 GMT
    <

  • Connection #0 to host 127.0.0.1 left intact
    An error occurred

[root@localhost testresource]#

error in hlm.error

Jul 17 19:37:47 localhost hlm-handler: 898 [handler.py:map_objects_to_targets():212] Unavailable device: sdb4, for object: /AUTH_test/con3/obj0,storage policy: StoragePolicy(0, 'gold', is_default=True, is_deprecated=False, policy_type='replication')
Jul 17 19:37:48 localhost hlm-dispatchermiddleware: 506 [middleware.py:submit_request_to_storage_node_and_get_response():579] Errors reported by Handler: Traceback (most recent call last):#012 File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main#012 "__main__", fname, loader, pkg_name)#012 File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code#012 exec code in run_globals#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 284, in <module>#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 213, in map_objects_to_targets#012AttributeError: 'ObjectController' object has no attribute 'disk_file'#012
Jul 17 19:39:30 localhost hlm-handler: 114 [handler.py:map_objects_to_targets():212] Unavailable device: sdb4, for object: /AUTH_test/con3/obj0,storage policy: StoragePolicy(0, 'gold', is_default=True, is_deprecated=False, policy_type='replication')
Jul 17 19:39:30 localhost hlm-dispatchermiddleware: 236 [middleware.py:submit_request_to_storage_node_and_get_response():579] Errors reported by Handler: Traceback (most recent call last):#012 File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main#012 "__main__", fname, loader, pkg_name)#012 File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code#012 exec code in run_globals#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 284, in <module>#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 213, in map_objects_to_targets#012AttributeError: 'ObjectController' object has no attribute 'disk_file'#012

hlm.log (I added some comments, so the line numbers will not be exactly the same as in the current project)

Jul 17 19:39:30 localhost hlm-handler: 112 [handler.py: receive_request():150] Receiving request from Dispatcher
Jul 17 19:39:30 localhost hlm-handler: 112 [handler.py:map_objects_to_targets():159] Mapping objects to files
Jul 17 19:39:30 localhost hlm-handler: 112 [handler.py:map_objects_to_targets():161] request_in(first 1024 bytes): {"swift_dir": "/etc/swift", "storage_policy_index": "0", "objects": [{"device": "sdb4", "object": "/AUTH_test/con3/obj0"}, {"device": "sdb3", "object": "/AUTH_test/con3/obj0"}, {"device": "sdb2", "object": "/AUTH_test/con3/obj0"}], "request": "status"}
Jul 17 19:39:30 localhost hlm-handler: 112 [handler.py:map_objects_to_targets():167] request_in_dict objects=[{'device': 'sdb4', 'object': '/AUTH_test/con3/obj0'}, {'device': 'sdb3', 'object': '/AUTH_test/con3/obj0'}, {'device': 'sdb2', 'object': '/AUTH_test/con3/obj0'}]
Jul 17 19:39:30 localhost hlm-handler: 112 [handler.py:map_objects_to_targets():172] obj: {'device': 'sdb4', 'object': '/AUTH_test/con3/obj0'}
Jul 17 19:39:30 localhost hlm-handler: 113 [handler.py:map_objects_to_targets():193] Storage nodes: [{'index': 0, 'replication_port': 6040, 'weight': 1.0, 'zone': 4, 'ip': '127.0.0.1', 'region': 1, 'id': 2, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb4', 'port': 6040}, {'index': 1, 'replication_port': 6030, 'weight': 1.0, 'zone': 3, 'ip': '127.0.0.1', 'region': 1, 'id': 3, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb3', 'port': 6030}, {'index': 2, 'replication_port': 6020, 'weight': 1.0, 'zone': 2, 'ip': '127.0.0.1', 'region': 1, 'id': 1, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb2', 'port': 6020}]
Jul 17 19:39:30 localhost hlm-handler: 113 [handler.py:map_objects_to_targets():194] partition: 531
Jul 17 19:39:30 localhost hlm-handler: 113 [handler.py:map_objects_to_targets():198] hash_path or key: 84e90445794d21d6a5590b617c80f024
Jul 17 19:39:30 localhost hlm-handler: 113 [handler.py:map_objects_to_targets():202] oc.node_timeout: 3.0
Jul 17 19:39:30 localhost hlm-handler: 114 [handler.py:map_objects_to_targets():204] policy: StoragePolicy(0, 'gold', is_default=True, is_deprecated=False, policy_type='replication') index: 0
Jul 17 19:39:30 localhost hlm-handler: 114 [handler.py:map_objects_to_targets():206] sdb4,531,AUTH_test,con3,obj0,StoragePolicy(0, 'gold', is_default=True, is_deprecated=False, policy_type='replication')
Jul 17 19:39:30 localhost hlm-handler: 114 [handler.py:map_objects_to_targets():212] Unavailable device: sdb4, for object: /AUTH_test/con3/obj0,storage policy: StoragePolicy(0, 'gold', is_default=True, is_deprecated=False, policy_type='replication')
Jul 17 19:39:30 localhost hlm-dispatchermiddleware: 236 [middleware.py:submit_request_to_storage_node_and_get_response():579] Errors reported by Handler: Traceback (most recent call last):#012 File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main#012 "__main__", fname, loader, pkg_name)#012 File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code#012 exec code in run_globals#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 284, in <module>#012 File "build/bdist.linux-x86_64/egg/swifthlm/handler.py", line 213, in map_objects_to_targets#012AttributeError: 'ObjectController' object has no attribute 'disk_file'#012
Jul 17 19:39:30 localhost hlm-dispatchermiddleware: 254 [middleware.py:submit_request_to_storage_node_and_get_response():595] response=
Jul 17 19:39:30 localhost hlm-dispatchermiddleware: 255 [middleware.py:merge_responses_from_storage_nodes():627] Enter merge_responses_from_storage_nodes, hlm_req=status (txn: txe7ae8b43b3834fe6878d0-00596ca1f1)
Jul 17 19:39:30 localhost hlm-dispatchermiddleware: 255 [middleware.py:merge_responses_from_storage_nodes():631] self.response_in=defaultdict(<type 'list'>, {'127.0.0.1': ''}) (txn: txe7ae8b43b3834fe6878d0-00596ca1f1)

The related code is (in handler.py, at line 203):

    try:
        oc.disk_file = oc.get_diskfile(
            device, partition, account, container, obj,
            policy=policy)
    except DiskFileDeviceUnavailable:  # scor
        self.logger.error("Unavailable device: %s, for object: %s,"
                          "storage policy: %s", device,
                          obj_and_dev['object'], policy)
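
Note that when get_diskfile() raises DiskFileDeviceUnavailable, the except branch only logs the error and carries on, so oc.disk_file is never assigned; the next access to it then fails with the AttributeError shown in hlm.error, and the dispatcher ends up with the empty response_in (see self.response_in=defaultdict(<type 'list'>, {'127.0.0.1': ''}) above) that later breaks json.loads() on the proxy side. A stripped-down illustration of the pattern (stand-in classes, not the real Swift ones):

    class ObjectController(object):
        pass

    class DiskFileDeviceUnavailable(Exception):
        pass

    oc = ObjectController()
    try:
        # stand-in for oc.get_diskfile(...) failing on an "unavailable" device
        raise DiskFileDeviceUnavailable()
    except DiskFileDeviceUnavailable:
        pass  # handler.py only logs here, so oc.disk_file is never set

    # AttributeError: 'ObjectController' object has no attribute 'disk_file'
    oc.disk_file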

Actually, the device is available, and the object file exists:

[root@localhost testresource]# cd /srv/4/node/sdb4/objects/531/024/84e90445794d21d6a5590b617c80f024/
[root@localhost 84e90445794d21d6a5590b617c80f024]# ll
total 8
-rw-------. 1 osddev osddev 12 Jul 17 19:37 1500291430.66738.data
[root@localhost 84e90445794d21d6a5590b617c80f024]#

Hope this helps with the issue.

Regards, Tommy

tommyJin commented 7 years ago

Hi there,

I'd like to know whether you have created an object-server.conf under /etc/swift. If so, could you paste its content? It seems that the 'handler' wants to read that file (it won't read the other object-server config files under /etc/swift/object-server/).

So I modified the code in handler.py (line 78) from configFile = r'/etc/swift/object-server.conf' to configFile = r'/etc/swift/object-server'.

If the handler reads all the config files under /etc/swift/object-server, the code at line 82 throws:

[Errno 21] Is a directory: /etc/swift/object-server
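
For illustration (my assumption about what happens at line 82): opening a directory path as if it were a file raises exactly this error on Linux:

    # Python 2.7: open() on a directory fails with IOError [Errno 21],
    # which is why pointing handler.py at /etc/swift/object-server breaks.
    try:
        open('/etc/swift/object-server').read()
    except IOError as err:
        print(err)  # [Errno 21] Is a directory: '/etc/swift/object-server'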

Regards, Tommy

slavisasarafijanovic commented 7 years ago

Hi Tommy,

Thank you for investigating this and providing the additional logs and info.

First, regarding your questions:

1) Yes, object-server.conf should be under /etc/swift.

2) On my system I currently use a tape backend instead of the dummy backend, but the rest of the configuration should be relevant/representative. The content of this file on my system is:

$ cat /etc/swift/object-server.conf                                         
[DEFAULT]                                                                                     
# bind_ip = 0.0.0.0                                                                           
bind_port = 6000                                                                              
# bind_timeout = 30                                                                           
# backlog = 4096                                                                              
# user = swift                                                                                
# swift_dir = /etc/swift                                                                      
# devices = /srv/node                                                                         
# mount_check = true                                                                          
# disable_fallocate = false                                                                   
# expiring_objects_container_divisor = 86400                                                  
# expiring_objects_account_name = expiring_objects                                            
#                                                                                             
# Use an integer to override the number of pre-forked processes that will                     
# accept connections.                                                                         
# workers = auto                                                                              
#                                                                                             
# Maximum concurrent requests per worker                                                      
# max_clients = 1024                                                                          
#                                                                                             
# You can specify default log routing here if you want:                                       
# log_name = swift                                                                            
# log_facility = LOG_LOCAL0                                                                   
# log_level = INFO                                                                            
# log_address = /dev/log                                                                      
# The following caps the length of log lines to the value given; no limit if                  
# set to 0, the default.                                                                      
# log_max_line_length = 0                                                                     
#                                                                                             
# comma separated list of functions to call to setup custom log handlers.                     
# functions get passed: conf, name, log_to_console, log_route, fmt, logger,                   
# adapted_logger                                                                              
# log_custom_handlers =                                                                       
#                                                                                             
# If set, log_udp_host will override log_address                                              
# log_udp_host =                                                                              
# log_udp_port = 514                                                                          
#                                                                                             
# You can enable StatsD logging here:                                                         
# log_statsd_host = localhost                                                                 
# log_statsd_port = 8125                                                                      
# log_statsd_default_sample_rate = 1.0                                                        
# log_statsd_sample_rate_factor = 1.0                                                         
# log_statsd_metric_prefix =                                                                  
#                                                                                             
# eventlet_debug = false                                                                      
#                                                                                             
# You can set fallocate_reserve to the number of bytes you'd like fallocate to                
# reserve, whether there is space for the given file size or not.                             
# fallocate_reserve = 0                                                                       
#                                                                                             
# Time to wait while attempting to connect to another backend node.                           
# conn_timeout = 0.5                                                                          
# Time to wait while sending each chunk of data to another backend node.                      
# node_timeout = 3                                                                            
# Time to wait while receiving each chunk of data from a client or another                    
# backend node.                                                                               
# client_timeout = 60                                                                         
#                                                                                             
# network_chunk_size = 65536                                                                  
# disk_chunk_size = 65536                                                                     

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
# You can override the default log routing for this app here:
# set log_name = object-server                               
# set log_facility = LOG_LOCAL0                              
# set log_level = INFO                                       
# set log_requests = true                                    
# set log_address = /dev/log                                 
#                                                            
# max_upload_time = 86400                                    
# slow = 0                                                   
#                                                            
# Objects smaller than this are not evicted from the buffercache once read
# keep_cache_size = 5242880                                               
#                                                                         
# If true, objects for authenticated GET requests may be kept in buffer cache
# if small enough                                                            
# keep_cache_private = false                                                 
#                                                                            
# on PUTs, sync data every n MB                                              
# mb_per_sync = 512                                                          
#                                                                            
# Comma separated list of headers that can be set in metadata on an object.  
# This list is in addition to X-Object-Meta-* headers and cannot include     
# Content-Type, etag, Content-Length, or deleted                             
# allowed_headers = Content-Disposition, Content-Encoding, X-Delete-At, X-Object-Manifest, X-Static-Large-Object
#                                                                                                               
# auto_create_account_prefix = .                                                                                
#                                                                                                               
# A value of 0 means "don't use thread pools". A reasonable starting point is                                   
# 4.                                                                                                            
# threads_per_disk = 0                                                                                          
#                                                                                                               
# Configure parameter for creating specific server                                                              
# To handle all verbs, including replication verbs, do not specify                                              
# "replication_server" (this is the default). To only handle replication,                                       
# set to a True value (e.g. "True" or "1"). To handle only non-replication                                      
# verbs, set to "False". Unless you have a separate replication network, you                                    
# should not specify any value for "replication_server".                                                        
# replication_server = false                                                                                    
#                                                                                                               
# Set to restrict the number of concurrent incoming REPLICATION requests                                        
# Set to 0 for unlimited                                                                                        
# Note that REPLICATION is currently an ssync only item                                                         
# replication_concurrency = 4                                                                                   
#                                                                                                               
# Restricts incoming REPLICATION requests to one per device,                                                    
# replication_currency above allowing. This can help control I/O to each                                        
# device, but you may wish to set this to False to allow multiple REPLICATION                                   
# requests (up to the above replication_concurrency setting) per device.                                        
# replication_one_per_device = True                                                                             
#                                                                                                               
# Number of seconds to wait for an existing replication device lock before                                      
# giving up.                                                                                                    
# replication_lock_timeout = 15                                                                                 
#                                                                                                               
# These next two settings control when the REPLICATION subrequest handler will                                  
# abort an incoming REPLICATION attempt. An abort will occur if there are at                                    
# least threshold number of failures and the value of failures / successes                                      
# exceeds the ratio. The defaults of 100 and 1.0 means that at least 100                                        
# failures have to occur and there have to be more failures than successes for                                  
# an abort to occur.                                                                                            
# replication_failure_threshold = 100                                                                           
# replication_failure_ratio = 1.0                                                                               
#                                                                                                               
# Use splice() for zero-copy object GETs. This requires Linux kernel                                            
# version 3.0 or greater. If you set "splice = yes" but the kernel                                              
# does not support it, error messages will appear in the object server                                          
# logs at startup, but your object servers should continue to function.                                         
#                                                                                                               
# splice = no                                                                                                   

[filter:healthcheck]
use = egg:swift#healthcheck
# An optional filesystem path, which if present, will cause the healthcheck
# URL to return "503 Service Unavailable" with a body of "DISABLED BY FILE"
# disable_path =                                                           

[filter:recon]
use = egg:swift#recon
#recon_cache_path = /var/cache/swift
#recon_lock_path = /var/lock        

[object-replicator]
# You can override the default log routing for this app here (don't use set!):
# log_name = object-replicator                                                
# log_facility = LOG_LOCAL0                                                   
# log_level = INFO                                                            
# log_address = /dev/log                                                      
#                                                                             
# vm_test_mode = no                                                           
# daemonize = on                                                              
# run_pause = 30                                                              
# concurrency = 1                                                             
# stats_interval = 300                                                        
#                                                                             
# The sync method to use; default is rsync but you can use ssync to try the   
# EXPERIMENTAL all-swift-code-no-rsync-callouts method. Once ssync is verified
# as having performance comparable to, or better than, rsync, we plan to      
# deprecate rsync so we can move on with more features for replication.       
# sync_method = rsync                                                         
#                                                                             
# max duration of a partition rsync                                           
# rsync_timeout = 900                                                         
#                                                                             
# bandwidth limit for rsync in kB/s. 0 means unlimited                        
# rsync_bwlimit = 0                                                           
#                                                                             
# passed to rsync for io op timeout                                           
# rsync_io_timeout = 30                                                       
#                                                                             
# node_timeout = <whatever's in the DEFAULT section or 10>                    
# max duration of an http request; this is for REPLICATE finalization calls and
# so should be longer than node_timeout                                        
# http_timeout = 60                                                            
#                                                                              
# attempts to kill all workers if nothing replicates for lockup_timeout seconds
# lockup_timeout = 1800                                                        
#                                                                              
# The replicator also performs reclamation                                     
# reclaim_age = 604800                                                         
#                                                                              
# ring_check_interval = 15                                                     
# recon_cache_path = /var/cache/swift                                          
#                                                                              
# limits how long rsync error log lines are                                    
# 0 means to log the entire line                                               
# rsync_error_log_line_length = 0                                              
#                                                                              
# handoffs_first and handoff_delete are options for a special case             
# such as disk full in the cluster. These two options SHOULD NOT BE            
# CHANGED, except for such an extreme situations. (e.g. disks filled up        
# or are about to fill up. Anyway, DO NOT let your drives fill up)             
# handoffs_first is the flag to replicate handoffs prior to canonical          
# partitions. It allows to force syncing and deleting handoffs quickly.        
# If set to a True value(e.g. "True" or "1"), partitions                       
# that are not supposed to be on the node will be replicated first.            
# handoffs_first = False                                                       
#                                                                              
# handoff_delete is the number of replicas which are ensured in swift.         
# If the number less than the number of replicas is set, object-replicator     
# could delete local handoffs even if all replicas are not ensured in the      
# cluster. Object-replicator would remove local handoff partition directories  
# after syncing partition when the number of successful responses is greater   
# than or equal to this number. By default(auto), handoff partitions will be   
# removed  when it has successfully replicated to all the canonical nodes.     
# handoff_delete = auto                                                        

[object-reconstructor]
# You can override the default log routing for this app here (don't use set!):
# Unless otherwise noted, each setting below has the same meaning as described
# in the [object-replicator] section, however these settings apply to the EC  
# reconstructor                                                               
#                                                                             
# log_name = object-reconstructor                                             
# log_facility = LOG_LOCAL0                                                   
# log_level = INFO                                                            
# log_address = /dev/log                                                      
#                                                                             
# daemonize = on                                                              
# run_pause = 30                                                              
# concurrency = 1                                                             
# stats_interval = 300                                                        
# node_timeout = 10                                                           
# http_timeout = 60                                                           
# lockup_timeout = 1800                                                       
# reclaim_age = 604800                                                        
# ring_check_interval = 15                                                    
# recon_cache_path = /var/cache/swift                                         
# handoffs_first = False                                                      

[object-updater]
# You can override the default log routing for this app here (don't use set!):
# log_name = object-updater                                                   
# log_facility = LOG_LOCAL0                                                   
# log_level = INFO                                                            
# log_address = /dev/log                                                      
#                                                                             
# interval = 300                                                              
# concurrency = 1                                                             
# node_timeout = <whatever's in the DEFAULT section or 10>                    
# slowdown will sleep that amount between objects                             
# slowdown = 0.01                                                             
#                                                                             
# recon_cache_path = /var/cache/swift                                         

[object-auditor]
# You can override the default log routing for this app here (don't use set!):
# log_name = object-auditor                                                   
# log_facility = LOG_LOCAL0                                                   
# log_level = INFO                                                            
# log_address = /dev/log                                                      
#                                                                             
# You can set the disk chunk size that the auditor uses making it larger if   
# you like for more efficient local auditing of larger objects                
# disk_chunk_size = 65536                                                     
# files_per_second = 20                                                       
# concurrency = 1                                                             
# bytes_per_second = 10000000                                                 
# log_time = 3600                                                             
# zero_byte_files_per_second = 50                                             
# recon_cache_path = /var/cache/swift                                         

# Takes a comma separated list of ints. If set, the object auditor will
# increment a counter for every object whose size is <= to the given break
# points and report the result after a full scan.                         
# object_size_stats =                                                     

# Note: Put it at the beginning of the pipleline to profile all middleware. But
# it is safer to put this after healthcheck.                                   
[filter:xprofile]                                                              
use = egg:swift#xprofile                                                       
# This option enable you to switch profilers which should inherit from python  
# standard profiler. Currently the supported value can be 'cProfile',          
# 'eventlet.green.profile' etc.                                                
# profile_module = eventlet.green.profile                                      
#                                                                              
# This prefix will be used to combine process ID and timestamp to name the     
# profile data file.  Make sure the executing user has permission to write     
# into this path (missing path segments will be created, if necessary).        
# If you enable profiling in more than one type of daemon, you must override   
# it with an unique value like: /var/log/swift/profile/object.profile          
# log_filename_prefix = /tmp/log/swift/profile/default.profile                 
#                                                                              
# the profile data will be dumped to local disk based on above naming rule     
# in this interval.                                                            
# dump_interval = 5.0                                                          
#                                                                              
# Be careful, this option will enable profiler to dump data into the file with 
# time stamp which means there will be lots of files piled up in the directory.
# dump_timestamp = false                                                       
#                                                                              
# This is the path of the URL to access the mini web UI.                       
# path = /__profile__                                                          
#                                                                              
# Clear the data when the wsgi server shutdown.                                
# flush_at_shutdown = false                                                    
#                                                                              
# unwind the iterator of applications                                          
# unwind = false                                                               
#                                                                              
### High latency media (hlm) configuration on storage node                     
[hlm]                                                                          
## You can override the default log level here:
#set log_level = INFO
set log_level = DEBUG
## SwiftHLM Connector (and consequently the Backend) is declared here:
#
# Dummy Connector/Backend - used by default if no other connector is defined
#swifthlm_connector_module = swifthlm.dummy_connector
#
# Generic Backend Interface (GBI) options
#The next option defaul value is False, set it to True only if your backend
#connector is able to (more efficiently) map object data directory paths to
#file paths
#gbi_provide_dirpaths_instead_of_filepaths = False
#
# IBM Connector/Backend
swifthlm_connector_module = swifthlmibmsa.ibmsa_swifthlm_connector
#
# Your own Connector/Backend
# Define EITHER connector_module (if installed as a python module), e.g.:
#swifthlm_connector_module = swifthlmibmsa.ibmsa_swifthlm_connector
# OR connector_dir and connector_filename (if installed that way), e.g.:
#swifthlm_connector_dir = /opt/ibm/swifthlmconnector
#swifthlm_connector_filename = connector.py

# DISCLAIMER: availability or not availability the connectors used in the
# above examples, such as e.g. connector for IBM Spectrum Archive, is not
# stated or implied by the given configuration examples
#
#
# Configure IBM Spectrum Archive/Protect Connector/Backend for SwiftHLM
[ibmsasp]
# IBM Spectrum Archive/Protect Connector configuration
connector_tmp_dir = /srv/node/gpfs/tmp/swifthlm
# IBM Spectrum Archive/Protect Backend configuration
gpfs_filesystem_or_fileset=/srv/node/gpfs
library=lib0
tape_storage_pool=pool0@lib0

Then, I noticed the following: the default mount point for Swift storage devices is under "/srv/node", but in your post I see "/srv/4/node/sdb4/objects/531/024/84e90445794d21d6a5590b617c80f024/". So you might need to change the settings in /etc/swift/object-server.conf:

[DEFAULT]                                                                                     
...                                                                              
# swift_dir = /etc/swift                                                                      
# devices = /srv/node 

... to reflect that.
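
To see why this matters: with devices left at its default, the object server's diskfile code looks for the device under /srv/node and raises DiskFileDeviceUnavailable when the check fails, even though your data actually lives under /srv/4/node. Roughly (a simplified sketch of the check, not Swift's exact code; the real logic is in swift/obj/diskfile.py):

    import os

    devices = '/srv/node'  # DEFAULT value from object-server.conf
    device = 'sdb4'

    dev_path = os.path.join(devices, device)  # /srv/node/sdb4, not /srv/4/node/sdb4
    if not os.path.ismount(dev_path):         # applied when mount_check = true
        # Swift raises DiskFileDeviceUnavailable here, and SwiftHLM's
        # handler then logs "Unavailable device: sdb4, ..."
        raise Exception('DiskFileDeviceUnavailable: %s' % dev_path)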

Regards, Slavisa

tommyJin commented 7 years ago

Currently I just read from one object-server config file, so the issue has not appeared so far.