ibm-research / swifthlm

SwiftHLM - a middleware for using OpenStack Swift with tape and other high-latency media storage backends
Apache License 2.0

New requests are not accepted and the old ones remain undeleted #4

Closed tommyJin closed 7 years ago

tommyJin commented 7 years ago

Hi there,

Basic info: Python 2.7, Swift 2.5.0 (Liberty), SwiftHLM 0.2.2 (before the 2017/06/20 commit)

The username I use is 'osddev', and I only modified some Swift config files as the SwiftHLM documentation instructs. Policies and other settings remain the same.

I changed the key-value pair 'User=swift' to 'User=osddev' in the file swifthlm.dispatcher.service.
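For reference, the relevant part of /etc/systemd/system/swifthlm.dispatcher.service then looks roughly like this (a sketch; only the User= line comes from the change described above, the rest of the unit file is left as installed):

[Service]
# run the SwiftHLM dispatcher as the local user instead of 'swift'
User=osddev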

I encountered an issue where the old records always remain. I did some upload tests before and after installing SwiftHLM, so there are many folders under /srv/(1,2,3,4)/node/sdb(1,2,3,4)/objects. In order to find out where and how HLM objects (in SWIFTHLM_PENDING_REQUESTS_CONTAINER) are stored, I took the following steps:

  1. Run the following commands

rm -rf /srv/1/node/sdb1/objects/*
rm -rf /srv/2/node/sdb2/objects/*
rm -rf /srv/3/node/sdb3/objects/*
rm -rf /srv/4/node/sdb4/objects/*

to delete all subfolders under the objects directories.

  2. Run the following commands in order:

swift-init all stop
systemctl stop swifthlm.dispatcher
swift-init all start
systemctl start swifthlm.dispatcher
systemctl status swifthlm.dispatcher -l
curl -v -X GET -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' http://127.0.0.1:8080/auth/v1.0

I got the token, uploaded an object 'testfile3', and ran another command

curl -v -X POST -H 'X-Storage-Token: token' http://127.0.0.1:8080/hlm/v1/migrate/AUTH_test/testcontainer1/testfile3

and the log is shown below

Jul 3 15:55:28 localhost hlm-dispatcher: 752 [middleware.py: init():244] info: Initialized SwiftHLM Middleware
Jul 3 15:55:28 localhost hlm-dispatcher: 752 [middleware.py: init():245] dbg: Initialized SwiftHLM Middleware
Jul 3 15:55:29 localhost hlm-dispatcher: 015 [middleware.py: init():244] info: Initialized SwiftHLM Middleware
Jul 3 15:55:29 localhost hlm-dispatcher: 015 [middleware.py: init():245] dbg: Initialized SwiftHLM Middleware
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():273] env: {'wsgi.multithread': False, 'SCRIPT_NAME': '', 'wsgi.input': <StringIO.StringIO instance at 0x20cce18>, 'REQUEST_METHOD': 'HEAD', 'HTTP_HOST': '127.0.0.1:8080', 'PATH_INFO': '/v1/v1.0', 'SERVER_PROTOCOL': 'HTTP/1.0', 'QUERY_STRING': '', 'swift.authorize': <function at 0x1fd9938>, 'swift.source': 'RL', 'HTTP_USER_AGENT': 'Swift', 'wsgi.version': (1, 0), 'SERVER_NAME': '127.0.0.1', 'wsgi.errors': <StringIO.StringIO instance at 0x20cc830>, 'wsgi.multiprocess': False, 'swift.trans_id': 'tx13d15b5b1103495eb9bdd-005959f871', 'wsgi.url_scheme': 'http', 'REMOTE_USER': '.wsgi.pre_authed', 'SERVER_PORT': '8080', 'swift.cache': <swift.common.memcached.MemcacheRing object at 0x204c890>, 'swift.authorize_override': True}
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():274] env[PATH_INFO]: /v1/v1.0 (txn: tx13d15b5b1103495eb9bdd-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():289] namespace:v1, ver_ifhlm:v1.0, cmd_ifhlm:None, acc_ifhlm:None, con_ifhlm:None, obj_ifhlm:None: (txn: tx13d15b5b1103495eb9bdd-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():305] version:v1, account:v1.0, container:None, obj:None: (txn: tx13d15b5b1103495eb9bdd-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():308] version:v1, hlm_req:None, account:v1.0, container:None, obj:None:
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():309] req.method=HEAD
Jul 3 15:55:29 localhost hlm-dispatcher: 020 [middleware.py: call():312] req.headers: {'Host': '127.0.0.1:8080', 'User-Agent': 'Swift'} (txn: tx13d15b5b1103495eb9bdd-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [middleware.py: init():244] info: Initialized SwiftHLM Middleware
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [middleware.py: init():245] dbg: Initialized SwiftHLM Middleware
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [dispatcher.py: init():101] info: Initialized Dispatcher
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [dispatcher.py: init():102] dbg: Initialized Dispatcher
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [dispatcher.py: run():166] Polling the requests queue from elseif
Jul 3 15:55:29 localhost hlm-dispatcher: 310 [middleware.py:pull_a_mig_or_rec_request_from_queue():745] Enter pull a mig or rec request from queue
Jul 3 15:55:29 localhost hlm-dispatcher: 350 [middleware.py:pull_a_mig_or_rec_request_from_queue():763] Pulled a request from queue: 20170629024143.265--migrate--AUTH_test--sofContainer--0 (txn: txf7923aa51d2b445199830-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 351 [dispatcher.py:process_next_request():117] Processing request 20170629024143.265--migrate--AUTH_test--sofContainer--0
Jul 3 15:55:29 localhost hlm-dispatcher: 351 [dispatcher.py:process_next_request():123] ts:20170629024143.265, hlmreq:migrate, ac:AUTH_test, c:sofContainer, spi:0, o:None (txn: txf7923aa51d2b445199830-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 351 [middleware.py:create_per_storage_node_objects_list_and_request():487] Container request (txn: txf7923aa51d2b445199830-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 420 [middleware.py:create_per_storage_node_objects_list_and_request():492] objects(first 1024 bytes): [u'mytestfile.txt', u'soffile', u'testfile'] (txn: txd1579a24fd384a0f8bb2b-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 420 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6040, 'weight': 1.0, 'zone': 4, 'ip': '127.0.0.1', 'region': 1, 'id': 2, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb4', 'port': 6040}, {'index': 1, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 2, 'replication_port': 6030, 'weight': 1.0, 'zone': 3, 'ip': '127.0.0.1', 'region': 1, 'id': 3, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb3', 'port': 6030}] (txn: txd1579a24fd384a0f8bb2b-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 420 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6040, 'weight': 1.0, 'zone': 4, 'ip': '127.0.0.1', 'region': 1, 'id': 2, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb4', 'port': 6040}, {'index': 1, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 2, 'replication_port': 6020, 'weight': 1.0, 'zone': 2, 'ip': '127.0.0.1', 'region': 1, 'id': 1, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb2', 'port': 6020}] (txn: txd1579a24fd384a0f8bb2b-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 420 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 1, 'replication_port': 6020, 'weight': 1.0, 'zone': 2, 'ip': '127.0.0.1', 'region': 1, 'id': 1, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb2', 'port': 6020}, {'index': 2, 'replication_port': 6030, 'weight': 1.0, 'zone': 3, 'ip': '127.0.0.1', 'region': 1, 'id': 3, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb3', 'port': 6030}] (txn: txd1579a24fd384a0f8bb2b-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 420 [middleware.py:distribute_request_to_storage_nodes_get_responses():606] After self.create_per_storage_node_objects_list_and_request() (txn: txd1579a24fd384a0f8bb2b-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 421 [middleware.py:submit_request_to_storage_node_and_get_response():524] Dispatching request to 127.0.0.1
Jul 3 15:55:29 localhost hlm-dispatcher: 608 [middleware.py:delete_request_from_queue():812] Delete 20170629024143.265--migrate--AUTH_test--sofContainer--0 from pending-hlm-requests (txn: tx3a187c42189547fc82698-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 728 [dispatcher.py:process_next_request():148] Deleted request from queue: 20170629024143.265--migrate--AUTH_test--sofContainer--0 (txn: tx96912db685ec4081892f6-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 728 [dispatcher.py: run():166] Polling the requests queue from elseif (txn: tx96912db685ec4081892f6-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 728 [middleware.py:pull_a_mig_or_rec_request_from_queue():745] Enter pull a mig or rec request from queue
Jul 3 15:55:29 localhost hlm-dispatcher: 739 [middleware.py:pull_a_mig_or_rec_request_from_queue():763] Pulled a request from queue: 20170629060705.303--migrate--AUTH_test--sofContainer--0 (txn: tx849f28c7157442f19cce9-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 739 [dispatcher.py:process_next_request():117] Processing request 20170629060705.303--migrate--AUTH_test--sofContainer--0
Jul 3 15:55:29 localhost hlm-dispatcher: 739 [dispatcher.py:process_next_request():123] ts:20170629060705.303, hlmreq:migrate, ac:AUTH_test, c:sofContainer, spi:0, o:None (txn: tx849f28c7157442f19cce9-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 739 [middleware.py:create_per_storage_node_objects_list_and_request():487] Container request (txn: tx849f28c7157442f19cce9-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 751 [middleware.py:create_per_storage_node_objects_list_and_request():492] objects(first 1024 bytes): [u'mytestfile.txt', u'soffile', u'testfile'] (txn: tx88644101acac4b9785a85-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 751 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6040, 'weight': 1.0, 'zone': 4, 'ip': '127.0.0.1', 'region': 1, 'id': 2, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb4', 'port': 6040}, {'index': 1, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 2, 'replication_port': 6030, 'weight': 1.0, 'zone': 3, 'ip': '127.0.0.1', 'region': 1, 'id': 3, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb3', 'port': 6030}] (txn: tx88644101acac4b9785a85-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 751 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6040, 'weight': 1.0, 'zone': 4, 'ip': '127.0.0.1', 'region': 1, 'id': 2, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb4', 'port': 6040}, {'index': 1, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 2, 'replication_port': 6020, 'weight': 1.0, 'zone': 2, 'ip': '127.0.0.1', 'region': 1, 'id': 1, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb2', 'port': 6020}] (txn: tx88644101acac4b9785a85-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 751 [middleware.py:get_obj_storage_nodes():264] Storage nodes: [{'index': 0, 'replication_port': 6010, 'weight': 1.0, 'zone': 1, 'ip': '127.0.0.1', 'region': 1, 'id': 0, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb1', 'port': 6010}, {'index': 1, 'replication_port': 6020, 'weight': 1.0, 'zone': 2, 'ip': '127.0.0.1', 'region': 1, 'id': 1, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb2', 'port': 6020}, {'index': 2, 'replication_port': 6030, 'weight': 1.0, 'zone': 3, 'ip': '127.0.0.1', 'region': 1, 'id': 3, 'replication_ip': '127.0.0.1', 'meta': u'', 'device': 'sdb3', 'port': 6030}] (txn: tx88644101acac4b9785a85-005959f871)
Jul 3 15:55:29 localhost hlm-dispatcher: 751 [middleware.py:distribute_request_to_storage_nodes_get_responses():606] After self.create_per_storage_node_objects_list_and_request() (txn: tx88644101acac4b9785a85-005959f871)

I added some logger calls myself in the source code, so the line numbers may not match the original source.

As we can see from the log above:

Jul 3 15:55:29 localhost hlm-dispatcher: 739 [middleware.py:pull_a_mig_or_rec_request_from_queue():763] Pulled a request from queue: 20170629060705.303--migrate--AUTH_test--sofContainer--0 (txn: tx849f28c7157442f19cce9-005959f871)

SwiftHLM tried to perform a migrate action for the old request '20170629060705.303--migrate--AUTH_test--sofContainer--0'. I did issue such a command earlier, but those objects have since been deleted. The newer request to migrate the object 'testfile3' did not actually show up in the log.

My questions are:

  1. What should I do to make it run normally, i.e. accept new requests and ignore/delete the old ones?
  2. In middleware.py (line 557), ssh_client.connect(ip_addr, username="swift"): what if my username is not 'swift'? For example, my username is 'osddev'.
  3. Does HLM really create objects physically and store them under the /srv folder? I deleted all folders and files under the 'objects' folders, then uploaded a new object and migrated it, but no new files were created.

BTW, sometimes the logs are not updated by the program itself: I restart the Swift and SwiftHLM services, but the new startup and processing logs are not written to hlm.log.

Best regards, Tommy

tommyJin commented 7 years ago

Hi @hseipp ,

Do you have any idea why this would happen?

Best regards, Tommy

slavisasarafijanovic commented 7 years ago

Hi Tommy,

Harald is on vacation, let me answer your questions:

  1. Can you please check whether the new requests get rejected due to a wrong token? In your command you used the literal string 'token' instead of the actual token value ($token); see the shell sketch after this list. Also, to better understand which logs are from before/after your new migrate command, could you run the 'requests' query (it lists submitted requests) immediately after your migrate request, and show the output:

curl -v -X POST -H 'X-Storage-Token: $token' http://127.0.0.1:8080/hlm/v1/migrate/AUTH_test/testcontainer1/testfile3
curl -v -X POST -H 'X-Storage-Token: $token' http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer1/testfile3

  2. That would require a change in the middleware code. Removing ', username="swift"' would result in using the current process user, i.e. middleware.py (line 557) would become: ssh_client.connect(ip_addr)

If you want to try that, then also make sure key-based ssh is set up for that user (as in steps 4.2.1 and 4.2.2 of README.txt), and that the same user is configured in /etc/systemd/system/swifthlm.dispatcher.service.

  3. For this one, also check token vs. $token, and check that the Swift processes are running. What command did you use to upload object(s) after you deleted the content under the 'objects' folder, and what was the return code?
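A minimal shell sketch of capturing the token from tempauth and reusing it, based on the credentials and endpoints already shown in this thread (TOKEN is just an illustrative variable name):

# grab the X-Auth-Token header from the tempauth response and reuse it
TOKEN=$(curl -si -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing' http://127.0.0.1:8080/auth/v1.0 | awk '/X-Auth-Token:/ {print $2}' | tr -d '\r')
# submit the migration, then immediately list the submitted requests
curl -v -X POST -H "X-Storage-Token: $TOKEN" http://127.0.0.1:8080/hlm/v1/migrate/AUTH_test/testcontainer1/testfile3
curl -v -X GET -H "X-Storage-Token: $TOKEN" http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer1/testfile3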

Regards, Slavisa

tommyJin commented 7 years ago

Hi @slavisasarafijanovic ,

Thanks for your reply :)

Sorry for the confusion about $token; I did use the newest token for the subsequent requests. In addition, I find that if I don't change line 227 from self.logger = get_logger(conf, name='hlm-middleware', to self.logger = get_logger(conf, name='hlm-dispatcher',, logs from the middleware will not go into the /var/log/swift/hlm.log file.
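One possible explanation, assuming the local rsyslog setup routes messages into hlm.log by program name (this is an assumption about the test machine's syslog configuration, not something confirmed in this thread): a rule of the following shape would match 'hlm-dispatcher' but not 'hlm-middleware', so only log lines emitted under the 'hlm-dispatcher' name would reach the file.

# hypothetical rsyslog rule; adjust to whatever filter the install actually uses
if $programname == 'hlm-dispatcher' then /var/log/swift/hlm.log
& stop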

I removed the ", username='swift'" part and reinstalled swifthlm, modified 'User=swift' to 'User=osddev' in /etc/systemd/system/swifthlm.dispatcher.service, ran ssh-keygen to overwrite the old key, and ran cp /home/osddev/.ssh/id_rsa.pub /home/osddev/.ssh/authorized_keys. Now the files under /home/osddev/.ssh are:

[osddev@localhost .ssh]$ ls
authorized_keys  id_rsa  id_rsa.pub  known_hosts
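A quick sketch for verifying that key-based ssh works non-interactively for this user before starting the dispatcher (BatchMode makes ssh fail instead of prompting for a password):

ssh -o BatchMode=yes osddev@127.0.0.1 true && echo "key-based ssh OK"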

Currently, I use the dummy connector instead of the TLC simulator due to a lack of dependencies.

Delete all objects and directories:

[root@localhost objects]# rm -rf /srv/1/node/sdb1/objects/; rm -rf /srv/2/node/sdb2/objects/; rm -rf /srv/3/node/sdb3/objects/; rm -rf /srv/4/node/sdb4/objects/;
[root@localhost objects]# cd /srv/1/node/sdb1/objects; ls
[root@localhost objects]# cd /srv/2/node/sdb2/objects; ls
[root@localhost objects]# cd /srv/3/node/sdb3/objects; ls
[root@localhost objects]# cd /srv/4/node/sdb4/objects; ls

Upload a file:

[root@localhost testresource]# curl -v -X PUT -H 'X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8' http://127.0.0.1:8080/v1/AUTH_test/testcontainer1/testfile3 -T /home/osddev/testfile.txt

* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> PUT /v1/AUTH_test/testcontainer1/testfile3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8
> Content-Length: 12
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 201 Created
< Last-Modified: Wed, 05 Jul 2017 08:23:06 GMT
< Content-Length: 0
< Etag: 6f5902ac237024bdd0c176cb93063dc4
< Content-Type: text/html; charset=UTF-8
< X-Trans-Id: tx961883f1f7244433859f0-00595ca1e9
< Date: Wed, 05 Jul 2017 08:23:05 GMT
<
* Connection #0 to host 127.0.0.1 left intact

List all files under objects before executing 'migrate':

[root@localhost objects]# cd /srv/4/node/sdb4/objects; ls
[root@localhost objects]# cd /srv/3/node/sdb3/objects; ls
46
[root@localhost objects]# cd /srv/2/node/sdb2/objects; ls
[root@localhost objects]# cd /srv/1/node/sdb1/objects; ls
46

Run 'requests' before 'migrate'

[root@localhost objects]# curl -v -X GET -H 'X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8' http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer1/testfile3

* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /hlm/v1/requests/AUTH_test/testcontainer1/testfile3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8
>
< HTTP/1.1 200 OK
< Content-Length: 880
< Content-Type: text/plain
< X-Trans-Id: tx46c6aa81bbe142249efc7-00595ca24b
< Date: Wed, 05 Jul 2017 08:24:44 GMT
<
* Connection #0 to host 127.0.0.1 left intact
["20170703064113.153--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703065557.679--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703065717.215--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075041.909--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075107.500--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075124.025--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703100804.066--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705035315.145--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705053853.445--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705054001.232--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705054259.735--migrate--AUTH_test--testcontainer1--0--testfile3--failed"]
[root@localhost objects]#

Run 'migrate' and 'requests'

[root@localhost objects]# curl -v -X POST -H 'X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8' http://127.0.0.1:8080/hlm/v1/migrate/AUTH_test/testcontainer1/testfile3

* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> POST /hlm/v1/migrate/AUTH_test/testcontainer1/testfile3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8
>
< HTTP/1.1 200 OK
< Content-Length: 26
< Content-Type: text/plain
< X-Trans-Id: tx7fa74f58c1d5434891860-00595ca2a3
< Date: Wed, 05 Jul 2017 08:26:11 GMT
<
Accepted migrate request.
* Connection #0 to host 127.0.0.1 left intact

[root@localhost objects]# curl -v -X GET -H 'X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8' http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer1/testfile3
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /hlm/v1/requests/AUTH_test/testcontainer1/testfile3 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-Token: AUTH_tk8381f9f1c5964bc79423e8f87f188ca8
>
< HTTP/1.1 200 OK
< Content-Length: 1041
< Content-Type: text/plain
< X-Trans-Id: tx8d51451755a34fbd98ecb-00595ca2a8
< Date: Wed, 05 Jul 2017 08:26:16 GMT
<
["20170705082611.656--migrate--AUTH_test--testcontainer1--0--testfile3--pending", "20170703064113.153--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703065557.679--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703065717.215--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075041.909--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075107.500--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703075124.025--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170703100804.066--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705035315.145--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705053853.445--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705054001.232--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705054259.735--migrate--AUTH_test--testcontainer1--0--testfile3--failed", "20170705082602.466--migrate--AUTH_test--testcontainer1--0--testfile3--failed"]
* Connection #0 to host 127.0.0.1 left intact
[root@localhost objects]#

From the output we can see that some older requests, like "20170703064113.153--migrate--AUTH_test--testcontainer1--0--testfile3--failed", have not been handled.

List all files under objects after executing 'migrate':

[root@localhost objects]# cd /srv/4/node/sdb4/objects; ls
[root@localhost objects]# cd /srv/3/node/sdb3/objects; ls
142  39  46  53  776
[root@localhost objects]# cd /srv/2/node/sdb2/objects; ls
142  39  53  776
[root@localhost objects]# cd /srv/1/node/sdb1/objects; ls
142  39  46
[root@localhost objects]#

I also checked the .db file under /srv/1/node/sdb1/containers/663/76d/a5edc6b2f1d47145c012dee12f60876d/, named a5edc6b2f1d47145c012dee12f60876d.db, viewed its tables, and found the following records in the 'object' table:

ROWID name created_at size content_type etag deleted storage_policy_index
23 20170613061632.069--migrate--AUTH_test--container1--0 1498733644.78429 0 application/deleted noetag 1 0
24 20170613061745.698--recall--AUTH_test--container1--0 1498806380.70219 0 application/deleted noetag 1 0
25 20170613170807.015--migrate--AUTH_test--container1--0 1498809922.97080 0 application/deleted noetag 1 0
26 20170614033858.964--migrate--AUTH_test--sofContainer--0 1498810769.39533 0 application/deleted noetag 1 0
28 20170628020359.579--migrate--AUTH_test--sofContainer--0 1498812757.51289 0 application/deleted noetag 1 0
29 20170630082827.167--migrate--AUTH_test--testcontainer1--0--testfile 1498813270.77409 0 application/deleted noetag 1 0
31 20170628034426.764--migrate--AUTH_test--container1--0 1499049591.41261 0 application/deleted noetag 1 0
33 20170628034606.025--migrate--AUTH_test--container1--0 1499050306.93780 0 application/deleted noetag 1 0
35 20170628034616.500--migrate--AUTH_test--container1--0 1499062413.74734 0 application/deleted noetag 1 0
37 20170628053149.246--migrate--AUTH_test--container1--0 1499062653.96940 0 application/deleted noetag 1 0
39 20170628113029.246--migrate--AUTH_test--sofContainer--0 1499062789.80359 0 application/deleted noetag 1 0
41 20170629013549.762--migrate--AUTH_test--sofContainer--0 1499063355.95216 0 application/deleted noetag 1 0
43 20170629013804.754--migrate--AUTH_test--sofContainer--0 1499063462.72824 0 application/deleted noetag 1 0
44 20170629014706.860--migrate--AUTH_test--sofContainer--0 1499063574.93438 0 application/deleted noetag 1 0
46 20170629023514.918--migrate--AUTH_test--sofContainer--0 1499064930.09461 0 application/deleted noetag 1 0
52 20170629024143.265--migrate--AUTH_test--sofContainer--0 1499068529.61031 0 application/deleted noetag 1 0
53 20170629060705.303--migrate--AUTH_test--sofContainer--0 1499071963.70294 0 application/deleted noetag 1 0
55 20170629080249.234--migrate--AUTH_test--sofContainer--0--testfile 1499226301.85943 0 application/deleted noetag 1 0
62 20170629080601.608--migrate--AUTH_test--sofContainer--0--testfile 1499239244.77141 0 application/deleted noetag 1 0
63 20170629083012.143--migrate--AUTH_test--sofContainer--0--testfile 1499240531.14688 0 application/deleted noetag 1 0
64 20170629091501.338--migrate--AUTH_test--testcontainer--0--testfile 1499241468.00168 0 application/deleted noetag 1 0
65 20170629095519.623--migrate--AUTH_test--testcontainer--0--testfile1 1499241710.74410 0 application/deleted noetag 1 0
66 20170629095528.003--migrate--AUTH_test--testcontainer--0--testfile1 1499241713.85864 0 application/deleted noetag 1 0
67 20170630091430.232--migrate--AUTH_test--testcontainer1--0--testfile1 1499241714.80492 0 application/deleted noetag 1 0
68 20170703024511.252--migrate--AUTH_test--testcontainer1--0--testfile2 1499241716.13648 0 application/deleted noetag 1 0
69 20170703025226.838--migrate--AUTH_test--testcontainer1--0--testfile2 1499241717.15535 0 application/deleted noetag 1 0
70 20170703061550.748--migrate--AUTH_test--testcontainer1--0--testfile2 1499241717.94710 0 application/deleted noetag 1 0
71 20170703061741.493--migrate--AUTH_test--testcontainer1--0--testfile2 1499241718.67645 0 application/deleted noetag 1 0
72 20170703062020.472--migrate--AUTH_test--testcontainer1--0--testfile2 1499241719.46636 0 application/deleted noetag 1 0
73 20170703062921.548--migrate--AUTH_test--testcontainer1--0--testfile2 1499241720.31480 0 application/deleted noetag 1 0
74 20170703065557.679--migrate--AUTH_test--testcontainer1--0--testfile3 1499241721.99651 0 application/deleted noetag 1 0
75 20170703064113.153--migrate--AUTH_test--testcontainer1--0--testfile3 1499241721.26659 0 application/deleted noetag 1 0
76 20170703065717.215--migrate--AUTH_test--testcontainer1--0--testfile3 1499241722.85521 0 application/deleted noetag 1 0
77 20170703075107.500--migrate--AUTH_test--testcontainer1--0--testfile3 1499241724.28387 0 application/deleted noetag 1 0
78 20170703075041.909--migrate--AUTH_test--testcontainer1--0--testfile3 1499241723.56999 0 application/deleted noetag 1 0
79 20170703100804.066--migrate--AUTH_test--testcontainer1--0--testfile3 1499241725.69768 0 application/deleted noetag 1 0
80 20170703075124.025--migrate--AUTH_test--testcontainer1--0--testfile3 1499241724.98917 0 application/deleted noetag 1 0
81 20170705035315.145--migrate--AUTH_test--testcontainer1--0--testfile3 1499241726.44655 0 application/deleted noetag 1 0
82 20170705053853.445--migrate--AUTH_test--testcontainer1--0--testfile3 1499241727.41514 0 application/deleted noetag 1 0
83 20170705054001.232--migrate--AUTH_test--testcontainer1--0--testfile3 1499241728.23511 0 application/deleted noetag 1 0
84 20170705055130.887--migrate--AUTH_test--testcontainer1--0--test1 1499241729.71925 0 application/deleted noetag 1 0
85 20170705054259.735--migrate--AUTH_test--testcontainer1--0--testfile3 1499241729.00254 0 application/deleted noetag 1 0
86 20170705055553.306--migrate--AUTH_test--testcontainer1--0--test1 1499241730.44220 0 application/deleted noetag 1 0
89 20170705082602.466--migrate--AUTH_test--testcontainer1--0--testfile3 1499243170.09840 0 application/deleted noetag 1 0
90 20170705082611.656--migrate--AUTH_test--testcontainer1--0--testfile3 1499243200.87371 0 application/deleted noetag 1 0
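(For reference, a sketch of how these records can be listed directly with the sqlite3 command-line tool, using the .db path given above:)

sqlite3 /srv/1/node/sdb1/containers/663/76d/a5edc6b2f1d47145c012dee12f60876d/a5edc6b2f1d47145c012dee12f60876d.db "SELECT ROWID, name, created_at, deleted FROM object ORDER BY ROWID;"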

Is this output normal? The old requests seem to be stuck, but new requests can now be added to the request queue (and I don't know why, because a few days ago a 'migrate' request would only return 'Accepted' but not show up in the 'requests' query).

Thanks a lot for viewing and replying to my questions.

Best regards, Tommy

slavisasarafijanovic commented 7 years ago

Hi Tommy,

The output you show seems normal. But to better understand what is going on in your test system, more output seems necessary. You show the output of the "requests" HLM query only 5 seconds after the migration was submitted; it might take the dispatcher longer to pick the "migrate" HLM request from the queue, process it, and remove it from the queue. So a "pending" entry at this time is expected, but it would be good to see the output of a "requests" HLM query at a later time. The content of the .db file suggests the new request was eventually processed and deleted (removed) from the queue (I need to verify that after recreating your test scenario), and only after that should the previously failed requests for the same container/object also be removed from the queue.
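For example, something like the following (a sketch; the 5-minute wait is an arbitrary choice, and TOKEN is assumed to hold a currently valid auth token):

sleep 300
curl -v -X GET -H "X-Auth-Token: $TOKEN" http://127.0.0.1:8080/hlm/v1/requests/AUTH_test/testcontainer1/testfile3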

As for the logs: I was getting logs from the middleware into /var/log/swift/hlm.log without the code change you made. I will try to recreate your environment (from the previous posts in this issue) and check whether I can reproduce the log behavior you observed.

Regards, Slavisa

tommyJin commented 7 years ago

The issue in the title is resolved: SwiftHLM needs time to process the requests, and the related settings must be configured appropriately.

The log issue remains: hlm.log only stores logs from dispatcher.py.

Thanks for everybody's help.