leo-project / leofs

The LeoFS Storage System
https://leo-project.net/leofs/
Apache License 2.0

leofs-storage {cause,not_found} #1156

Open varuntanwar opened 5 years ago

varuntanwar commented 5 years ago

A few days ago we added a few nodes to the LeoFS cluster and later removed them. The cluster is still looking for the removed nodes and logging the following errors:

[W] storage_101@prd-leofs101 2018-11-03 11:44:48.578937 +0550 1541225688 leo_storage_mq:rebalance_1/1 912 [{node,'storage_107@prd-leofs107'},{addr_id,38174905918673712019893746042006228611},{key,<<"datastore/report/dbe8d1bf-b5db-4b25-bcfg-146f2762ec7g">>},{cause,not_found}]

Why is it still looking for the removed nodes? I ran the rebalance command after removing them. Is there a caching mechanism in the backend that I need to clear or refresh? Restarting the LeoFS cluster is not preferable.

mocchira commented 5 years ago

One thing I forgot to answer from your question:

Will this affect any read or write on the system?

I'm still not sure, but AFAIK, as long as the ring information is correct across the cluster (I confirmed it is correct on your system), read/write operations should work without any trouble.

varuntanwar commented 5 years ago

I was trying the recovery process but ran into issues:

I uploaded a file in the stage cluster and deleted it.

-rw------- 1 root root 61 Dec 28 13:19 ipp2.txt

After running the diagnose command, I got the following values:

leo_object_storage_7.20181228.13.2:757480977    68502814602494977421222092134414885894  datastorage/ipp2.txt    0   61  1545983350465793    2018-12-28 13:19:10 +0550   0
leo_object_storage_7.20181228.13.2:757481395    68502814602494977421222092134414885894  datastorage/ipp2.txt    0   0   1545983364474221    2018-12-28 13:19:24 +0550   1

What is the correct offset value, 757481395 or 757480977? I tried both.

After that I tried a number of commands but was not able to get the content.

dd if=7.avs bs=4096 skip=757481395 count=60
dd if=7.avs bs=1 skip=757480977 count=60

head -n 757481037 7.avs | tail -60f

With this command I can see the content of the file, but also other content. The weird part is:

wc -l 7.avs
2849361 7.avs

I tried cutting the file at the offset obtained as explained above, but got no output. Do you have a syntax for cutting by offset? Or is this offset not a Unix file offset?

root@stg-leofs403:/storage/data/object# grep --binary-files=binary -u 757481395  -C 60 7.avs
root@stg-leofs403:/storage/data/object# grep --binary-files=binary -b 757481395  -C 60 7.avs

mocchira commented 5 years ago

@varuntanwar Thanks for trying diagnose out.

Regarding the offset: sorry, I gave you wrong information about what the offset actually means, so let me explain it properly here.

At an offset, there is not only the content itself but also some associated metadata stored around it. That's why you failed to retrieve the content.

The start offset of the content can be calculated as

Offset + 128 + KeySize

So in your case,

757480977 (Offset) + 128 + 8 (KeySize of "ipp2.txt") = 757481113

757481113 is the actual offset to retrieve the content so

dd if=7.avs bs=1 skip=757481113 count=60

should give you the right content.
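
The arithmetic above can be sketched in Python. This is only a minimal sketch, not LeoFS code: the 128-byte metadata size and the formula are taken from this comment as stated, and the names `content_offset` and `read_content` are hypothetical helpers introduced for illustration.

```python
HEADER_SIZE = 128  # metadata size stated in the comment above (not verified here)


def content_offset(entry_offset: int, key_size: int) -> int:
    """Return the byte position where the object body is expected to start."""
    return entry_offset + HEADER_SIZE + key_size


def read_content(avs_path: str, entry_offset: int, key_size: int, size: int) -> bytes:
    """Read `size` bytes of object body from an AVS file (hypothetical helper)."""
    with open(avs_path, "rb") as f:
        f.seek(content_offset(entry_offset, key_size))
        return f.read(size)


# Values from this thread: entry offset 757480977, key size 8.
print(content_offset(757480977, 8))  # 757481113
```

Seeking in binary mode plays the same role as `bs=1 skip=...` in dd, without reading the file one byte at a time.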

Please give it another try and tell us if it works for you.

varuntanwar commented 5 years ago

After running the following command, I can get the data:

dd if=7.avs bs=1 skip=757481113 count=87

datastorage-stage/ipp2.txtHello
THis is test file

I want to recover.

Please recover.
87+0 records in
87+0 records out
87 bytes copied, 0.000256981 s, 339 kB/s

I have the following questions:

mocchira commented 5 years ago

What should be the correct offset value? 757481395 or 757480977. Once I deleted the file, offset value is different.

In this case, 757480977 is correct. 757481395 points to the delete flag written when the file was deleted, so there is no content there.

That being said, the operation order is important. Let's say

  1. PUT an object at Offset1
  2. DELETE the object at Offset2
  3. PUT an object (with the same name as in step 1) at Offset3

then you will see the same file name three times, but the one you want is Offset3.
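
Picking the right entry can be sketched as follows. This sketch rests on assumptions that the thread does not confirm: that the diagnose columns are whitespace-separated, that the fifth column is the size, the sixth a clock that grows over time, and the last the delete flag, and that keys contain no whitespace. `parse_line` and `latest_put` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    offset: int
    key: str
    size: int
    clock: int
    deleted: bool


def parse_line(line: str) -> Entry:
    """Parse one diagnose line; assumes the key contains no whitespace."""
    fields = line.split()
    # The first field looks like "<log file name>:<offset>".
    offset = int(fields[0].rsplit(":", 1)[1])
    return Entry(offset, fields[2], int(fields[4]), int(fields[5]), fields[-1] == "1")


def latest_put(lines, key: str) -> Optional[Entry]:
    """Return the most recent non-delete entry for `key`, or None if there is none."""
    puts = [e for e in map(parse_line, lines) if e.key == key and not e.deleted]
    return max(puts, key=lambda e: e.clock, default=None)


lines = [
    "leo_object_storage_7.20181228.13.2:757480977 68502814602494977421222092134414885894 "
    "datastorage/ipp2.txt 0 61 1545983350465793 2018-12-28 13:19:10 +0550 0",
    "leo_object_storage_7.20181228.13.2:757481395 68502814602494977421222092134414885894 "
    "datastorage/ipp2.txt 0 0 1545983364474221 2018-12-28 13:19:24 +0550 1",
]
entry = latest_put(lines, "datastorage/ipp2.txt")
print(entry.offset, entry.size)  # 757480977 61
```

With a PUT, DELETE, PUT sequence, the same function would return the entry at Offset3, because it has the highest clock among the non-delete entries.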

How to calculate the key size of a file? Will calculation part be different for those files which are more than 5MB?

The key size is simply the length of the file name, so strlen(filename) is what you want. For a large file, the key is suffixed with "\n" and the child number. For ${PATH_TO_FILE}\n1, the key size would be strlen(${PATH_TO_FILE}) + 2 (for \n1). As another example, for ${PATH_TO_FILE}\n11, the key size would be strlen(${PATH_TO_FILE}) + 3 (for \n11).
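
The key size rule above can be sketched like this. It assumes the key is counted in bytes (as strlen would) and encoded as UTF-8, which the thread does not state; `key_size` is a hypothetical helper, and whether the name passed in should include the bucket and directory is left to the caller.

```python
from typing import Optional


def key_size(name: str, child_number: Optional[int] = None) -> int:
    """Byte length of a key; chunk keys carry a newline plus the child number."""
    key = name if child_number is None else f"{name}\n{child_number}"
    return len(key.encode("utf-8"))


print(key_size("ipp2.txt"))      # 8
print(key_size("ipp2.txt", 1))   # 10, i.e. 8 + 2
print(key_size("ipp2.txt", 11))  # 11, i.e. 8 + 3
```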

How to calculate count value in dd command? I gave 60 because file size is 61 and in the above comment you mentioned offset to ${offset + size - 1}

When using dd, simply set count to the original file size. The range in my earlier comment was meant for scripting languages, where the index starts from zero.
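
The difference between a byte count and an inclusive, zero-based end index can be illustrated with a small sketch. `inclusive_range` is a hypothetical helper and the sample data is made up.

```python
def inclusive_range(start: int, size: int) -> tuple:
    """Zero-based first and last byte positions of a `size`-byte region."""
    return start, start + size - 1


data = bytes(range(10))
first, last = inclusive_range(3, 4)

# A Python slice has an exclusive end, so last + 1 is used here.
# Both forms select exactly `size` bytes, which is what dd's count expects.
print(first, last)                              # 3 6
print(data[first:last + 1] == data[3:3 + 4])    # True
```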

varuntanwar commented 5 years ago

My file size is 61 bytes:

-rw------- 1 root root 61 Dec 28 13:19 ipp2.txt

But to recover the file, I have to use a count value of 87.

dd if=7.avs bs=1 skip=757481113 count=87

datastorage-stage/ipp2.txtHello
THis is test file

I want to recover.

Please recover.
87+0 records in
87+0 records out
87 bytes copied, 0.000256981 s, 339 kB/s

But according to the file size, the count should be 60.
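
One possible reading of these numbers, offered as an assumption rather than a confirmed explanation: the dd output begins with the string `datastorage-stage/ipp2.txt`, which is 26 bytes long, and 26 + 61 = 87. If that leading string is the stored key, the 87 bytes would be the key followed by the 61-byte body. The sketch below only checks that arithmetic; it does not read any AVS file.

```python
key = "datastorage-stage/ipp2.txt"  # leading text in the dd output above
file_size = 61                      # size reported by ls
skip = 757481113                    # skip value used in the dd command above

key_len = len(key.encode("utf-8"))
print(key_len, key_len + file_size)  # 26 87

# If the key really starts at `skip`, the body would start right after it.
body_start = skip + key_len
print(body_start)  # 757481139
```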

mocchira commented 5 years ago

@varuntanwar Sorry for the long delay. I can spare time to look into this tonight.

varuntanwar commented 5 years ago

Hey,

We are facing issues accessing files. I can see that the file is present:

root@prd-leofs101:/home/varun.tanwar# leofs-adm whereis datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99
-------+-------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
 del?  |                   node                    |             ring address             |    size    |   checksum   |  has children  |  total chunks  |     clock      |             when
-------+-------------------------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------+----------------------------
       | storage_101@prd-leofs101      | 474b5100c6bfb8d563ba5967b9c51894     |        96K |   ae7d411fa3 | false          |              0 | 57c7da5b2179f  | 2018-12-08 12:56:16 +0550
       | storage_104@prd-leofs104      | 474b5100c6bfb8d563ba5967b9c51894     |        96K |   ae7d411fa3 | false          |              0 | 57c7da5b2179f  | 2018-12-08 12:56:16 +0550

When I tried to get the file via s3cmd:

s3cmd  get s3://datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99
download: 's3://datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99' -> './034dac6b-6c94-466d-9891-2ab1250cef99'  [1 of 1]
ERROR: S3 error: 500 (Internal Server Error)

This is in the gateway logs:

access.20190121.00.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 01:30:37.922649 +0550    1548014437922648    200 5   miss
access.20190121.07.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 09:27:23.107439 +0550    1548043043107438    200 3   miss
access.20190121.07.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 09:32:10.435729 +0550    1548043330435727    200 3   miss
access.20190121.07.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 10:10:41.697120 +0550    1548045641697116    200 3   miss
access.20190121.10.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 10:35:30.132896 +0550    1548047130132894    200 3   miss
access.20190121.10.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 11:06:31.775687 +0550    1548048991775686    200 3   miss
access.20190121.10.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 11:11:40.275160 +0550    1548049300275159    200 3   miss
access.20190121.10.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 11:18:20.84713 +0550    1548049700084712    200 3   miss
access.20190121.10.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 11:48:57.135795 +0550    1548051537135788    200 4   miss
access.20190121.12.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 12:59:30.162066 +0550    1548055770162065    200 5   miss
access.20190121.12.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 13:03:49.478610 +0550    1548056029478607    200 5   miss
access.20190121.12.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 13:05:15.157900 +0550    1548056115157899    200 3   miss
access.20190121.12.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 13:51:20.877439 +0550    1548058880877438    200 6   miss
access.20190121.12.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 14:37:12.327745 +0550    1548061632327743    200 4   miss
access.20190121.14.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   98029   2019-01-21 15:34:46.792692 +0550    1548065086792691    200 3   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:29:48.798988 +0550    1548071988798986    500 60002   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:31:25.349148 +0550    1548072085349146    500 60003   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:32:07.750929 +0550    1548072127750928    500 60003   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:35:53.460948 +0550    1548072353460947    500 60002   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:36:39.268978 +0550    1548072399268976    500 60002   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:36:41.790208 +0550    1548072401790206    500 60002   miss
access.20190121.16.1:[GET]  datastore   datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99   0   0   2019-01-21 17:37:40.805939 +0550    1548072460805938    500 60002   miss
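
To make the switch from successful to failing requests easier to see, the log lines can be summarized with a short script. This is a sketch under assumptions: the column meanings (size in the fifth field, then timestamp, a 16-digit microsecond clock, HTTP status, response time, and cache result) are inferred from the excerpt rather than taken from the LeoFS documentation, the unit of the response time is not stated, and `parse_access_line` is a hypothetical name. The parser also tolerates a clock and status code run together in one field.

```python
from collections import Counter

CLOCK_DIGITS = 16  # length of the microsecond Unix clock seen in the excerpt


def parse_access_line(line: str) -> dict:
    """Extract size, HTTP status and response time from one access-log line."""
    fields = line.split()
    status_field = fields[-3]
    # Tolerate a clock and status fused into one token, e.g. "...922648200".
    if len(status_field) > 3:
        status_field = status_field[CLOCK_DIGITS:]
    return {
        "size": int(fields[4]),
        "status": int(status_field),
        "elapsed": int(fields[-2]),
    }


sample = [
    "access.20190121.10.1:[GET] datastore datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99 "
    "0 98029 2019-01-21 11:18:20.84713 +0550 1548049700084712 200 3 miss",
    "access.20190121.16.1:[GET] datastore datastore/onboarding/034dac6b-6c94-466d-9891-2ab1250cef99 "
    "0 0 2019-01-21 17:29:48.798988 +0550 1548071988798986500 60002 miss",
]
by_status = Counter(parse_access_line(line)["status"] for line in sample)
print(dict(by_status))  # {200: 1, 500: 1}
```

Read this way, the earlier requests return status 200 with 98029 bytes, while the later ones return status 500 with 0 bytes and a response time of about 60000.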