keybase / keybase-issues

A single repo for managing publicly recognized issues with the keybase client, installer, and website.

OSX kbfs input/output errors in team folder #3492

Open earzur opened 4 years ago

earzur commented 4 years ago

Hello,

Using one of our team drives has recently become unbearably slow, with reports of "input/output errors" on many files.

I found the following in my local kbfs logs:

2019-08-30T12:31:02.059835+02:00 ▶ [WARN kbfs(BSR) bserver_remote.go:421] 1281 Get id=02240c8ee01eaf0bc0b2cb4a937df454fcf4c044ef9cf76af4ddcc88846e12470c tlf=69748b4b9f22c081b9c88eaecc0c1e26 context=Context{Creator: b341fc29dae1b491ef71b9fc0ff9f724} sz=0 err=Server: block does not exist [tags:FID=Jn92wuWf52Ji6yDF9CPNuw]
2019-08-30T12:31:02.059965+02:00 ▶ [DEBU kbfs block_retrieval_queue.go:621] 1282 Couldn't get block BlockPointer{ID: 02240c8ee01eaf0bc0b2cb4a937df454fcf4c044ef9cf76af4ddcc88846e12470c, KeyGen: 11, DataVer: 1, Context: Context{Creator: b341fc29dae1b491ef71b9fc0ff9f724}, DirectType: direct}: Server: block does not exist [tags:FID=Jn92wuWf52Ji6yDF9CPNuw]
2019-08-30T12:31:02.060153+02:00 ▶ [WARN kbfs(FBO 69748b4b) folder_branch_ops.go:3128] 1283 Got unexpected read error on a synced TLF: Server: block does not exist [tags:FID=Jn92wuWf52Ji6yDF9CPNuw]
2019-08-30T12:31:02.060274+02:00 ▶ [DEBU kbfs(FBO 69748b4b) folder_branch_ops.go:3445] 1284 [duration=193.960782ms] Lookup NodeID(0xc436fd2000) tenant-song done: NodeID(nil) Server: block does not exist [tags:FID=Jn92wuWf52Ji6yDF9CPNuw]
2019-08-30T12:31:02.060319+02:00 ▶ [DEBU kbfs(kbfsfuse) dir.go:553] 1285 Server: block does not exist [tags:FID=Jn92wuWf52Ji6yDF9CPNuw]
2019-08-30T12:31:02.187541+02:00 ▶ [WARN kbfs(BSR) bserver_remote.go:421] 1286 Get id=02240c8ee01eaf0bc0b2cb4a937df454fcf4c044ef9cf76af4ddcc88846e12470c tlf=69748b4b9f22c081b9c88eaecc0c1e26 context=Context{Creator: b341fc29dae1b491ef71b9fc0ff9f724} sz=0 err=Server: block does not exist [tags:FID=2XHpZFBLt8SaT-jiUHJ33w]
2019-08-30T12:31:02.187700+02:00 ▶ [DEBU kbfs block_retrieval_queue.go:621] 1287 Couldn't get block BlockPointer{ID: 02240c8ee01eaf0bc0b2cb4a937df454fcf4c044ef9cf76af4ddcc88846e12470c, KeyGen: 11, DataVer: 1, Context: Context{Creator: b341fc29dae1b491ef71b9fc0ff9f724}, DirectType: direct}: Server: block does not exist [tags:FID=2XHpZFBLt8SaT-jiUHJ33w]
2019-08-30T12:31:02.187821+02:00 ▶ [WARN kbfs(FBO 69748b4b) folder_branch_ops.go:3128] 1288 Got unexpected read error on a synced TLF: Server: block does not exist [tags:FID=2XHpZFBLt8SaT-jiUHJ33w]
2019-08-30T12:31:02.187891+02:00 ▶ [DEBU kbfs(FBO 69748b4b) folder_branch_ops.go:3445] 1289 [duration=127.229799ms] Lookup NodeID(0xc436fd2000) tenant-song done: NodeID(nil) Server: block does not exist [tags:FID=2XHpZFBLt8SaT-jiUHJ33w]
2019-08-30T12:31:02.188039+02:00 ▶ [DEBU kbfs(kbfsfuse) dir.go:553] 128a Server: block does not exist [tags:FID=2XHpZFBLt8SaT-jiUHJ33w]

Has some block disappeared from server storage? Is there any way we can recover those blocks? (We have backups.)

I tried to use keybase log send to provide some more info, but:

This command will send recent keybase log entries to keybase.io
for debugging purposes only.

These logs don’t include your private keys, encrypted data or file names,
but they will include metadata Keybase normally can't read
(like file sizes and git repo names), for debugging purposes.

Continue sending logs to keybase.io? (type 'YES' to confirm): YES
▶ INFO ignoring UI logs: GUI main process wasn't found
▶ ERROR API network error: Post https://api-0.core.keybaseapi.com/_/api/1.0/logdump/send.json: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Thanks in advance for any help

strib commented 4 years ago

Hi @earzur, I will definitely take a look. It would be great if we could get the full logs, though -- often the keybase log send command will eventually work if you try it a few times in a row. Can you try again? If not, could you zip up all of your keybase.kbfs.log* files (see keybase status for their location) and share them with me (strib@github) in a private shared folder on KBFS?
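For anyone hitting the same timeout, a rough sketch of that workflow from a terminal; <LOG_DIR> and <your_username> are placeholders, and keybase status reports the actual log directory on your machine:

# Retry a few times; the API timeout is often transient.
keybase log send

# If it keeps failing, find the log directory reported by:
keybase status

# Bundle the KBFS logs from that directory...
tar czf kbfs-logs.tar.gz <LOG_DIR>/keybase.kbfs.log*

# ...and drop the archive into a private KBFS folder shared with strib.
cp kbfs-logs.tar.gz /keybase/private/<your_username>,strib/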

strib commented 4 years ago

I was able to restore that particular block for you, but I'm guessing there will be more to do and I'd like to understand more about how it got to this state, so getting the logs would be great, thanks!

earzur commented 4 years ago

Hey Jeremy,

thanks for your help.

I couldn't log send; it failed with the same API timeout. As instructed, I have shared my keybase.kbfs.log* files with you.

Happy to help find out what's wrong, because this issue is plaguing us.

Erwan

strib commented 4 years ago

Thanks for the logs.

It seems like there was at least one issue with some device updating the folder that caused the folder to reference data that was already deleted. I think the first such issue happened around August 9th. I will directly message you on Keybase with the user and device name that I think was related to the issue -- if we can get logs from that device, we might get some more clues about why this happened.

In the logs you sent, I see at least 4 paths in your team folder that reference missing data. I was able to restore 2 of these places, but 2 others were deleted so long ago that they are unrecoverable. For the ones I fixed, if they were directories or if they were large files, there might be more paths within them that need fixing -- since the system is end-to-end encrypted it's hard for me to tell what else will be needed.

And the logs themselves only contain obfuscated file and directory names, to prevent leaking information to Keybase. But I can give you these obfuscated names, and you can pass them to keybase fs debug deobfuscate to see which paths were affected.

These are the ones I was able to restore; substitute your team name where indicated (I didn't want to publicize it on GitHub):

/keybase/team/<TEAM_NAME>/polar-cage/cliff-people/clay-repeat
/keybase/team/<TEAM_NAME>/polar-cage/cliff-people/canoe-word

These are the ones I wasn't able to restore:

/keybase/team/<TEAM_NAME>/polar-cage/cliff-people/double-jelly/theme-upper/mosquito-size
/keybase/team/<TEAM_NAME>/polar-cage/cliff-people/flip-quote/more-panda/more-whale

As you can see, there is one parent directory where all the trouble happened. You must have hit some bug where a user's device messed up and put a previously-deleted version of that directory back into place at some point. I'll privately message you with more details -- perhaps if we can track down the logs showing the problem, we can find out more about what happened and restore the data more fully.
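To translate those obfuscated names yourself, the command mentioned above can be run against each path. A sketch (the team name is a placeholder, and keybase fs debug deobfuscate --help shows the exact invocation in case the argument form differs):

keybase fs debug deobfuscate /keybase/team/<TEAM_NAME>/polar-cage/cliff-people/clay-repeat
keybase fs debug deobfuscate /keybase/team/<TEAM_NAME>/polar-cage/cliff-people/double-jelly/theme-upper/mosquito-size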

earzur commented 4 years ago

I see. Thanks for tracking it down...

One thing: this file hierarchy is very important to us (it contains production secrets), so we are also tracking it with git and a Keybase remote.

Maybe that is somehow messing things up?
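For context, the setup in question is an ordinary Keybase team git remote, roughly along these lines (the repo name here is a placeholder):

keybase git create secrets --team <TEAM_NAME>
git remote add keybase keybase://team/<TEAM_NAME>/secrets
git push keybase master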

Would inviting you to the team as a reader help with digging up more info?

Erwan

akatrevorjay commented 4 years ago

Hi, I would like to report that I hit this error as well, though only recently. It makes kbfs rather unusable: every time I touch one of these missing blocks, the kbfs daemon crashes, reverting any changes I've made in the interim. Should I open a new issue even though I believe it's the same problem?

Thanks, Trevor

strib commented 4 years ago

@akatrevorjay if there's really a crash, that's definitely a separate issue. Please send logs and open a new issue, and I can take a look when I'm back at work.