seedvault-app / seedvault

A backup application for the Android Open Source Project.
1.19k stars 73 forks source link

Fix binder exception when restoring a large number of applications #563

Closed stevesoltys closed 7 months ago

stevesoltys commented 9 months ago

Copy of #484 with review comment changes.

grote commented 9 months ago

Awesome work! This looks a lot better than the previous attempt at fixing this. There's still some ktlint style complaints to fix. I'll also run this through manual tests before merging (as this is a rather critical code path).

grote commented 7 months ago

I could reproduce the issue that this MR is fixing by installing more than 500 apps:

2023-11-14 11:17:54.932  1522-2762  BackupManagerService    pid-1522                             I  Full restore; asking about 515 apps
2023-11-14 11:17:54.942  1522-2762  BackupManagerService    pid-1522                             D  Starting restore.
2023-11-14 11:17:54.943  1522-2762  BackupRestoreController pid-1522                             I  System restore starting for user: 0
2023-11-14 11:17:54.948 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Service created.
2023-11-14 11:17:54.953 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Transport bound.
2023-11-14 11:17:54.956  1522-1522  TransportConnection     pid-1522                             I  ConfigurableBackupTransportService#1453: Notifying [PerformUnifiedRestoreTask.startRestore()] transport = BackupTransportClient
2023-11-14 11:17:54.969  1522-2762  BpBinder                pid-1522                             W  Large outgoing transaction of 826056 bytes, interface descriptor , code 14
2023-11-14 11:17:54.969  1522-2762  JavaBinder              pid-1522                             E  !!! FAILED BINDER TRANSACTION !!!  (parcel size = 826056)
2023-11-14 11:17:54.970  1522-2762  BackupManagerService    pid-1522                             E  Unable to contact transport for restore: data parcel size 826056 bytes
2023-11-14 11:17:15.769  2762-2762  binder                  pid-2762                             I  1522:2762 transaction failed 29201/-28, size 826056-8 line 3360
2023-11-14 11:17:54.971 10154-10211 RestoreCoordinator      com.stevesoltys.seedvault            D  finishRestore
2023-11-14 11:17:54.972  1522-2762  BackupRestoreController pid-1522                             I  systemRestoreFinished for 0
2023-11-14 11:17:54.972  1522-2762  BackupRestoreController pid-1522                             I  maybeSendWidgetRestoreBroadcasts for 0
2023-11-14 11:17:54.972  1522-2762  BackupManagerService    pid-1522                             I  Restore complete.
2023-11-14 11:17:54.974  1522-2762  BackupManagerService    pid-1522                             V  [UserID:0] Released wakelock:*backup*-0-2762
2023-11-14 11:17:54.975 10154-10154 Configurab...ortService com.stevesoltys.seedvault            D  Service destroyed.
2023-11-14 11:17:55.054  1522-18904 RestoreSession          pid-1522                             D  endRestoreSession
2023-11-14 11:17:55.054  1522-2762  BackupManagerService    pid-1522                             V  [UserID:0] Clearing restore session and halting timeout
grote commented 7 months ago

Something isn't working right here. I got some error in the beginning which showed an error message in the UI and unlocked the FINISH button, however probably due to the new code, the restore continued and was actually doing stuff.

Then this pattern keeps repeating over and over:

2023-11-14 12:37:33.459 14334-14349 RestoreCoordinator      com.stevesoltys.seedvault            I  Start restore with [@pm@, com.android.rkpdapp, com.caydey.ffshare, com.gianlu.aria2android, com.android.traceur, org.woheller69.audio_analyzer_for_android]
...
2023-11-14 12:37:40.245 14334-14349 RestoreCoordinator      com.stevesoltys.seedvault            I  Start restore with [@pm@, org.xcsoar, deltazero.amarok.foss, com.android.settings, com.android.htmlviewer]

Somehow for each of those it is getting all restore sets, skipping over the one it can't decrypt. Why is it doing a @pm@ restore for each set? Is this needed? What is causing it to re-load restore sets for each chunk?

grote commented 7 months ago

The error I got in the beginning may be a bug unrelated to this PR, us starting the restore session twice:

2023-11-14 12:53:35.697 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            D  Start InstallResult Flow
2023-11-14 12:53:35.713 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            D  Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.714  1522-1625  BackupManagerService    pid-1522                             V  [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.716 16771-16813 RestoreViewModel        com.stevesoltys.seedvault            D  Starting new restore session to restore backup 1699970940126
2023-11-14 12:53:35.717  1522-1560  BackupManagerService    pid-1522                             V  [UserID:0] beginRestoreSession: pkg=null transport=com.stevesoltys.seedvault.transport.ConfigurableBackupTransport
2023-11-14 12:53:35.717 16771-16814 RestoreViewModel        com.stevesoltys.seedvault            D  Completed InstallResult Flow
2023-11-14 12:53:35.751 16771-16771 Configurab...ortService com.stevesoltys.seedvault            D  Service created.
2023-11-14 12:53:35.753 16771-16771 Configurab...ortService com.stevesoltys.seedvault            D  Transport bound.
2023-11-14 12:53:35.757  1522-1625  BackupManagerService    pid-1522                             I  [UserID:0] Restore session requested but one already active
2023-11-14 12:53:35.760  1522-1560  BackupManagerService    pid-1522                             V  [UserID:0] Acquired wakelock:*backup*-0-2762
2023-11-14 12:53:35.760 16771-16812 RestoreViewModel        com.stevesoltys.seedvault            E  Error starting new session
                                                                                                    android.os.RemoteException: beginRestoreSessionForUser returned null
                                                                                                        at com.stevesoltys.seedvault.restore.RestoreViewModel.getOrStartSession(RestoreViewModel.kt:132)
                                                                                                        at com.stevesoltys.seedvault.restore.RestoreViewModel.startRestore(RestoreViewModel.kt:199)
                                                                                                        at com.stevesoltys.seedvault.restore.RestoreViewModel.access$startRestore(RestoreViewModel.kt:74)
                                                                                                        at com.stevesoltys.seedvault.restore.RestoreViewModel$onNextClickedAfterInstallingApps$1.invokeSuspend(RestoreViewModel.kt:183)
                                                                                                        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
                                                                                                        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
                                                                                                        at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
                                                                                                        at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
                                                                                                        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
                                                                                                        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
                                                                                                        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
                                                                                                        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
grote commented 7 months ago

For some reason, the backup metadata isn't cached, so in RestoreCoordinator#startRestore(), we end up calling getAvailableMetadata() for all chunks.

Given this and the fact that each chunks also does a @pm@ restore, we should probably increase the chunk size as much as can be considered safe.

grote commented 7 months ago

The optimization for not having to re-fetch the backup metadata was to call restoreCoordinator.beforeStartRestore(restorableBackup.backupMetadata) which is only done once in this PR, not before each start.

@stevesoltys are you still following? So action items I see: