io7m-com / exfilac

Simple, reliable, automated S3 uploads for Android
https://www.io7m.com/software/exfilac
ISC License
10 stars 1 forks source link

Silent upload failures on Graphene #16

Closed io7m closed 1 month ago

io7m commented 2 months ago

On a Pixel 8a running Graphene, uploads triggered on a time schedule seem to be failing. It seems to be an SQLite issue:

com.io7m.darco.api.DDatabaseException: [SQLITE_BUSY] The database file is locked (database is locked)
        at com.io7m.darco.api.DDatabaseQueryAbstract.execute(DDatabaseQueryAbstract.java:56)
        at com.io7m.exfilac.core.internal.uploads.EFUploadTask.findConfiguration(EFUploadTask.kt:452)
        at com.io7m.exfilac.core.internal.uploads.EFUploadTask.execute(EFUploadTask.kt:100)
        at com.io7m.exfilac.core.internal.uploads.EFUploadService.upload$lambda$4$lambda$3(EFUploadService.kt:100)
        at com.io7m.exfilac.core.internal.uploads.EFUploadService.$r8$lambda$Fl8GHAu2nBw4wY-yf5qZQyAdiY0(Unknown Source:0)
        at com.io7m.exfilac.core.internal.uploads.EFUploadService$$ExternalSyntheticLambda1.run(D8$$SyntheticClass:0)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
        at java.lang.Thread.run(Thread.java:1012)
Caused by: org.sqlite.SQLiteException: [SQLITE_BUSY] The database file is locked (database is locked)
        at org.sqlite.core.DB.newSQLException(DB.java:1179)
        at org.sqlite.core.DB.newSQLException(DB.java:1190)
        at org.sqlite.core.DB.execute(DB.java:985)
        at org.sqlite.jdbc3.JDBC3PreparedStatement.lambda$execute$0$org-sqlite-jdbc3-JDBC3PreparedStatement(JDBC3PreparedStatement.java:59)
        at org.sqlite.jdbc3.JDBC3PreparedStatement$$ExternalSyntheticLambda2.call(D8$$SyntheticClass:0)
        at org.sqlite.jdbc3.JDBC3Statement.withConnectionTimeout(JDBC3Statement.java:454)
        at org.sqlite.jdbc3.JDBC3PreparedStatement.execute(JDBC3PreparedStatement.java:53)
        at com.io7m.exfilac.core.internal.database.EFQUploadRecordUpdate.onExecute(EFQUploadRecordUpdate.java:76)
        at com.io7m.exfilac.core.internal.database.EFQUploadRecordUpdate.onExecute(EFQUploadRecordUpdate.java:28)
        at com.io7m.darco.api.DDatabaseQueryAbstract.execute(DDatabaseQueryAbstract.java:54)
        ... 8 common frames omitted
io7m commented 2 months ago

The problem with this is that, because the tasks can't write to the database, they also can't record the fact that an error occurred.

io7m commented 1 month ago

This seems to be caused by multiple tasks all starting at exactly the same time and trying to acquire a database lock. They also retry in lock step.

This could be mitigated by staggering task startup times, and also retrying after random pauses.

io7m commented 1 month ago

This is still a problem.

io7m commented 1 month ago

The upload method needs to take a parameter that represents a delay to wait before starting. Manually started uploads always have a delay of zero, but time scheduled uploads should be staggered fairly heavily (a random value between a second and a minute).

io7m commented 1 month ago

For what it's worth, this problem does only happen once: When a set of uploads run, a lot of them fail immediately due to the above issue. This then means that their "last completed" time is very different to those uploads that did run to completion. This effectively puts all of the upload tasks out of sync and mean that this problem is averted the next time they run.

Still needs fixing though.

io7m commented 1 month ago

This seems to be done.