servalproject / serval-dna

The Serval Project's core daemon that implements Distributed Numbering Architecture (DNA), MDP, VoMP, Rhizome, MeshMS, etc.
http://servalproject.org

Report Rhizome database errors to command line caller #3

Closed by quixotique 12 years ago

quixotique commented 12 years ago

See issue #1 for the background.

At present, an error in the rhizome_store_file() function does not appear to be reported as a failure by the rhizome add file command. This, and all other potential failures in servald commands used by the Batphone MeshMS logic, must be found and fixed.

quixotique commented 12 years ago

Tested by disabling Rhizome retries and repeatedly running the https://github.com/servalproject/batphone/blob/development/tests/meshms1 script. Sometimes it succeeded in sending a MeshMS from one phone to the other, but sometimes it failed because of a database locking error:

 I/servald (  675): rhizome_database.c:927:rhizome_store_file()  database is locked on try 1 after 0.000 seconds (0.000 elapsed): DELETE FROM FILES WHERE datavalid=0;
E/servald (  675): rhizome_database.c:927:rhizome_store_file()  query failed, database is locked: DELETE FROM FILES WHERE datavalid=0;
I/servald (  675): rhizome_database.c:951:rhizome_store_file()  database is locked on try 1 after 0.000 seconds (0.000 elapsed): INSERT OR REPLACE INTO FILES(id,data,length,highestpriority,datavalid,inserttime) VALUES('061465A56EAA3452C67C3D9C6B885DA243431B2FB82FCD1382E26133CB392E95A861FE48AA27E5BB6E7F9815B0EA112E42046A19BAF8746F315B7A10466290A1',?,102,0,0,1349742888399);
E/servald (  675): rhizome_database.c:951:rhizome_store_file()  query failed, database is locked: INSERT OR REPLACE INTO FILES(id,data,length,highestpriority,datavalid,inserttime) VALUES('061465A56EAA3452C67C3D9C6B885DA243431B2FB82FCD1382E26133CB392E95A861FE48AA27E5BB6E7F9815B0EA112E42046A19BAF8746F315B7A10466290A1',?,102,0,0,1349742888399);
E/servald (  675): rhizome_database.c:953:rhizome_store_file()  Failed to insert row for fileid=061465A56EAA3452C67C3D9C6B885DA243431B2FB82FCD1382E26133CB392E95A861FE48AA27E5BB6E7F9815B0EA112E42046A19BAF8746F315B7A10466290A1
E/servald (  675): rhizome_database.c:667:rhizome_store_bundle()  Could not store file
E/servald (  675): rhizome.c:362:rhizome_add_manifest()  rhizome_store_bundle() failed.
E/servald (  675): commandline.c:1086:app_rhizome_add_file()  Manifest not added to Rhizome database
E/MeshMS  (  675): cannot process message intent
E/MeshMS  (  675): java.io.IOException
E/MeshMS  (  675):      at org.servalproject.rhizome.Rhizome.sendMessage(Rhizome.java:118)
E/MeshMS  (  675):      at org.servalproject.meshms.OutgoingMeshMS.processSimpleMessage(OutgoingMeshMS.java:135)
E/MeshMS  (  675):      at org.servalproject.meshms.OutgoingMeshMS.onHandleIntent(OutgoingMeshMS.java:61)
E/MeshMS  (  675):      at android.app.IntentService$ServiceHandler.handleMessage(IntentService.java:59)
E/MeshMS  (  675):      at android.os.Handler.dispatchMessage(Handler.java:99)
E/MeshMS  (  675):      at android.os.Looper.loop(Looper.java:130)
E/MeshMS  (  675):      at android.os.HandlerThread.run(HandlerThread.java:60)
E/MeshMS  (  675): Caused by: org.servalproject.servald.ServalDFailureException: exit status indicates failure: org.servalproject.servald.ServalDResult(args=[rhizome, add, file, D66A0FAE286EE455F131367D0F77D8D4A25882AF0E8254A67C0564E6A3A5EE4F, , /data/data/org.servalproject/meshms/send-855045565.payload, /data/data/org.servalproject/meshms/send-418740312.manifest], status=255, outv=[])
E/MeshMS  (  675):      at org.servalproject.servald.ServalD.rhizomeAddFile(ServalD.java:334)
E/MeshMS  (  675):      at org.servalproject.rhizome.Rhizome.sendMessage(Rhizome.java:101)
E/MeshMS  (  675):      ... 6 more

Clearly, errors in rhizome_store_file() are being reported as a non-zero exit status to the command line and also via JNI. So the Batphone Android app receives enough information to either inform the user of the failure or retry the send, although the built-in retry logic in servald (introduced by the fix for issue #2) seems to do the trick for now.
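For illustration only, a caller that acts on the exit status might look something like the Python sketch below. This is not Batphone's or servald's code: `run_with_retry` is a hypothetical helper, and the child processes are stand-ins that simply exit with status 255 (the value in the ServalDResult above) or 0, rather than real servald invocations.

```python
import subprocess
import sys
import time


def run_with_retry(argv, attempts=3, delay_s=0.1):
    """Run argv until it exits with status 0 or the attempts run out.

    Returns (exit_status, attempts_used) for the final run.
    """
    status = None
    for attempt in range(1, attempts + 1):
        status = subprocess.run(argv).returncode
        if status == 0:
            return status, attempt
        if attempt < attempts:
            time.sleep(delay_s)  # brief pause before trying again
    return status, attempts


# Stand-ins for a failing and a succeeding command (assumption: a real
# caller would pass the servald command line here instead).
failing = [sys.executable, "-c", "import sys; sys.exit(255)"]
succeeding = [sys.executable, "-c", "pass"]

fail_result = run_with_retry(failing, attempts=2, delay_s=0.01)
ok_result = run_with_retry(succeeding)
```

When every attempt fails, the caller still ends up with the non-zero status and can surface the failure to the user instead of dropping the message silently.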

However, retrying does not fix the underlying bug; it only makes it less likely: a Rhizome DB lock held for long enough by another process can still cause a MeshMS send to fail. The proposed re-architecture of Rhizome into its own server process (see issue #1) should resolve this problem.
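The limitation can be reproduced outside servald with a small Python sketch using the standard sqlite3 module. Everything here is an assumption made for illustration: the FILES table is cut down to two columns, `execute_with_retry` is a hypothetical helper rather than servald's retry logic, and a second connection in the same process plays the part of the other process holding the lock.

```python
import os
import sqlite3
import tempfile
import threading
import time


def execute_with_retry(conn, sql, params=(), tries=3, delay_s=0.05):
    """Run one statement, retrying while SQLite reports the database as locked.

    Returns the attempt number that succeeded; re-raises after the last try.
    """
    for attempt in range(1, tries + 1):
        try:
            conn.execute(sql, params)
            return attempt
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == tries:
                raise
            time.sleep(delay_s)


path = os.path.join(tempfile.mkdtemp(), "rhizome-demo.db")
# isolation_level=None puts both connections in autocommit mode; timeout=0
# makes a locked database fail immediately instead of waiting inside SQLite.
holder = sqlite3.connect(path, timeout=0, isolation_level=None,
                         check_same_thread=False)
writer = sqlite3.connect(path, timeout=0, isolation_level=None)
holder.execute("CREATE TABLE FILES(id TEXT PRIMARY KEY, datavalid INTEGER)")
insert = "INSERT OR REPLACE INTO FILES(id, datavalid) VALUES(?, 0)"

# Case 1: the lock is released after 0.1 s, well inside the retry window.
holder.execute("BEGIN EXCLUSIVE")
releaser = threading.Timer(0.1, lambda: holder.execute("ROLLBACK"))
releaser.start()
short_lock_attempt = execute_with_retry(writer, insert, ("a",),
                                        tries=40, delay_s=0.05)
releaser.join()

# Case 2: the lock outlives every retry, so the write still fails.
holder.execute("BEGIN EXCLUSIVE")
try:
    execute_with_retry(writer, insert, ("b",), tries=3, delay_s=0.01)
    long_lock_failed = False
except sqlite3.OperationalError:
    long_lock_failed = True
holder.execute("ROLLBACK")

rows = writer.execute("SELECT COUNT(*) FROM FILES").fetchone()[0]
```

In the first case the insert fails at least once and then goes through after the lock is released; in the second the retries are exhausted while the lock is still held, so only one row is ever stored.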

Code review of other Rhizome commands shows that errors are being handled correctly and reported.

Closing issue.