sqlcipher / android-database-sqlcipher

Android SQLite API based on SQLCipher
https://www.zetetic.net/sqlcipher/sqlcipher-for-android/
Other
2.73k stars 564 forks source link

Fixed ICU extraction for multiple processes. #140

Closed ph4r05 closed 8 years ago

ph4r05 commented 9 years ago

Fixes segfault in the following scenario: two Android processes launched during application start tried to extract ICU file at the same time overwriting destination file by each other. This results in segmentation fault while loading ICU in native library.

The segmentation fault ocurred in checkDataItem function at external/icu4c/common/udata.cpp. In particular on line with "pHeader->dataHeader.magic1==0xda".

How to reproduce a problem:

This proposed patch fixes the problem by: 1) using file locking so only one process extracts ZIP archive and writes the destination in time; 2) extracting to a temporary file, then renaming it to the destination file name; 3) forcing flushing/syncing file to file system using channels and sync so native library can process it right after the extraction routine finishes.

developernotes commented 9 years ago

Hello @ph4r05

Thank you for submitting a pull request. After reviewing the scenario and submitted code, I'm curious if you think this could be pushed up into the application codebase instead of the library itself? This specific race condition has never been reported before. We welcome your feedback, thanks!

brodybits commented 9 years ago

@ph4r05 I would ask a different question: why do we have the clash (if /system/usr/icu/icudt46l.dat does not exist) in the first place? I suggest we start to log the path values and investigate. I can take a look this coming weekend, no promises though.

ph4r05 commented 9 years ago

@developernotes Regarding the application codebase. It is hard to say for me since I am using SQLCipher for Android at the moment (no other platforms) and I am not aware of the other affected platforms by this issue. I would say it is platform specific problem.

@brodybits In my opinion it is not a path clash literally. In my scenario it is the same application, but two different processes in the given app uses sqlcipher in parallel. It makes sense to have only one copy of icudt46l.dat for the whole application for me, processes can share it, but only one can create it. If I understand you correctly.

developernotes commented 9 years ago

Hi @ph4r05

Regarding the application codebase. It is hard to say for me since I am using SQLCipher for Android at the moment (no other platforms) and I am not aware of the other affected platforms by this issue.

In this case, I mean pushing the locking check within your application itself during your call to SQLiteDatabase.loadLibs(…);, as appose to the Java-based client library of SQLCipher for Android? This could be shared logic between the two services.

brodybits commented 9 years ago

@ph4r05 I have the following points. It would help the others if you could clearly describe the "use case" of what you need to start multiple processes for.

The old version of loadICUData() would only extract icudt46l.dat from zip if the file does not already exist and it looks like you preserved that functionality. However, if icudt46l.dat does exist after you acquire the lock, it looks like you do not release the lock before returning.

I am also wondering if we need the temporary file while extracting icudt46l.dat, now that we are using a lock?

ph4r05 commented 9 years ago

@developernotes Yes this also makes perfect sense. It could be done in this way. I just wanted to share my problem & solution because investigation was not that straightforward, it just segfaulted without any obvious reason. But application itself "does not know" that there happens ZIP extraction in the background under some circumstances (e.g., system wide file is not present). In this case I would suggest extracting this initialization functionality to a separate method.

@brodybits The lock is eventually released in finally block. Use case is for example this:

  1. process is a master service handling requests from the application. But service is using JNI code a lot and from time to time there may be a segfault (as in this case). so here I have a second process:
  2. watchdog process. Is isolated from segfaults that may happen in master service. But also uses sqlcipher storage for its own purpose. It can backup the master service and help to recover important state information. This design proved to work well if a native code is not 100% bugfree.

Regarding the temporary files - It was my first solution before I implemented the locking. I am not sure whether the locking support on file systems is 100% reliable on all android platforms and FS (e.g., not provided at all). It is just another partial solution to the problem if the lock implementation does not work as expected (some weird extreme cases).

But assuming the lock mechanism is not working correctly, the icuLockAcquire function would have to be redesigned because of the try block in while loop. This try block handles EAGAIN exceptions (another process holds this lock). I admit the catch block is too general, but EAGAIN is wrapped in IOException, as far as I remember so more complex error handling code would be required.

Code could be simplified - temporary file is not required if locking works as expected.

UPDATE: I hope my point is understandable. The old loadICUData() does the extraction if file does not exist, yes. But this whole functionality is not atomic. If two processes start approximately in the same time start this check, both would pass the exists() condition, because unziping takes some time. In that time they end up by rewriting each others file.

brodybits commented 9 years ago

service is using JNI code a lot and from time to time there may be a segfault (as in this case). so here I have a second process:

This doesn't sound very good. Can we investigate this more to determine what is causing the crashes?

ph4r05 commented 9 years ago

@brodybits Probably we misunderstood each other. My application is using plenty of other native code, not only yours (sqlcipher). SQLCipher itself works fine, without segfaults. Just the first start of the application caused segfault because of the ZIP extraction overwrite problem. We diverted a bit from the original point - SQLCipher segfault in my scenario.

brodybits commented 9 years ago

@ph4r05 first let me make a very important correction: sqlcipher is not "my" code at all! I did make a few contributions over the past couple years and have been a participant, nothing more. I have integrated both SQLite and SQLCipher with a Cordova/PhoneGap plugin, which is what actually triggered my level of involvement. I also contributed a couple fixes to external/Android.mk to get the ICU integration actually working (see #134). Credit goes to @developernotes, @sjlombardo, and others at https://github.com/orgs/sqlcipher/people.

ph4r05 commented 9 years ago

@brodybits OK sure. I was not specific enough, sorry for that.

brodybits commented 9 years ago

@ph4r05 no worries then. So I find the first step, which is to identify the actual problem/issue and requirements to be well-done.

In your response to @developernotes I like the idea to load the ICU data file in a separate function and am wondering if we should make this optional (considering that it wasn't really working right in the past)?

brodybits commented 9 years ago

but to load the ICU data file in a separate function breaks the API and I think to make it optional would increase the complexity, if it is even possible. Simplest to just let the application do the locking on SQLiteDatabase.loadLibs(…) as suggested by @developernotes.

ph4r05 commented 9 years ago

@brodybits Breaking API is not a good thing, sure and separating ICU extraction to a separate method was probably not a best idea.

It depends on the point of view. In my case, I prefer using fix proposed in pull request in the sqlcipher.jar, instead of forcing process-wide synchronization in my services, because I consider the ZIP extraction as an implementation detail of the SQLCipher so I find it easier to fix one place that actually causes the error instead of using workaround on a multiple places (loadLibs).

But you can also say in documentation that loadLibs() performs extraction and is unsafe to run in from different processes simultaneously (I am not sure whether current documentation mentions this or not).

developernotes commented 9 years ago

Hi @ph4r05

Circling back on this issue, we've made some changes around how ICU data is extracted. I'm curious if you continue to see errors with the code that is in master?