nmrs-nigeria / openmrs-module-nigeriaemr

NMRS NDR Extration Module for NigeriaMRS
1 stars 15 forks source link

performance_optimization_emeka #113

Closed emadubuko closed 4 years ago

emadubuko commented 4 years ago

The ndr generator throws out of memory when running on database of >10k patients. This module fixes some of the issue identified a. Using singleton object for DatatypeFactory in Utils class and JaxbMarshaller object b. Retrieve only patient ids for patient instead of actual patient object to reduce amount of object kept in memory c. Close all SQL connections

talk2morris commented 4 years ago

Hi emeka, Nice Job, though I have some few comments.

  1. For (a) can u point me to the commit or line of code where u implemented the singleton for DatatypeFactory and JaxbMarshaller.

  2. Also for (b) Did u perform a test on the previous implementation and this new one to compare the performance difference.

  3. For (c) I noticed u were referring to method checkLoggerGlobalProperty. Actually there is no need to close the statement on the catch block since the finally block is already handling that.

I would be expecting your feedback on this.

emadubuko commented 4 years ago

a) for datatypeFactory check Utils at line 860, for JaxbMarshaller, check JaxbMarshaller, instantiation is done once on NdrFragmentController at line 51 and passed around as variable. b) Yes i did, not only is this one faster, very noticable for DBs with > 10k patients, it also solves out of memory exception being thrown after some time c) I wasn't referring to that alone, there is also SQL query to retrieve patient Address.

talk2morris commented 4 years ago

a) I see u made the createMarshaller a static method, though I was expecting to see an implementation of a singleton object as u said. Also for JaxbMarshaller, this has always been like this.

b) This will be tested. I fear the repetitive request made to the DB to retrieve patient. I would suggest some form of dynamism to handle the flow.

emadubuko commented 4 years ago

a) the singleton implementation was for DatatypeFactory. It's too costly re-creating on every call. Found out the hard way. b) the repetitive request is actually insignificant plus it keeps just whats needed on memory unlike retrieving all patient. Openmrs uses Hibernate, retrieving patient retrieves all proxy object and leave them in memory.

talk2morris commented 4 years ago

a) You referred me to Utils at line 860, that was were I checked, and what is there is a static method, maybe u should explain were I can find the singleton object.

b) Also, OpenMRS uses Lazy loading by default, so the proxy objects are not loaded on first call.

emadubuko commented 4 years ago

a) that is a singleton instantiation for the DatatypeFactory object. If you run debug on it you will notice that the object is instantiated only once throughout the execution. That alone accounts for a lot of improvement in speed. I have tested this extensively with multiple datasets. Please do your testing with database > 10k patients and note the time differences. The time saved is in hours!

b) Again this implementation solve alot of memory problems when you run it with database > 10k patients

c) the tool as it is, throws out of memory exceptions for databases > 10k patients

P.S: There are also other null exceptions that have been fixed on the pull request. I cloned the new source code to test on a database, and it throws a whole lot of errors (that i have already fixed by checking for null) and ended up not generating complete data. Have anybody tested the new release on a live database?

talk2morris commented 4 years ago

The module your clone is not the latest(that may account for the null exception that you saw), I am currently working on new additions on the module to export using location, nevertheless, the module on the modules folder is the latest but the source is on dev-new and not master, I am close to release phase when that is done it will be merge to master.

wandechris commented 4 years ago

redundant