mhgrove / Empire

JPA implementation for RDF
http://groups.google.com/group/empire-rdf/
Apache License 2.0

Performance issues with lazy loading #114

Closed mkrech10 closed 8 years ago

mkrech10 commented 8 years ago

I seem to be having performance issues when pulling in a list of objects (Frameworks) and each of these objects has children, grand-children, etc. Currently the levels don't go too deep, but maybe like 8-10 levels deep and maybe 50 Frameworks. It is taking a long time for just the list of parent frameworks to be retrieved. I know that proxy objects are created for each linked object so Empire knows where/how to retrieve the object when needed.

Is there a configuration to help speed this up? Could I be using it wrong?

mhgrove commented 8 years ago

Each retrieval would be a round trip to the server to get each object. You can specify that they should be lazily loaded via your annotations which might help with the perceived performance.
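As a rough illustration of why lazy loading changes where the round trips happen, here is a minimal, self-contained Java sketch. `LazyRef`, `fetchTaskItems`, and the round-trip counter are hypothetical stand-ins invented for this example; they are not Empire's proxy classes or API, and the "database" is simulated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazily loaded association; not Empire's proxy class. */
final class LazyRef<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Runs the loader on first access only, then returns the cached value. */
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}

final class LazyLoadingSketch {
    /** Counts simulated round trips to the database. */
    static final AtomicInteger roundTrips = new AtomicInteger();

    /** Simulated fetch of one framework's child items; each call is one round trip. */
    static List<String> fetchTaskItems(int frameworkId) {
        roundTrips.incrementAndGet();
        return Arrays.asList("task-" + frameworkId + "-a", "task-" + frameworkId + "-b");
    }

    /** Builds n lazy references without touching the simulated database. */
    static List<LazyRef<List<String>>> buildLazy(int n) {
        List<LazyRef<List<String>>> refs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            refs.add(new LazyRef<>(() -> fetchTaskItems(id)));
        }
        return refs;
    }

    public static void main(String[] args) {
        List<LazyRef<List<String>>> refs = buildLazy(1000);
        System.out.println("round trips after building the list: " + roundTrips.get());
        refs.get(0).get();
        refs.get(0).get(); // second access is served from the cached value
        System.out.println("round trips after touching one child: " + roundTrips.get());
    }
}
```

In this sketch, building the list costs no round trips and each child costs one on first access. An eager mapping would instead pay for every child while the parent list is being built, which is the behaviour a lazy fetch type is meant to avoid.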

mkrech10 commented 8 years ago

Sorry, I should have said that in the initial comment. We are currently using fetch = FetchType.LAZY

mhgrove commented 8 years ago

how long does it take to retrieve the list of parent objects? what database are you using?

mkrech10 commented 8 years ago

We are currently using Stardog. There are almost 1000 parents. I ran a quick comparison test: five find-alls using Empire and five using a manual SPARQL query. With Empire we got: 191,925ms 169,967ms 192,158ms 191,791ms 183,041ms

When running with just SPARQL we got: 312ms 187ms 187ms 177ms 179ms

Here is the find all method that used empire

List frameworks = []
Query findAllFrameworks = entityManager.createQuery("where { ?result rdf:type <$UnifiedFrameworkOntology.BASE_URI#Framework> }")
findAllFrameworks.setHint(RdfQuery.HINT_ENTITY_CLASS, Framework)
List results = findAllFrameworks.resultList
results.each { frameworks.add((Framework)it) }
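To put the two sets of timings quoted in this comment side by side, the following small Java sketch averages each run and computes the ratio. The numbers are copied from the comment; the class and method names are made up for this example.

```java
import java.util.Arrays;

final class TimingComparison {
    /** Timings in milliseconds, as reported for the Empire find-all. */
    static final long[] EMPIRE_MS = {191_925, 169_967, 192_158, 191_791, 183_041};

    /** Timings in milliseconds, as reported for the manual SPARQL query. */
    static final long[] SPARQL_MS = {312, 187, 187, 177, 179};

    /** Arithmetic mean of the samples; NaN for an empty array. */
    static double meanMillis(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double empireMean = meanMillis(EMPIRE_MS);
        double sparqlMean = meanMillis(SPARQL_MS);
        System.out.printf("Empire mean: %.1f ms%n", empireMean);
        System.out.printf("SPARQL mean: %.1f ms%n", sparqlMean);
        System.out.printf("Slowdown factor: %.1f%n", empireMean / sparqlMean);
    }
}
```

The means work out to about 185,776 ms for Empire against about 208 ms for plain SPARQL, a slowdown of roughly 890 times. With almost 1000 parents, that is on the order of 185 ms per Framework, which would be consistent with extra per-object work rather than a slow initial query.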

mhgrove commented 8 years ago

if you attach visualvm's profiler to the process where you're going through empire, what are the hotspots?

mkrech10 commented 8 years ago

Let me get that set up and I'll get back to you.

mkrech10 commented 8 years ago

I cannot get VisualVM to connect to my application server currently. I am working on it. I have the VisualVM plugin for IntelliJ, but I keep getting the errors below.

image

image

UPDATE: I was looking in the wrong %tmp% directory. I went to c:\users\AppData\Local\Temp and was able to find the hsperfdata_UserName directory and remove it.

mkrech10 commented 8 years ago

I have attached my nps file from VisualVM for reference, if you'd like.

The bulk of the time is spent in com.clarkparsia.empire.annotation.RdfGenerator.fromRdf() and in com.clarkparsia.empire.annotation.RdfGenerator.determineClass().

The initial call to get all the frameworks returns rather quickly. It is the iteration through the result list to build a List of Frameworks that kills our performance. Each framework object is placed into the list, but it appears as though the proxy object for each of its children is created at this point.

The properties of a Framework are id, name, and a list of TaskItem. All we want findAllFrameworks to return is the id and name. We do not need any of the TaskItems. And when we are iterating through the returned result set, we are not asking for any specific properties, just frameworkList.add(framework from iterator).

FindAllFrameworks.nps.txt

mhgrove commented 8 years ago

fromRdf and determineClass are the hotspots because both of them round-trip to the database and would be called often when the list of Framework objects is built.

I'm not sure why it would eagerly load the task items for all 1000 frameworks. Are all the relevant properties annotated w/ lazy fetch type?

You could always create a parent interface that has just the id and name of the Framework and you could load the frameworks as those.
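A minimal sketch of the parent-interface idea, in plain Java with no Empire dependency. `FrameworkSummary`, `FrameworkDetail`, `SummaryBean`, and `toSummaries` are hypothetical names invented here, and the (id, name) rows stand in for query results. In Empire this would presumably mean passing the slim type as the entity class hint, which is an assumption based on the find-all snippet earlier in the thread rather than something confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical slim view: only the fields a list of frameworks needs. */
interface FrameworkSummary {
    String getId();
    String getName();
}

/** Hypothetical full view: adds the association that is costly to populate. */
interface FrameworkDetail extends FrameworkSummary {
    List<String> getTaskItems();
}

/** Simple carrier for the slim view; it has no task-item property at all. */
final class SummaryBean implements FrameworkSummary {
    private final String id;
    private final String name;

    SummaryBean(String id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public String getId() {
        return id;
    }

    @Override
    public String getName() {
        return name;
    }
}

final class ParentInterfaceSketch {
    /** Maps simulated (id, name) rows to the slim view; no children are involved. */
    static List<FrameworkSummary> toSummaries(String[][] rows) {
        List<FrameworkSummary> summaries = new ArrayList<>();
        for (String[] row : rows) {
            summaries.add(new SummaryBean(row[0], row[1]));
        }
        return summaries;
    }

    public static void main(String[] args) {
        String[][] rows = {{"f1", "Alpha"}, {"f2", "Beta"}};
        for (FrameworkSummary summary : toSummaries(rows)) {
            System.out.println(summary.getId() + " " + summary.getName());
        }
    }
}
```

The design point is that the slim type never declares the TaskItem property, so a mapper populating it has nothing to proxy or fetch for the children. Code that needs the children would load the full type for one framework at a time.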

mkrech10 commented 8 years ago

The issue we were having was not directly related to eager/lazy loading. Our code is written in Groovy. When using Groovy, we had to declare the fields for all related objects as 'private' rather than leaving them with the default visibility. That gave us the behavior we expected when setting the fetch type to lazy.

jblaufuss commented 8 years ago

@mhgrove this issue can probably be closed. @mkrech10 is on my team and we agreed that the other issues I submitted today probably cover what we were seeing with more specificity.