vivo-project / VIVO

VIVO is an extensible semantic web application for research discovery and showcasing scholarly work
http://vivoweb.org
BSD 3-Clause "New" or "Revised" License
202 stars 127 forks source link

Long delay when logging in to populated instance #3885

Closed brianjlowe closed 1 year ago

brianjlowe commented 1 year ago

Describe the bug On VIVO 1.13, after submitting email and password, login takes many seconds to complete and redirect on a populated instance. Approximately 5 seconds on an instance with 10 million triples. Thanks to Michael Bentz for originally reporting this issue on Slack.

To Reproduce

  1. Populate an instance with 10+ million triples.
  2. Log in as root or admin: submit email and password.
  3. Observe long delay before you are redirected to Site Admin.

Environment (please complete the following information):

brianjlowe commented 1 year ago

Delay is suspiciously similar to the delay incurred in listing the named graphs in use in a TDB instance.

brianjlowe commented 1 year ago

Indeed, grabbing thread dumps during the delays consistently shows calls to getGraphURIs(), which is very slow on TDB. Not sure yet why this is happening on login/logout and not on other requests.

Login example:

"ajp-nio-127.0.0.1-8009-exec-9" #50 daemon prio=5 os_prio=0 cpu=83850.94ms elapsed=610488.84s tid=0x00007fc1d8be4800 nid=0x3460a5 runnable [0x00007fbfe95e5000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.hash(java.base@11.0.18/HashMap.java:340) at java.util.HashMap.put(java.base@11.0.18/HashMap.java:608) at java.util.HashSet.add(java.base@11.0.18/HashSet.java:220) at org.apache.jena.atlas.iterator.FilterUnique.test(FilterUnique.java:32) at org.apache.jena.atlas.iterator.Iter$1.hasNext(Iter.java:228) at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:1067) at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347) at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90) at org.apache.jena.util.iterator.MapFilterIterator.hasNext(MapFilterIterator.java:54)

Logout example: "ajp-nio-127.0.0.1-8009-exec-2" #43 daemon prio=5 os_prio=0 cpu=92412.14ms elapsed=610693.55s tid=0x00007fc1d86f1000 nid=0x34609e runnable [0x00007fbfe9cec000] java.lang.Thread.State: RUNNABLE at org.apache.jena.tdb.base.buffer.RecordBuffer._get(RecordBuffer.java:107) at org.apache.jena.tdb.base.buffer.RecordBuffer.get(RecordBuffer.java:53) at org.apache.jena.tdb.base.recordbuffer.RecordRangeIterator.hasNext(RecordRangeIterator.java:119) at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347) at org.apache.jena.tdb.sys.DatasetControlMRSW$IteratorCheckNotConcurrent.hasNext(DatasetControlMRSW.java:111) at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347) at org.apache.jena.atlas.iterator.Iter$1.hasNext(Iter.java:223) at org.apache.jena.atlas.iterator.Iter.hasNext(Iter.java:1067) at org.apache.jena.atlas.iterator.Iter$2.hasNext(Iter.java:347) at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90) at org.apache.jena.util.iterator.MapFilterIterator.hasNext(MapFilterIterator.java:54)

chenejac commented 1 year ago

Maybe UF sample data can be used to reproduce the issue - https://github.com/vivo-project/sample-data/pull/9

brianjlowe commented 1 year ago

The fact that this not a problem on all requests may be related to the graph URI cache in RDFServiceJena. The changeSetUpdate() method in RDFServiceTDB sets a dirty flag on this cache whenever a change set is written to the triple store. A change is written whenever someone logs in (not sure about logout), because we record login count and timestamp of last login.

It would initially seem that any request immediately following an update or edit should be similarly slow, not just logins.

brianjlowe commented 1 year ago

It seems that we could at least make that cache invalidation a bit smarter because the change set offers easy access to the name graphs it's modifying. If there are only named graphs added, we can add them to the cache directly. Only if there is a named graph involved in a triple removal do we need to invalidate the cache, but even then it would probably be faster to test whether the graph is empty with something like "SELECT ?s ?p ?o WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 1" or even "ASK { GRAPH ?g { ?s ?p ?o } }" rather than re-listing all the graphs in the store.

brianjlowe commented 1 year ago

Another option might be to rebuild the cache in the background when update activity is quiescent the way the search indexer works. I don't think there are any situations where a request will care that the graph URI list is slightly stale; in older versions of VIVO I remember freezing the graph URI list after startup and the only ill effect was that visiting the Manage Jena Models page didn't show new graphs that had appeared on ingest. Normal requests were unaffected.

It would be nice not to block incoming requests at all on these cache rebuilds.