Closed ebruchez closed 6 years ago
We discussed today with @avernet that we could also use Ehcache replication.
Steps for a prototype:
Q: Any way to do anything using browser local state?
Tomcat has a `DeltaManager` session manager which "only replicat[es] the deltas in data". However, the documentation doesn't say what that actually means (how does the algorithm work?). It might be by session attribute.
What we store in the session:
- `oxf.xforms.state.manager.session-listeners-key.$uuid`
- `SessionDocument(lock: Lock, uuid: String)` via `oxf.xforms.state.manager.uuid-key.$uuid`
- `Connection` state
- `fr-language`
- `UploadProgress`
- `oxf.xforms.state.async-submissions.$uuid`
- `deleteFileOnSessionTermination`
- `DynamicResource` via `orbeon.resources.dynamic.$digest`
What we store indirectly via UUID into caches:
String
DynamicState
Data: initial state for Controls form:
So we are talking about 100KB, mostly of instances.
Need more data of course.
One issue: unless there is a shared filesystem for temporary files, uploads which haven't been saved to a database would be lost when failing over.
This blog post from almost 6 years ago indicates about 120 MB/s with Ehcache via RMI for 60 KB objects, less as you increase object size (and about 50% of that with JGroups).
`DynamicState`.
Unclear, based on this, whether using Ehcache for Tomcat session replication is possible independently from the rest of Shiro:
EHCache is also a nice choice if you quickly need container-independent session clustering. You can transparently plug in TerraCotta behind EHCache and have a container-independent clustered session cache. No more worrying about Tomcat, JBoss, Jetty, WebSphere or WebLogic specific session clustering ever again!
It seems that, maybe, Shiro creates its own sessions and you wouldn't use web sessions at all.
We have a few options:
So I think that option 2 might be the best so far.
I agree. Also, with the "everything in the session" option, where would we store what we can't afford to keep in memory, and right now serialize to disk through Ehcache? It seems to me that anyway, we can't keep everything in the session.
- `XFormsContainingDocument` in `XFormsDocumentCache` must also store their state in the Ehcache
- `sessionCredentials` in native session should be replicated
- `Map` [CHECK: not using a `Map` now but individual keys]
- `XFormsServerSharedInstancesCache` stores URI with port, which fails when loading missing instance
- `xforms.state` cache must be initialized early so it can start replicating even before there is a first request coming in
- `persistence-model.xml`
- `DynamicState` so we should be good.
- `isPE()`
- `ehcache.xml` configuration
- `maxElementsInMemory`
- `xforms.resources`
- `xforms.xbl` cache doesn't seem to be used!
- `clearMissingUnsavedDataAttachmentReturnFilenamesJava` in Form Builder after replication, seems that `data` is null (replace with `xxf:instance()`)
- `SessionListeners` class uses custom serialization/deserialization, and deserializes to an empty list of listeners.
- `xforms.xbl` is in fact not used, and remove if that's the case

Question raised by @avernet: if we do not store anything in the actual servlet session anymore, could we not use/check the session at all? Could this help with occasional issues users have with sessions, in particular in embedded settings?
One idea is that the XForms document's UUID could be sufficient.
There are a few things where a per-user setting can make sense, such as `fr-language`. It might be, at this point, the only setting shared per session.
We currently have a 2-level XForms document cache/store:
- `XFormsContainingDocument`
- `EhcacheStateStore`, which creates an instance of `DynamicState`. This maps to the `xforms.state` cache, which is configured as a disk cache. Upon serializing to disk, Java serialization applies to `DynamicState`, as it is a Scala `case class`, which is a `java.io.Serializable`.

Given this, replication causes an issue: after each Ajax request, the new state must be propagated.
Possibilities:
- `DynamicState` after each request

(BTW as discussed with @avernet, state serialization is likely much cheaper than deserialization, as deserialization entails recreating and initializing an `XFormsContainingDocument`, including RRR and creating the whole tree of controls, evaluating lots of XPath expressions, etc.)
To clarify, it's not entirely true we don't need to replicate sessions even if we store everything in Ehcache: session replication needs to be enabled for the servlet containers, otherwise the container won't be able to accept the incoming session cookie from the client, and we won't have a session id at our disposal. However, no session objects will have to be replicated in this case, except maybe `fr-language` or secondary user settings like this.
The above, of course, if we keep checking the session at all. If we don't, then from Orbeon Forms we wouldn't even need to replicate the session at all, and a purely servlet session-less operation would be possible.
In practice, I would imagine that sessions would be needed by most deployments, if only to handle logged-in users, unless some third-party solutions like Apache Shiro are used.
Defer:
We need 2 caches:
Both need to be replicated the same way.
One question is that, since we do need session replication after all, should we use Ehcache for a session cache? What is the benefit?
One thing is that, for the XForms dynamic state, we do passivation to disk. Now we could use the servlet session, and configure that to do passivation as well, although that would apply to the entire content of the user's session, and the configuration would be different from app server to app server.
Only being able to passivate the whole session, instead of being able to do it on a document-by-document basis, seems like a step back. Imagine you have fairly active users, who load many forms, over hours of work. In such a scenario, you would keep in memory the dynamic state for all the "old forms" for those users, and would require much more RAM for the same amount of load.
Do containers, say Tomcat, have the ability to only passivate part of the session for a given user?
I doubt it, Tomcat is pretty basic.
I agree that just session passivation would be a step back.
Summary so far:
- `xforms.state` cache.
- `xforms.state` cache can keep being a disk store only.

Could you detail what you mean by "obtain a unique id to access other caches"?
Regarding "allow sticky sessions to work at the load balancer level", I don't think load balancers need the app server to maintain their own cookie. E.g. HAProxy can keep track of the server for a user by rewriting a given cookie, like `JSESSIONID`, doing `123` → `s1~123` on responses, and `s1~123` → `123` on requests, but it can also use its own cookie, e.g. `SERVERID=s1`.
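To make the prefix rewriting concrete, here is a minimal sketch of the two transformations. It only mimics the behavior described above; the `CookiePrefix` class and its method names are made up for illustration, and the `~` separator and `s1` server name are taken from the example in the text.

```java
// Hypothetical helper: mimics the cookie-prefix rewriting described above.
// The "~" separator and the "s1" server name come from the example in the text.
final class CookiePrefix {

    private static final char SEPARATOR = '~';

    // On responses: 123 -> s1~123
    static String onResponse(String serverName, String cookieValue) {
        return serverName + SEPARATOR + cookieValue;
    }

    // On requests: s1~123 -> 123 (a value without a prefix is returned unchanged)
    static String onRequest(String cookieValue) {
        int index = cookieValue.indexOf(SEPARATOR);
        return index < 0 ? cookieValue : cookieValue.substring(index + 1);
    }

    // Server the request should be routed to, or null if there is no prefix
    static String serverFor(String cookieValue) {
        int index = cookieValue.indexOf(SEPARATOR);
        return index < 0 ? null : cookieValue.substring(0, index);
    }

    public static void main(String[] args) {
        String rewritten = onResponse("s1", "123");
        System.out.println(rewritten);
        System.out.println(serverFor(rewritten));
        System.out.println(onRequest(rewritten));
    }
}
```

With this scheme the app server only ever sees its own session id, while the load balancer reads the target server from the prefix.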
Found out why I get an "unexpected request sequence number" when moving back to server A. It's because server A still has a live document around. It doesn't know that server B processed some requests.
One solution might be, in case of sequence number error, to check if we can find one in the store, as that might have been updated in the background.
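One way to picture this check, as a minimal sketch only: compare the sequence number of the live document with the one found in the store, and drop the live document if the store is newer. All class and field names below (`StaleDocumentCheck`, `LiveDocument`, `StoredState`) are hypothetical stand-ins, and plain maps stand in for the document cache and the replicated store.

```java
import java.util.HashMap;
import java.util.Map;

// All names here are hypothetical stand-ins, not actual Orbeon Forms classes.
final class StaleDocumentCheck {

    // A live document held in memory on this server, with the last sequence number it processed
    static final class LiveDocument {
        final String uuid;
        final long sequence;
        LiveDocument(String uuid, long sequence) { this.uuid = uuid; this.sequence = sequence; }
    }

    // Serialized state as found in the (possibly replicated) store
    static final class StoredState {
        final long sequence;
        final byte[] bytes;
        StoredState(long sequence, byte[] bytes) { this.sequence = sequence; this.bytes = bytes; }
    }

    final Map<String, LiveDocument> liveCache = new HashMap<>();
    final Map<String, StoredState> store = new HashMap<>();

    // Returns the live document if it is still current. If the store holds a newer
    // sequence number (another server processed requests in the meantime), the live
    // document is evicted and null is returned so the caller restores from the store.
    LiveDocument findCurrent(String uuid) {
        LiveDocument live = liveCache.get(uuid);
        if (live == null)
            return null;
        StoredState stored = store.get(uuid);
        if (stored != null && stored.sequence > live.sequence) {
            liveCache.remove(uuid);
            return null;
        }
        return live;
    }

    public static void main(String[] args) {
        StaleDocumentCheck serverA = new StaleDocumentCheck();
        serverA.liveCache.put("doc1", new LiveDocument("doc1", 3));
        serverA.store.put("doc1", new StoredState(5, new byte[0]));
        System.out.println(serverA.findCurrent("doc1") == null
            ? "stale: restore from store"
            : "live document is current");
    }
}
```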
Terminology from HAProxy doc:
Using persistence, we mean that we’re 100% sure that a user will get redirected to a single server. Using affinity, we mean that the user may be redirected to the same server…
Solution for problem above:
- `Element.version` to store the sequence number.
- `XFormsDocumentCache`, invalidate found entry if there is a newer sequence number in the store, checked simply by looking at the element's `version` in cache, which does not require extracting the dynamic state to find its key (but does require accessing the `Element`).

So at this point we have shown that replication can work provided:

- `xforms.state`
- `XFormsDocumentCache` cache invalidation
- `XFormsServerSharedInstancesCache`
- `DynamicState` after every Ajax update

At this point, we don't yet know the performance implications for:

- `DynamicState` creation and serialization
- `DynamicState` to peers
- `DynamicState` in peers (using a disk cache)

So we would need some numbers.
`storeDocumentState` takes about 5 ms (average of 100 serializations) for form with:
The size is 53,732 bytes, out of which 46,290 are for instances, which breaks down to:
- `fr-initial-instance` (see #3303)
- `fr-form-instance`
- `fr-error-summary-instance`

`fr-initial-instance` is for keeping "initial data for clear button". It never changes after the initial load, so this would be a case where, if we split the dynamic state into more than one cache key, we could prevent unneeded serializations and replication.
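For reference, the kind of measurement quoted above (serialized size, and average time over 100 serializations) can be reproduced in isolation with plain Java serialization. This is a rough sketch, not the code used for the numbers in this thread: `FakeState` is a hypothetical stand-in for a serializable state object, and its 46,290-byte payload only mirrors the instance size mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationTiming {

    // Hypothetical stand-in for a serializable state object
    static final class FakeState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String uuid;
        final long sequence;
        final byte[] instances;
        FakeState(String uuid, long sequence, byte[] instances) {
            this.uuid = uuid;
            this.sequence = sequence;
            this.instances = instances;
        }
    }

    // Standard Java serialization to a byte array
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Average time in milliseconds over `runs` serializations (no JIT warm-up, so only indicative)
    static double averageMillis(Serializable value, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++)
            serialize(value);
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        FakeState state = new FakeState("abc", 1, new byte[46_290]);
        System.out.println("size: " + serialize(state).length + " bytes");
        System.out.println("average: " + averageMillis(state, 100) + " ms");
    }
}
```

A flat byte array serializes far faster than a real dynamic state would, so the absolute numbers from this sketch are not comparable to the 5 ms above; only the method is.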
Editing that same form in Form Builder, `storeDocumentState`:

- `fb-form-instance`
We should maybe consider disabling replication for Form Builder, for most deployment uses, as here replication might be more expensive. Seeing how we could reduce the size of serialized instances would be a good idea too. We should do space compression on instances (see #2751, #1715).
What about gzip compression of serialized data?
Dynamic state bytes compress well, by a factor of 7-8. For example:
Not sure why I have 20 ms above vs. 15 ms in my previous test the day before!
Q: Should we, and if so when, compress dynamic state when putting it into Ehcache? Is the tradeoff worth it?
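To get a feel for the tradeoff, here is a minimal sketch of gzip-compressing a serialized byte array with the JDK's `java.util.zip` classes before it would be put into a cache. The `StateCompression` class and the sample XML are made up for illustration; the actual compression ratio and timing depend on the real dynamic state bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, assuming the state is already available as a byte array
final class StateCompression {

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(input))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1)
                bytes.write(buffer, 0, read);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive XML-like sample; real instance data will compress differently
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 2000; i++)
            sample.append("<item><name>value ").append(i).append("</name></item>");
        byte[] original = sample.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] compressed = gzip(original);
        double millis = (System.nanoTime() - start) / 1_000_000.0;

        System.out.println(original.length + " -> " + compressed.length + " bytes in " + millis + " ms");
    }
}
```

Compression adds CPU time on every store and restore, but reduces what has to be sent to peers and written to disk, which is where the tradeoff would have to be measured.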
Property:
<property
as="xs:boolean"
name="oxf.xforms.replication"
value="true"/>
One question is: when should the caches be initialized? We would like replication to be able to happen as early as possible. The `OrbeonServletContextListener` is a possibility. But this is not XForms-aware. So we should probably introduce a new `ServletContextListener` for this purpose and add it to `web.xml`. Suggesting `org.orbeon.oxf.xforms.ReplicationServletContextListener`.
`XFormsServerSharedInstancesCache` is used by Form Runner and Form Builder to cache resources, in particular form resources. Those loaded from `oxf:` are no problem, but those loaded with, for example, `fr-get-fr-resources` go through a local service.
What we should do is, if the URL is relative to the current host/port/context, store a path to be rewritten instead of the absolute URL. This way, when the path migrates to another server, it will be resolved again but against the current host.
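As a rough sketch of that idea: URLs under the current server's base are stored as a context-relative path, and resolved again against whichever server later reads the entry. The `RelativeResourceUrl` class, its method names, and the example hosts and paths are all hypothetical; this sketch also treats an explicit default port and an implicit one as different.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper; "base" means scheme + host + port + context path,
// e.g. http://s1.example.org:8080/orbeon
final class RelativeResourceUrl {

    // If `url` is under `currentBase`, return the context-relative path (plus query),
    // otherwise return `url` unchanged (e.g. oxf: URLs or URLs to other hosts).
    static String toStored(String currentBase, String url) {
        URI base = URI.create(currentBase);
        URI uri = URI.create(url);
        boolean sameServer =
            Objects.equals(base.getScheme(), uri.getScheme()) &&
            Objects.equals(base.getHost(), uri.getHost()) &&
            base.getPort() == uri.getPort();
        String contextPrefix = base.getRawPath() + "/";
        String path = uri.getRawPath();
        if (sameServer && path != null && path.startsWith(contextPrefix)) {
            // Keep the leading "/" so the stored form is recognizable as relative
            String relative = path.substring(contextPrefix.length() - 1);
            String query = uri.getRawQuery();
            return query == null ? relative : relative + "?" + query;
        }
        return url;
    }

    // Resolve a stored value against the server that is reading it
    static String resolve(String currentBase, String stored) {
        return stored.startsWith("/") ? currentBase + stored : stored;
    }

    public static void main(String[] args) {
        String stored = toStored(
            "http://s1.example.org:8080/orbeon",
            "http://s1.example.org:8080/orbeon/fr/service/resources?lang=en");
        System.out.println(stored);
        System.out.println(resolve("http://s2.example.org:8080/orbeon", stored));
    }
}
```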
Uploads are an issue. If an upload is in progress when a server goes down, then it will return an error, which is probably acceptable in most cases as the user will see the error and restart the upload. At least, I think this is ok in a first implementation.
If a server goes down after an upload has completed, then the form instance data points to a temporary file on disk. Unless the replica servers share the temp filesystem with the original server, the replica will now contain instance data with a temporary file URL which points to a missing file. When saving the data to the database, an error will probably happen.
So what should we do in this case?
I think that the sponsor for this feature doesn't need attachments at first. But we should probably at least do solution 2 above. We could make this part of `require-uploads`. This calls `pending-uploads` (`tryPendingUploads`). Either `tryPendingUploads` or another Form Runner action could check the status of unsaved attachments.
If unsaved attachments have missing files:
Another possibility could be to have a standard validation for attachments which checks the existence of the associated file. If the file disappears, the control would show in error. But it could be costly to check all attachments all the time without optimization. With replication, this would only be needed upon state restoration.
In #1067 we discussed an `xxforms-state-restored` event which, possibly, could also be used to check attachments. (#3317)
For uploads, I have implemented a simple solution to detect missing attachments and show an error.
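The actual implementation is not shown here. Purely as an illustration of the general idea, a check for missing attachments could look like the sketch below: given the attachment URIs found in instance data, report those which are `file:` URIs pointing to files that no longer exist on this server. The `MissingAttachments` class and its method are hypothetical.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the implementation referred to above
final class MissingAttachments {

    // Returns the URIs which are local file: URIs whose file does not exist.
    // Other URIs (e.g. attachments already saved to a database) are ignored.
    static List<String> findMissing(List<String> attachmentUris) {
        List<String> missing = new ArrayList<>();
        for (String value : attachmentUris) {
            URI uri = URI.create(value);
            boolean isLocalFile = "file".equals(uri.getScheme()) && uri.getPath() != null;
            // Use the path only, so that query parameters on the URI are ignored
            if (isLocalFile && !new File(uri.getPath()).exists())
                missing.add(value);
        }
        return missing;
    }

    public static void main(String[] args) {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        List<String> uris = Arrays.asList(
            tmpDir.toURI().toString(),
            new File(tmpDir, "no-such-upload.tmp").toURI().toString(),
            "http://example.org/saved.bin");
        System.out.println(findMissing(uris));
    }
}
```

Such a check would only need to run upon state restoration, as noted above, rather than on every validation.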
java.io.NotSerializableException: org.orbeon.oxf.xforms.state.XFormsStateManager
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.catalina.ha.session.DeltaRequest$AttributeInfo.writeExternal(DeltaRequest.java:401)
at org.apache.catalina.ha.session.DeltaRequest.writeExternal(DeltaRequest.java:294)
at org.apache.catalina.ha.session.DeltaRequest.serialize(DeltaRequest.java:308)
at org.apache.catalina.ha.session.DeltaManager.serializeDeltaRequest(DeltaManager.java:585)
at org.apache.catalina.ha.session.DeltaManager.requestCompleted(DeltaManager.java:966)
at org.apache.catalina.ha.session.DeltaManager.requestCompleted(DeltaManager.java:933)
at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:525)
at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:513)
at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:495)
at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:406)
at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:329)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:502)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1132)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2458)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
New problem: I do manage one move from one server to the other. However if the session is moved back to the first server, I am getting an error:
Unable to retrieve XForms engine state. Unable to process incoming request.
It is as if, during cache bootstrap, the content was not obtained. Yet `RMICacheManagerPeerListener` appears to find its peer.
Adding this is necessary:
<bootstrapCacheLoaderFactory
class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"
properties="bootstrapAsynchronously=false" />
It would be great if we could write an automated test for this. This might justify the use of Docker instances. Here is an example test plan:
- `s1` and `s2`
- `form1` into one context
- `form2`
- `form1` and `form2`
- `s1` container
- `form1`
- `form2` (must obviously work)
- `s1`
- `s1` is "up" (how?)
- `form1` and `form2`
- `s2`
- `form1` and `form2`
This can be extended to launching more than 2 instances and more clients.
Now we need to figure out how to run this, because there is interleaving of code on the server-side (launching/stopping containers) and code on the client-side (loading pages).
So there will be some code, somewhere, running the test above, most likely on the client. If on the client, it will have to remote-control the start/stop of containers.
This calls for having a small server which the client can remote control to start containers. I see these possibilities:
- `orbeon-war`. This means that at least one container with Orbeon Forms must be started before this works. Also, this requires the entire `orbeon-war` to be ready.

I think that option 2 would be cleaner. It is a single server/remote control which only needs to be started once and is independent from the specific tests that need to run.
So we implemented option 2 above for tests, but are now thinking that we should use Node.js's `child_process` directly instead.
Current status is that we now have a test able to start/stop containers, with HAProxy, load forms and change state, shut down one server, make sure that forms which initially hit the now-down server still work, and conversely that the state goes back to the original server when it is back up, and then that this also works in reverse.
We documented a possibility where state might be lost if it hasn't been replicated yet and the client got an Ajax response. Possible solutions:
- `replicateAsynchronously=false`

With testing locally (Docker images on the same machine), asynchronous state storing takes, from the caller's point of view, on the order of 10 ms. With `replicateAsynchronously=false`, using Form Builder as the "form", this time climbs to 30-40 ms.
Documented but could benefit from diagrams.
Current status
As of 2011-09-26, the XForms engine session cannot be replicated because:
Implementing efficient session replication
- `Externalizable` to serialize efficiently when requested
- `setAttribute` or `removeAttribute` used by session replication code to detect, so might not be even useful at all! See Tomcat doc.

Other questions/issues
External resources
+1 from customer