yawlfoundation closed this issue 9 years ago
I could not replicate the first problem: in my tests, all the orphaned work
items are removed from the work queues when the case completes. But the first
problem may be a side effect of the second:
The NotSerializableException indicates that the required Tomcat configuration
change has not been made. In tomcat/conf/context.xml, the line "<Manager
pathname="" />" needs to be uncommented. Please see chapter 2 of the user
manual for more details.
Please advise if fixing this configuration setting fixes both problems.
Original comment by yawl.mic...@gmail.com
on 2 Aug 2010 at 10:50
Hmmm, this is a strange one.
Yes, it's likely that the original serialisation errors were due to a dodgy
context.xml, but I'm still getting the issue (without the exceptions) with a
doubly-checked-as-correct build (rev 1551 2.1 final source).
However, it's not consistent whether any get orphaned, and which ones get
orphaned. I attach the (complicated) spec. used, and also attach a screenshot
of the main flow. I just re-ran it about 8 times. Got 1 orphaned task the first
time, but then none thereafter. Maybe it only manifests when running the first
time after a YAWL startup (I'll see if that pattern reoccurs)?
The spec. is a pain to run manually (in my case, I'm processing everything
programmatically via my observer gateway). If you look at it, it has 2
sub-nets. In the original run captured in this bug, the work item orphaned is
actually the last one in the root net. In the error run I just did, it's the
parent of the multiple-atomic task (second in the root net flow), which
deliberately has a continuation threshold (2) less than the number of instances
(3) the way I run it. I'm also 90% sure that I had orphaned sub-net tasks
instead in other runs.
The logs show that the orphaned tasks which *don't* get left in the GUI are
getting picked up; e.g.,
[WARN] 2010-08-03 10:34:59,762 org.yawlfoundation.yawl.engine.YNetRunner -
Although Net [MarketOperation] of case [11] has completed, there are still
tokens remaining in the net, within these elements:
[Condition:c{DummySubNet2_143_GenDataDelayed2_174}], which usually indicates
that the net is unsound. Those tokens were removed when the net completed.
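For anyone wanting to cross-check these warnings against the UI automatically, here is a minimal Java sketch that pulls the element names out of a message in the format quoted above. The class and method names are made up for illustration, and the comma-separated handling of multiple elements is an assumption, since the example above shows only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of YAWL.
class LeftoverTokenParser {

    // Captures the bracketed list that follows "within these elements:".
    private static final Pattern ELEMENTS =
            Pattern.compile("within these elements:\\s*\\[(.*?)\\],\\s*which");

    /** Returns the element names listed in one YNetRunner warning, or an empty list. */
    public static List<String> elements(String warning) {
        List<String> result = new ArrayList<>();
        Matcher m = ELEMENTS.matcher(warning);
        if (m.find()) {
            // Assumption: several elements would be separated by commas.
            for (String part : m.group(1).split(",")) {
                String name = part.trim();
                if (!name.isEmpty()) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String warning = "Although Net [MarketOperation] of case [11] has completed, "
                + "there are still tokens remaining in the net, within these elements: "
                + "[Condition:c{DummySubNet2_143_GenDataDelayed2_174}], which usually "
                + "indicates that the net is unsound.";
        System.out.println(elements(warning));
    }
}
```

The resulting list can then be compared by hand, or in code, with whatever is still showing in the work queues.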
Where to from here? If it helps, I can run YAWL at DEBUG level and add engine
logs for a run when the error occurs. Useful?
Original comment by monsieur...@gmail.com
on 3 Aug 2010 at 9:46
This might also help.
My code calls YEngine's getWorkItemsWithIdentifier for the case to check
which work items are active during the run. In all cases, the work items that
end up orphaned in the UI are correctly absent from the results. So it
would *appear* that the Engine state is always correct; it is the resource
service that is getting out of sync somehow.
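The comparison described above can be sketched roughly as follows. This is only an illustration: OrphanCheck and findOrphans are hypothetical names, the IDs are invented, and both sets are assumed to have been collected already (the engine side from getWorkItemsWithIdentifier, the other from whatever the work queues show), so no real YAWL calls appear here.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of YAWL.
class OrphanCheck {

    /**
     * Returns the IDs that are still queued in the worklist but that the
     * engine no longer reports as active. How the two sets are fetched is
     * left to the caller.
     */
    public static SortedSet<String> findOrphans(Set<String> engineActiveIds,
                                                Set<String> queuedIds) {
        SortedSet<String> orphans = new TreeSet<>(queuedIds);
        orphans.removeAll(engineActiveIds);
        return orphans;
    }

    public static void main(String[] args) {
        // Invented snapshot taken after a case has completed: the engine
        // reports nothing active, but the worklist still shows one item.
        Set<String> engineActive = Set.of();
        Set<String> queued = Set.of("1:SomeTask_1");
        System.out.println("Orphaned in UI: " + findOrphans(engineActive, queued));
    }
}
```

An empty result after case completion would mean the two views agree; anything left over is an item the resource service has not cleaned up.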
Original comment by monsieur...@gmail.com
on 3 Aug 2010 at 9:54
OK, re-ran on clean rev 1556 build. Yes, orphans seem to always occur on first
run after YAWL startup, and not any subsequent time. Weird.
This time, one subnet task (GenDataDummy3a) and one root net task
(GenDataDelayed2) got orphaned in the UI. Interestingly wrt comment 3, the
orphaned tasks are always *missing* from the messages like below:
[WARN] 2010-08-03 11:36:31,046 org.yawlfoundation.yawl.engine.YNetRunner -
Although Net [DummySubNet] of case [2.7] has completed, there are still tokens
remaining in the net, within these elements: [AtomicTask:GenDataDummy3b_89],
which usually indicates that the net is unsound. Those tokens were removed when
the net completed.
So getWorkItemsWithIdentifier is consistent, but at some level it looks like
YNetRunner isn't picking up the stranded tasks that end up orphaned in the UI.
Could this be a multi-threading issue? Because everything is automated for me,
the check-ins and check-outs are obviously occurring very quickly, so more
chance of timing issues appearing.
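To make the question concrete, here is a small Java sketch of one way a timing problem could leave an item behind. It is not a diagnosis of the YAWL code: Event, WorkQueue, and the other names are hypothetical stand-ins. If the "cancelled" notice for an item is handled before its "enabled" notice, the item stays in the queue; pushing events through a single-threaded executor keeps them in submission order.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only; none of these types exist in YAWL.
class EventOrderDemo {

    public enum Kind { ENABLED, CANCELLED }

    /** A hypothetical work item announcement. */
    public static final class Event {
        final Kind kind;
        final String itemId;
        public Event(Kind kind, String itemId) { this.kind = kind; this.itemId = itemId; }
    }

    /** Stand-in for a worklist that only mirrors what it is told. */
    public static final class WorkQueue {
        private final Set<String> items = ConcurrentHashMap.newKeySet();
        public void apply(Event e) {
            if (e.kind == Kind.ENABLED) items.add(e.itemId);
            else items.remove(e.itemId);
        }
        public Set<String> snapshot() { return new TreeSet<>(items); }
    }

    /** Applies events in exactly the order given, as a racing handler might see them. */
    public static Set<String> replay(List<Event> arrivalOrder) {
        WorkQueue q = new WorkQueue();
        for (Event e : arrivalOrder) q.apply(e);
        return q.snapshot();
    }

    /** Funnels events through one thread so handling order equals submission order. */
    public static Set<String> deliverSerially(List<Event> events) {
        WorkQueue q = new WorkQueue();
        ExecutorService single = Executors.newSingleThreadExecutor();
        for (Event e : events) single.execute(() -> q.apply(e));
        single.shutdown();
        try {
            single.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return q.snapshot();
    }

    public static void main(String[] args) {
        Event enabled = new Event(Kind.ENABLED, "item-1");
        Event cancelled = new Event(Kind.CANCELLED, "item-1");
        System.out.println(replay(List.of(enabled, cancelled)));  // queue ends up empty
        System.out.println(replay(List.of(cancelled, enabled)));  // item-1 is left behind
        System.out.println(deliverSerially(List.of(enabled, cancelled)));
    }
}
```

Whether anything like this reordering actually happens between the engine and the resource service is exactly what would need confirming from the logs.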
Original comment by monsieur...@gmail.com
on 3 Aug 2010 at 10:43
It almost certainly is a threading/timing issue. I can't replicate it via usual
service interactions, which indicates the faster ObserverGateway communications
are triggering it. The problem is in peeling back the layers to get to the real
cause.
I've made a couple of small changes to YNetRunner in an attempt to 'divide and
conquer' (rev 1597). Stuart, would you mind updating to the latest revision and
running it through to see if it made any difference?
Original comment by yawl.mic...@gmail.com
on 13 Aug 2010 at 2:28
OK, reran with build 1597 (only at INFO level for now). The engine log now
shows InterfaceX errors (log attached). If you want a DEBUG level run (and/or
other log files), let me know; no other log files showed any exceptions or any
obviously strange messages.
However, got 1 orphaned UI entry this time and it isn't any of the ones
mentioned in the log file (it's 2:RequestGenData_10).
Original comment by monsieur...@gmail.com
on 17 Aug 2010 at 10:48
RE: Interface X - in the worklet service's web.xml, change the context-param
'EnableExceptionHandling' to false.
RE: the original problem - actually, a DEBUG level run might be useful, thanks.
Original comment by yawl.mic...@gmail.com
on 17 Aug 2010 at 11:55
OK, rerun at DEBUG level and with worklet svc. exception handling turned off.
Engine log for two runs provided.
Run 1: 3:GenDataDummy1b_41 orphaned
Run 2: 4:RequestGenData_10 and 4:GenDataDummy1b_41 orphaned
Run 2 log also includes YAWL restarting after a post-run-1 shutdown.
Original comment by monsieur...@gmail.com
on 18 Aug 2010 at 10:12
Note that I've just had this same problem using specs. with fully sound nets
(i.e. they can't have active work items when the end condition is reached). So
it appears that it's nothing to do with the specifics of such scenarios. (As
discussed earlier, the logs show that these are all being detected OK anyway.)
I was running with two specs., one (A) of which has a simple (1 task) subnet,
but the other (B) is just two serial tasks. Again, I'm running observer GW code
that does work item processing very quickly. Spec A is run twice; spec B runs 7
or 8 times in immediate succession.
Got 2 work items 'orphaned' in the UI from the first instance of A, and 1 from
one instance of B.
No DEBUG logs available; standard ERROR ones show nothing.
Original comment by monsieur...@gmail.com
on 31 Aug 2010 at 2:41
Interface B event delivery and handling has been completely reworked for 2.2,
following similar, replicable problems when launching large numbers of
automated cases. The changes successfully removed the problem.
Original comment by yawl.mic...@gmail.com
on 5 Aug 2011 at 2:53
Original issue reported on code.google.com by
monsieur...@gmail.com
on 1 Aug 2010 at 10:35