yawlfoundation / yawl

Yet Another Workflow Language
http://www.yawlfoundation.org
GNU Lesser General Public License v3.0

Consistent root and subnets 2.1 change can still leave orphaned workitems in the UI #399

Closed yawlfoundation closed 9 years ago

yawlfoundation commented 9 years ago
In 2.1, root nets and subnets now behave consistently: a net ends when the
first token reaches its output condition. If this happens while other work
items are still active, the 'orphaned' work items remain in the Control Centre
UI admin queue (even though the case is no longer active) and, correctly, cause
errors if you try to start them. Refreshing the UI does not remove them. (See
the two attached screenshots.)

They are cleared out next time YAWL is restarted so this is not a critical 
problem but:

(i) It is confusing for the user. Ideally there should not be work items in the 
queue which aren't startable (and the error msg box doesn't explain to the user 
why it can't be started, though it does in the log).

(ii) I had some serialisation exceptions occurring once YAWL was restarted 
after this start error (see attached log extracts; it appears that somehow the 
ResourceManager itself got associated with the user session and the code 
attempted to serialise it along with other session data (??)). There could 
therefore be some other, more serious issues with such 'orphaned' work items. 
(I think the whole of YAWL failed to start, and this persisted over restarts, 
but seemed to stop happening after a few restarts; I'm not 100% sure why or of 
exactly what actions I took, if any.)

However, I could not consistently reproduce these exceptions, so perhaps this 
was a different problem.

It would therefore be desirable to clear these orphaned work items out of the
UI as well when the net ends.

However, since I was unable to reproduce the exceptions, I have set this to low
priority for now.

Original issue reported on code.google.com by monsieur...@gmail.com on 1 Aug 2010 at 10:35

yawlfoundation commented 9 years ago
I could not replicate the first problem - in my tests, all the orphaned 
workitems are removed from workqueues when the case completes. But the first 
may be a side effect of the second:

The NotSerializableException indicates that the required Tomcat configuration
change has not been made. In tomcat/conf/context.xml, the line
<Manager pathname="" /> needs to be uncommented. Please see chapter 2 of the
user manual for more details.

Please advise if fixing this configuration setting fixes both problems.

Original comment by yawl.mic...@gmail.com on 2 Aug 2010 at 10:50

yawlfoundation commented 9 years ago
Hmmm, this is a strange one.

Yes, it's likely that the original serialisation errors were due to a dodgy 
context.xml, but I'm still getting the issue (without the exceptions) with a 
doubly-checked-as-correct build (rev 1551 2.1 final source).

However, it's not consistent whether any get orphaned, and which ones get 
orphaned. I attach the (complicated) spec. used, and also attach a screenshot 
of the main flow. I just re-ran it about 8 times. Got 1 orphaned task the first 
time, but then none thereafter. Maybe it only manifests on the first run after
a YAWL startup? (I'll see if that pattern recurs.)

The spec. is a pain to run manually (in my case, I'm processing everything 
programmatically via my observer gateway). If you look at it, it has 2 
sub-nets. In the original run captured in this bug, the work item orphaned is 
actually the last one in the root net. In the error run I just did, it's the 
parent of the multiple-atomic task (second in the root net flow), which 
deliberately has a continuation threshold (2) less than the number of instances 
(3) the way I run it. I'm also 90% sure that I had orphaned sub-net tasks 
instead in other runs.

The logs show that the orphaned tasks which *don't* get left in the GUI are 
getting picked up; e.g.,

[WARN] 2010-08-03 10:34:59,762 org.yawlfoundation.yawl.engine.YNetRunner - 
Although Net [MarketOperation] of case [11] has completed, there are still 
tokens remaining in the net, within these elements: 
[Condition:c{DummySubNet2_143_GenDataDelayed2_174}], which usually indicates 
that the net is unsound. Those tokens were removed when the net completed.

Where to from here? If it helps, I can run YAWL at DEBUG level and add engine 
logs for a run when the error occurs. Useful?

Original comment by monsieur...@gmail.com on 3 Aug 2010 at 9:46

yawlfoundation commented 9 years ago
This might also help.

My code calls YEngine's getWorkItemsWithIdentifier for the case to check which
work items are active during the run. In all cases, the work items that
eventually end up orphaned in the UI are correctly missing from that result. So
it would *appear* that the Engine state is always correct; it is the resource
service which is getting out of sync somehow.
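
A minimal sketch of that comparison, assuming the IDs the engine reports as
live and the IDs listed in the admin queue have already been collected as plain
strings. The class, the method and the sample IDs are hypothetical and are not
part of the YAWL API:

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of YAWL: compares two snapshots of work item IDs.
class OrphanCheck {

    // Returns the IDs still listed in the UI queue that the engine no longer
    // reports as live.
    static Set<String> findOrphans(Set<String> engineLiveIds, Set<String> uiQueueIds) {
        Set<String> orphans = new TreeSet<>(uiQueueIds); // sorted copy; inputs untouched
        orphans.removeAll(engineLiveIds);
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up IDs for illustration only.
        Set<String> engineLive = Set.of("11:TaskA_1");
        Set<String> uiQueue = Set.of("11:TaskA_1", "11:TaskB_2");
        System.out.println("Orphaned in UI: " + findOrphans(engineLive, uiQueue));
    }
}
```

Anything this returns is present in the queue but unknown to the engine, which
is the mismatch described above.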

Original comment by monsieur...@gmail.com on 3 Aug 2010 at 9:54

yawlfoundation commented 9 years ago
OK, re-ran on clean rev 1556 build. Yes, orphans seem to always occur on first 
run after YAWL startup, and not any subsequent time. Weird.

This time, one subnet task (GenDataDummy3a) and one root net task 
(GenDataDelayed2) got orphaned in the UI. Interestingly wrt comment 3, the 
orphaned tasks are always *missing* from the messages like below:

[WARN] 2010-08-03 11:36:31,046 org.yawlfoundation.yawl.engine.YNetRunner - 
Although Net [DummySubNet] of case [2.7] has completed, there are still tokens 
remaining in the net, within these elements: [AtomicTask:GenDataDummy3b_89], 
which usually indicates that the net is unsound. Those tokens were removed when 
the net completed.

So getWorkItemsWithIdentifier is consistent, but it looks like, at some level,
YNetRunner isn't picking up the stranded tasks that end up orphaned in the UI.

Could this be a multi-threading issue? Because everything is automated for me,
the check-ins and check-outs are obviously occurring very quickly, so there is
more chance of timing issues appearing.

Original comment by monsieur...@gmail.com on 3 Aug 2010 at 10:43

yawlfoundation commented 9 years ago
It almost certainly is a threading/timing issue. I can't replicate it via usual 
service interactions, which indicates the faster ObserverGateway communications 
are triggering it. The problem is in peeling back the layers to get to the real 
cause.
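
Purely as an illustration of how a timing problem of this general kind could
leave a stale queue entry (the real cause has not been identified here, and
this is not YAWL code): if the "add" and "remove" notifications for a work item
can arrive out of order, a queue that applies them naively keeps the item
forever. The sketch below simulates that deterministically with a list of
arrivals rather than real threads. All names in it are hypothetical, and the
"tombstone" guard is just one possible mitigation.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a work queue fed by notifications; not YAWL code.
class QueueRaceSketch {

    enum Kind { ADD, REMOVE }

    static final class Event {
        final Kind kind;
        final String itemId;

        Event(Kind kind, String itemId) {
            this.kind = kind;
            this.itemId = itemId;
        }
    }

    // Applies events strictly in arrival order. A REMOVE that arrives before
    // its ADD is a no-op, so the late ADD leaves a stale entry behind.
    static Set<String> applyNaive(List<Event> arrivals) {
        Set<String> queue = new LinkedHashSet<>();
        for (Event e : arrivals) {
            if (e.kind == Kind.ADD) {
                queue.add(e.itemId);
            } else {
                queue.remove(e.itemId);
            }
        }
        return queue;
    }

    // Remembers every removed ID (a "tombstone") and ignores a later ADD for it.
    static Set<String> applyWithTombstones(List<Event> arrivals) {
        Set<String> queue = new LinkedHashSet<>();
        Set<String> removed = new HashSet<>();
        for (Event e : arrivals) {
            if (e.kind == Kind.REMOVE) {
                removed.add(e.itemId);
                queue.remove(e.itemId);
            } else if (!removed.contains(e.itemId)) {
                queue.add(e.itemId);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        // The REMOVE overtakes the ADD for the same (made-up) work item ID.
        List<Event> arrivals = List.of(
                new Event(Kind.REMOVE, "1:TaskA"),
                new Event(Kind.ADD, "1:TaskA"));
        System.out.println("naive:      " + applyNaive(arrivals));
        System.out.println("tombstones: " + applyWithTombstones(arrivals));
    }
}
```

With the two events in the order shown, the naive queue ends up still holding
the item, while the guarded version ends up empty.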

I've made a couple of small changes to YNetRunner in an attempt to 'divide and 
conquer' (rev 1597). Stuart, would you mind updating to the latest revision and 
running it through to see if it made any difference?

Original comment by yawl.mic...@gmail.com on 13 Aug 2010 at 2:28

yawlfoundation commented 9 years ago
OK, reran with build 1597 (only at INFO level for now). The engine log now
shows InterfaceX errors (log attached). If you want a DEBUG level run and/or
other log files, let me know; no other log files showed any exceptions or any
obvious strange msgs.

However, got 1 orphaned UI entry this time and it isn't any of the ones 
mentioned in the log file (it's 2:RequestGenData_10).

Original comment by monsieur...@gmail.com on 17 Aug 2010 at 10:48

yawlfoundation commented 9 years ago
RE: interface X - in the worklet service's web.xml change the context-param 
'EnableExceptionHandling' to false.

RE: the original problem - actually, a debug level run might be useful, thanks

Original comment by yawl.mic...@gmail.com on 17 Aug 2010 at 11:55

yawlfoundation commented 9 years ago
OK, reran at DEBUG level and with worklet svc. exception handling turned off.
Engine logs for two runs provided.

Run 1: 3:GenDataDummy1b_41 orphaned
Run 2: 4:RequestGenData_10 and 4:GenDataDummy1b_41 orphaned

Run 2 log also includes YAWL restarting after a post-run-1 shutdown.

Original comment by monsieur...@gmail.com on 18 Aug 2010 at 10:12

yawlfoundation commented 9 years ago
Note that I've just had this same problem using specs. with fully sound nets 
(i.e. they can't have active work items when the end condition is reached). So 
it appears that it's nothing to do with the specifics of such scenarios. (As 
discussed earlier, the logs show that these are all being detected OK anyway.)

I was running with two specs., one (A) of which has a simple (1 task) subnet, 
but the other (B) is just two serial tasks. Again, I'm running observer GW code 
that does work item processing very quickly. Spec A is run twice; spec B runs 7 
or 8 times in immediate succession.

Got 2 work items 'orphaned' in the UI from the first instance of A, and 1 from 
one instance of B.

No DEBUG logs available; standard ERROR ones show nothing.

Original comment by monsieur...@gmail.com on 31 Aug 2010 at 2:41

yawlfoundation commented 9 years ago
Interface B event delivery and handling has been completely reworked for 2.2,
following similar reproducible problems when launching large numbers of
automated cases. The changes successfully removed the problem.

Original comment by yawl.mic...@gmail.com on 5 Aug 2011 at 2:53