marcesher / cfconcurrent

A boilerplate-reduction library for using the Java Concurrency Framework in ColdFusion applications
31 stars 9 forks source link

CFConcurrent on Lucee #3

Open jjannek opened 9 years ago

jjannek commented 9 years ago

Hi Marc,

I am trying to run cfconcurrent on Lucee (Version 4.5.1.008, Express on Jetty) and am having some issues. I am using Java 1.8.0_25 (Oracle Corporation) 64bit on OS X 10.10.3. I saw the references to Java 7 so I was wondering if there is any compatibility issue when using Java 8. I had a quick look at the javadocs and did not find anything problematic at first sight.

Basically the framework runs just fine, but I am seeing errors in the console and the requesttimeout.log. It seems that the threads could not be properly stopped - and am unsure why this is even required at all.

I catch and log all errors within my code (big try/catch within the executor's run method), but there is no error catched.

Sidenote: I had tried out the same configuration but using tomcat instead of jetty. Then the situation gets worse as I was not able to properly shutdown the whole application server. But that's a different case right now. However could be related to the same root cause.

The issue "should" be the same on Railo as Lucee 4.5.1 is a fork and I guess nothing changed in particular this place.

Console: Exception in thread "Thread-22" java.lang.UnsupportedOperationException at java.lang.Thread.stop(Thread.java:869) at lucee.commons.io.StopThread.run(SystemUtil.java:1058)

The "Thread-22" is variable and can contain any number. The rest is always the same.

Looking at the source of the Lucee/Railo Class I can see that lucee calls the thread.stop method and passes in a Throwable argument. According to the java 7 and 8 javadocs the stop method is deprecated as it's unsafe in certain conditions. So the issue "may" be a Lucee/Railo one. However I was wondering why lucee is trying to stop a thread at all?!?

requesttimeout.log "ERROR","Thread-16","04/13/2015","10:34:42","controler","stop thread (4) because run into a timeout (fail to retrieve path:java.lang.NullPointerException:null).;java.lang.Throwable;java.lang.Throwable at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) "

It looks to me as if my code might throw an exception within a thread (timeout). Is there any default timeout? Or is this related to the default request timeout of ColdFusion/Railo/Lucee? And do you know whether the "max concurrent threads" setting in the CF/Railo/Lucee admin has an effect on this one? Is this only related to cfthread instances? Or are plain java threads that are created by cfconcurrent also counted against that setting value? Is a java thread created by JCF/cfconcurrent considered to be a coldfusion request which is kind of strange?

I think cfconcurrent is really a neat framework and I would like to continue working with it. If I can't find out what goes on here, I think I may use plain JCF instead - to get to the root of the problem. But that would be only the second-best solution of course... ;-)

Right now I am really unsure what the source of the errors is:

Any one else having the same issue?

Help is really much appreciated.

Greetings from Germany, Jan

marcesher commented 9 years ago

@jjannek First off, this is a fantastic bug report. Well done.

I agree with your conclusion that the thing now is to isolate the problem, and I have a few starter questions:

To rule out your own code, let's start with the examples that ship with cfconcurrent. Open http://localhost/cfconcurrent, and then click some of the links to run the examples.

If you observe the same behavior as described above, then you've ruled out your own code.

I'd imagine that the most likely culprit is cfconcurrent OR railo/lucee and not the JVM, but let's rule that out first b/c it's easiest.

Assuming you have another JVM to test with, update the lucee config file to point to it, restart, and see if you observe the same behavior.

If you observe the same behavior, you've ruled out the JVM

Now's when it gets tricky.

Before we try ruling out cfconcurrent, I'm curious what you see in the logs when you run the examples.

My first thought when reading your report was just as yours: "why would it be calling stop?"

Let's start with the above, see where you get, and then go from there.

jjannek commented 9 years ago

Hi Marc,

thank you very much vor the quick answer. As you suggested I played with the cfconcurrent examples.

Offtopic: One thing that did not work is the stopping/re-initing of examples. This is because the Application.cfc method onApplicationStop does not exist. It would need to be onApplicationEnd (passing in the application scope reference). See here: https://wikidocs.adobe.com/wiki/display/coldfusionen/onApplicationEnd (same thing for railo/lucee) It's not required to call this method manually as CF calls it automatically during applicationStop(). If calling it after applicationStop() I get errors, because the application scope simply does not exist anymore.

I updated the cfconcurrent code locally but was unable to create a pull request. Maybe this is a permission issue? If you give me permissions to create a pull request I am happy to contribute the changes. It could also be the case that I am doing something wrong as this is the first time that I what to contribute code to an open source project on github.

Another thing which occurred during experimenting with the examples is that I get the following error if canceling all tasks: "key [task1] doesn't exist;key [task1] doesn't exist" (in the exception.log) Is this intended behavior specifically in the examples? I guess this is nothing serious but simply the fact that task1 does not exist in the application scope anymore after stopping the example apps. I did not have a detailed look in the example sources so far, but may check it out later. It might make sense to fix this. In the moment it is a bit strange when running the examples.

Back to the main topic: The Lucee console still reports the UnsupportedOperationException and the requesttimeout.log shows errors as well.

I also added a topic to the Lucee Google Group. See here for reference: https://groups.google.com/forum/#!topic/lucee/sFKrlGPJ1Xc

Micha suggests that it's a JDK/Lucee problem. But I guess that we have to deal with multiple things at once. I also found the part in the Lucee Java source which causes the exception to occur. But still it is not clear why Lucee tries to stop the threads at all. Going back from Java8 to Java7 is a bit odd, as Java7 is nearing EOL. But as Micha suggested, I will give it a try.

Questions are: When does a requesttimeout occur in case of threaded/scheduled task executions using JCF? The default request timeout on Lucee is 50s. I also played around with it and set it to 5min. In my application as well as in your examples there is no single task that takes so long to execute. The only thing that may be is that the problem occurs when the whole scheduling (after n runs) sums up the the timeout? Weird. Does the maxConcurrentThreads setting (defaults to 20 on Lucee) also play a role when using JCF?

Any further thoughts?

jjannek commented 9 years ago

off topic: I managed to get the pull request to work using a fork. I am unsure whether a fork was needed, but that was the only way to create one for now.

jjannek commented 9 years ago

Further experiments: I changed the JVM to JDK 7 which now gives me the following exceptions (those occur frequently when playing with the cfconcurrent examples):

Exception in thread "pool-1-thread-1" java.lang.IllegalMonitorStateException at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155) at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260) at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:449) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

The errors in the requesttimeout.log are the same as before.

A very basic test to reproduce the problem is the following:

  1. Startup Lucee (I guess it's the same with Railo)
  2. Click the link beneath "Submit a task" in the first cfconcurrent example (Executor Service)
  3. wait a bit (I used a stopwatch and it was approx. the request timeout set in the lucee admin)
  4. have a look at the console and the requesttimeout.log

In this particular case 9 Errors are displayed in the Console and the log file. All other examples result in the same errors. There is no error in the exception.log or application.log displayed. The application seems to execute as expected (all the "good" log entries are written) However it looks as if there are threads left that could not be stopped... Very weird. I have no clue what to do next. My gut feeling is that it's a problem of lucee/railo and how the application deals with threads.

marcesher commented 9 years ago

@jjannek Thanks again for all the details.

Something I'm really curious about: you are trying on Lucee, but can you also try on a recent version of Railo? Back when I developed cfconcurrent -- a long time ago, 2012 -- that would've been Railo 3 era I think. And I remember it worked fine on Railo, no errors like you're describing. So I'd like to see if something changed in the intervening years.

If you can try that out and let me know what you learn, we can try to figure out if there are next steps that we can do, aside from hoping the Lucee people have some ideas.

jjannek commented 9 years ago

I gave it a quick try using Railo Express 4.2.1.008 (using Jetty as well). This one does not show me any errors on the console but also logs the errors in the requesttimeout.log. So I guess it's the same thing but simply not directly showing the errors in the Console. I used cfconcurrent for a while now (as well on Railo before) but did not pay attention to the requesttimeout.log yet. As the console did not show any errors and the app behaved like expected I thought i was fine. However now it looks as I was not.. ;-) It really seems as if there are open threads that Railo/Lucee tries to stop. It has no direct impact on application execution (as nothing goes wrong) but I think there could be a severe indirect impact, e.g. based on memory consumption, too many open threads, etc. that could lead into a leak or performance degradation.

jjannek commented 9 years ago

Just for information: After further experiments I think it's a bug in Lucee/Railo. I created another topic in the Lucee Google Group where I have a simple example app attached to reproduce the problem. This app does not use cfconcurrent and uses as little code as possible.

https://groups.google.com/forum/#!topic/lucee/KPGiH9zF5C8

Let's see where this is heading. Hopefully there is a solution as otherwise cfconcurrent/JCF would be useless in Railo/Lucee.

phal0r commented 7 years ago

FYI: We are running cfconcurrent with fixedRate Tasks on lucee 5.2 and it's working like a charm. Also integration with the coldbox framework was quite easy. Shutdown and restart on app reinit is working fine.

The threads controlled by cfconcurrent show up in java melody, so i can confirm that a new threadpool is created on every reinit of the app.

evagoras commented 5 years ago

@phal0r Have you ported this library to ColdBox/WireBox, and if you have can you share your code? I have a ColdBox app, and I would like to use this project, but it would be nice to have that run in a "proper" ColdBox way.

phal0r commented 5 years ago

@evagoras I am happy to explain you in detail, what I did:

The task is modeled after the documentation in this project. No special hacks necessary. Also, wirebox works out of the box in the tasks. Tasks run in a separate thread managed by the JVM. That means injecting singletons does not work (I am not 100% sure why). To circumveit this one can inject coldbox itself and then request the singletons via coldbox.getWirebox().getInstance("providers.ConfigurationProvider@module-name"); (ConfigurationProvider is a singleton in my module).

All the initialization takes place in the ModuleConfig.cfc of the module.

This works smoothely in Lucee, but craches every Coldfusion version I tested (I think up to Coldfusion 11) with an OutOfMemory-Error after some time running. So, if you run Lucee you should be good to go.

Also, this setup is for Coldbox 3.8. I think it will be similar with newer coldbox versions, some adaptions may be necessary.

Hope this helps you to get started.

evagoras commented 5 years ago

@phal0r thank you for answering so quickly and for offering your code! That looks really good, will need to give that a go. I am on ACF 2016 and ColdBox 5 though, so my stack is a bit different. I will let you know how it goes.

phal0r commented 5 years ago

Yeah, let me know, how it goes.

evagoras commented 5 years ago

@phal0r I am running into some issues trying to implement your design, and I wonder if it's because I have added things in the wrong place. I am probably asking too much here, but is it possible for you to zip up your module (excluding any proprietary code of course) and send it to my gmail account (evagoras@gmail.com)? Once I get this working I will post the entire code on github for anyone else to pick up.