thingsboard / thingsboard.github.io

Documentation repository
Apache License 2.0

Blacklisted Script #190

Open mkeevy opened 5 years ago

mkeevy commented 5 years ago

This is a...

Problem: I have a script rule that reports the following debug information when it fails occasionally.

javax.script.ScriptException: java.lang.RuntimeException: java.lang.IllegalStateException: Executor thread not set after 100 ms
    at delight.nashornsandbox.internal.NashornSandboxImpl$1.invokeFunction(NashornSandboxImpl.java:353)
    at org.thingsboard.server.service.script.AbstractNashornJsInvokeService.doInvokeFunction(AbstractNashornJsInvokeService.java:91)
    at org.thingsboard.server.service.script.AbstractJsInvokeService.invokeFunction(AbstractJsInvokeService.java:51)
    at org.thingsboard.server.service.script.RuleNodeJsScriptEngine.executeScript(RuleNodeJsScriptEngine.java:172)
    at org.thingsboard.server.service.script.RuleNodeJsScriptEngine.executeUpdate(RuleNodeJsScriptEngine.java:104)
    at org.thingsboard.rule.engine.transform.TbTransformMsgNode.lambda$transform$0(TbTransformMsgNode.java:52)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Executor thread not set after 100 ms
    at delight.nashornsandbox.internal.ThreadMonitor.run(ThreadMonitor.java:130)
    at delight.nashornsandbox.internal.JsEvaluator.runMonitor(JsEvaluator.java:47)
    at delight.nashornsandbox.internal.NashornSandboxImpl.executeSandboxedOperation(NashornSandboxImpl.java:161)
    at delight.nashornsandbox.internal.NashornSandboxImpl.access$000(NashornSandboxImpl.java:36)
    at delight.nashornsandbox.internal.NashornSandboxImpl$1.invokeFunction(NashornSandboxImpl.java:349)
    ... 11 more
Caused by: java.lang.IllegalStateException: Executor thread not set after 100 ms
    at delight.nashornsandbox.internal.ThreadMonitor.run(ThreadMonitor.java:91)
    ... 15 more

And later the script fails completely with the error:

javax.script.ScriptException: Script is blacklisted due to maximum error count 3!
    at org.thingsboard.server.service.script.RuleNodeJsScriptEngine.executeScript(RuleNodeJsScriptEngine.java:178)
    at org.thingsboard.server.service.script.RuleNodeJsScriptEngine.executeUpdate(RuleNodeJsScriptEngine.java:104)
    at org.thingsboard.rule.engine.transform.TbTransformMsgNode.lambda$transform$0(TbTransformMsgNode.java:52)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
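The two failures above differ in kind: the first is a sandbox timeout buried at the bottom of a "Caused by" chain, the second is the blacklist message itself. As a rough illustration only, the sketch below walks an exception's cause chain and sorts it into one of those two cases by matching the message text quoted in the traces. The class `ScriptFailureClassifier` and its `Kind` enum are hypothetical names introduced here; they are not part of ThingsBoard or the Nashorn sandbox, and the matching assumes the messages keep the wording shown above.

```java
/** Hypothetical helper, NOT part of ThingsBoard: sorts a script failure by its message text. */
final class ScriptFailureClassifier {

    enum Kind { BLACKLISTED, SANDBOX_TIMEOUT, OTHER }

    /** Follows the "Caused by" chain and returns the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Classifies a failure using the message wording seen in the traces (an assumption). */
    static Kind classify(Throwable t) {
        // The blacklist error carries its message on the outermost exception.
        String top = String.valueOf(t.getMessage());
        if (top.contains("Script is blacklisted")) {
            return Kind.BLACKLISTED;
        }
        // The sandbox timeout is the innermost cause in the first trace.
        String root = String.valueOf(rootCause(t).getMessage());
        if (root.contains("Executor thread not set")) {
            return Kind.SANDBOX_TIMEOUT;
        }
        return Kind.OTHER;
    }
}
```

A helper like this could only run where the Java exception is available, for example in custom server-side code or a log post-processing tool; it does not by itself expose the error inside a rule node script.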

The only solution that I have found to get the script to run again is to restart the ThingsBoard server.

I have looked in the logs on the server, but I couldn't see anything.

Proposed Solution:

Page to Update: http://thingsboard.io/...

ThingsBoard Version: V2.2.0 running on Ubuntu 18.04.2.

jmhernandez-circutor commented 5 years ago

In my TB instance it occurs rarely, but when it happens there's no feedback. We cannot know the existence of a blacklisted script until something stops working.

Blacklisted scripts have occurred in integrations and at several points in rule chains. We cannot route error messages from every scripted rule node (we have a lot of them), and there's no way to do that in integrations.

Is there a way to know that a script is blacklisted at the moment it happens?

mkeevy commented 5 years ago

@jmhernandez-circutor for me it happens less than 1% of the time. I would be happy to not blacklist the particular rule as it's not super critical. And having to restart the server seems a little harsh.

But it would be good to know when a script is blacklisted. Maybe someone knows how.

gglemke commented 5 years ago

I have alarms set that trigger when a device is inactive for a specific period. When I receive such an alarm I investigate why. There could be a number of reasons, but sometimes a blacklisted script is the culprit. The main point is that we know about it fairly quickly. I agree it would be nice if TB could email the tenant admin when a particular script has been blacklisted.

efeeyuboglu commented 5 years ago

I have the same problem. Something stops working even though I tested every single node. The bad thing is you never know which node got blacklisted, because they all seem fine. It would be very nice if there was a way to set alarms/emails when a blacklisting occurs.

Bliph commented 4 years ago

Same issue here. Added logic to send email when scripts fail. This error occurs both for node types “Filter - script” and “Transformation – script”. The error started occurring after we deployed the alarm system on our system. Maybe this caused a lot of extra load, and it also creates errors in the log around the same time the scripts fail.

Logic to send email on script fail: [script node] ->(Failure link)->[Action - create alarm]->(Created link)-> [Transformation - to email] ->(Success link)->[External - send email]

Please see issue https://github.com/thingsboard/thingsboard/issues/1274 for details on why scripts may fail.

There is also a good summary in https://github.com/thingsboard/thingsboard/issues/2085

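Do any of you have answers to the following questions?

  • How can I find the "root" error causing the script to fail? The error seems to be "lost" after 3 retries, and I cannot find any Java error in the log
  • Is there a way to extract the error text inside the rule node and add it to metadata (or something like that)?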

pineful commented 4 years ago

I am also experiencing the blacklist problem on our ThingsBoard instance. I cannot determine the root cause of the JavaScript source code being blacklisted. I don't understand why it is not logged to a file at all. I suggest a patch for the 2.4.x version that logs what caused the blacklisting, because it will take us a long time to move to v3.0 next year and we are being hit by this very seriously right now. I am going to disable the JavaScript VM sandbox (which I really didn't want to do).

hallard commented 4 years ago

Blacklist notification is a must, and the way the counter is handled should be reviewed. If 1% of my entries fail (because I do not always control what I receive), that can happen, but if the next data received is correct, the counter should be reset. I can live with 1% of the data having issues (bad format, missing field, ...), but this 1% should not break the other 99% on the rule chain. Blacklisting should occur only after 3 (or more) CONSECUTIVE rule chain script errors: as soon as a pass succeeds after an error, reset the error counter.

Anyway, it's not the first time that this has broken and stopped our rule chain, and the devices still send messages, so I can't detect it with a "no device communication" check.

So please give us a notification, a log, or anything easy so we can get rid of this. It's not the first time my customer has called me because it's not working anymore.
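The counting policy proposed in this comment (blacklist only after N consecutive failures, and clear the count on any success) can be sketched as below. This is an illustration of the suggestion only: the class `ConsecutiveErrorCounter` and its method names are hypothetical, and nothing here describes how ThingsBoard actually counts script errors.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the proposed policy; NOT ThingsBoard's actual implementation. */
final class ConsecutiveErrorCounter {
    private final int maxConsecutiveErrors;
    private final AtomicInteger consecutiveErrors = new AtomicInteger();

    ConsecutiveErrorCounter(int maxConsecutiveErrors) {
        if (maxConsecutiveErrors < 1) {
            throw new IllegalArgumentException("maxConsecutiveErrors must be >= 1");
        }
        this.maxConsecutiveErrors = maxConsecutiveErrors;
    }

    /** A successful run ends the current failure streak. */
    void onSuccess() {
        consecutiveErrors.set(0);
    }

    /** Records one failure; returns true once the streak reaches the limit. */
    boolean onFailure() {
        return consecutiveErrors.incrementAndGet() >= maxConsecutiveErrors;
    }

    /** True while the current streak is at or above the limit. */
    boolean isBlacklisted() {
        return consecutiveErrors.get() >= maxConsecutiveErrors;
    }
}
```

With a limit of 3, the sequence fail, fail, success, fail leaves the streak at 1, so an occasional bad message never accumulates into a blacklist; only three failures in a row would trip it.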

Nathan-ma commented 2 years ago

Same issue here. Added logic to send email when scripts fail. This error occurs both for node types “Filter - script” and “Transformation – script”. The error started occurring after we deployed the alarm system on our system. Maybe this caused a lot of extra load, and it also creates errors in the log around the same time the scripts fail.

Logic to send email on script fail: [script node] ->(Failure link)->[Action - create alarm]->(Created link)-> [Transformation - to email] ->(Success link)->[External - send email]

Please see issue thingsboard/thingsboard#1274 for details on why scripts may fail.

There is also a good summary in thingsboard/thingsboard#2085

Do any of you have answers to the following questions?

  • How can I find the "root" error causing the script to fail? The error seems to be "lost" after 3 retries, and I cannot find any Java error in the log
  • Is there a way to extract the error text inside the rule node and add it to metadata (or something like that)?

@Bliph, great minds think alike, I guess. I'm currently facing the same issue: a few scripts are just failing, and I have so many of them that it's hard to track them all. My solution was also an alarm and a log node for when they fail, and I was researching all the community places I could, hoping to find an answer on:

So I could include the error reason on the alarms.

Perhaps you have already found a solution to this. Care to share?