lambdaisland / kaocha-cljs

ClojureScript support for Kaocha
Eclipse Public License 1.0

Occasional "Kaocha ClojureScript client failed connecting back" error #14

Closed kommen closed 4 years ago

kommen commented 5 years ago

Our kaocha-cljs test suite sometimes errors with "Kaocha ClojureScript client failed connecting back", especially in our CI environment.

This originates from here: https://github.com/lambdaisland/kaocha-cljs/blob/e167b2efb1a0b879a55840826b24c8e7bd74160e/src/kaocha/type/cljs.clj#L245

Debugging this with @plexus, with :bindings {kaocha.type.cljs/*debug* true} and :capture-output? false set in our tests config, showed that it already fails at the very beginning of setting up the websocket connection, as this is the only output produced:

EVAL:  (require (quote kaocha.cljs.websocket-client)) (require-macros (quote kaocha.cljs.run)) :require-websocket-client-done159719
EVAL:  :cljs/quit

We need to gather more information, probably with more debug logging, but we suspect it could be related to the implementation of WriteableReader.
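
For anyone who wants to reproduce this debugging setup, a tests.edn along the following lines should enable the same output. This is only a minimal sketch: the suite id :unit-cljs is an illustrative name (the same one used in a config posted later in this thread), and the two settings are the ones named above.

```clojure
#kaocha/v1
{:tests [{:id   :unit-cljs            ; illustrative suite id, an assumption
          :type :kaocha.type/cljs}]
 ;; Debug flag named above; with it set, the "EVAL: ..." lines were printed
 :bindings {kaocha.type.cljs/*debug* true}
 ;; Do not capture output, so the debug lines reach the console
 :capture-output? false}
```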

plexus commented 5 years ago

Looking at this I think the right way forward is to drop WriteableReader, and instead fork cljs.server/prepl to take a queue which delivers forms to be executed instead of a reader.

mk commented 5 years ago

Maybe https://github.com/bhauman/figwheel-repl is also something worth looking at. I remember Bruce talking (IIRC in The REPL, episode 4) about the stability of the built-in REPL not being up to his standards, which led him to create figwheel-repl.

countgizmo commented 5 years ago

@kommen I had a similar issue: I was getting the same exception and the same output as you. What helped me get past it was increasing the :cljs/timeout value. I think the default is 10 seconds, and that wasn't enough for my project. I'm currently using 60 seconds (just in case).
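
As a sketch of the timeout workaround described above: :cljs/timeout is set per test suite in tests.edn. The config posted later in this thread uses a value of 15000, which suggests the unit is milliseconds. Assuming that unit, 60 seconds would look like this (the suite id is illustrative):

```clojure
#kaocha/v1
{:tests [{:id           :unit-cljs        ; illustrative suite id, an assumption
          :type         :kaocha.type/cljs
          ;; Assumed to be milliseconds: 60000 = 60 seconds
          :cljs/timeout 60000}]}
```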

countgizmo commented 5 years ago

@plexus Unfortunately I'm seeing the same behaviour in our test suite.

Warning: speculations based on not knowing kaocha-cljs' source code below!

I was wondering if it might be related to the fact that Chrome, for example, will close idle WS connections (I've googled around and found mentions of timeouts ranging from 10 seconds to 1 minute). Is it possible that kaocha-cljs is taking too long to run analysis/compilation, so by the time the first WS message is sent, the browser has already dropped the connection as idle?

I've had to deal with this in our project by creating a tiny keep-alive loop with {:type :ping} messages. I've tried to do the same on my local fork of kaocha-cljs, but it only made things worse: kaocha never gets the :timeout event in the case of a hanging test (I was lucky enough to have such a test handy). But maybe the keep-alive pings should only be sent during some "loading" phase in kaocha-cljs.

gmp26 commented 5 years ago

I'm thinking that a Chrome update may have exacerbated this problem recently because it has appeared in an unchanged project that was working perfectly last month. Also, the problem disappears if testing against Safari or Firefox. Chrome now has some new 'workbox' gubbins in its error console which may be interfering somehow:

...
workbox Precaching is responding to: /static/d/164/path---404-html-516-62a-NZuapzHg3X9TaN1iIixfv1W23E.json
workbox-core.dev.js:132 workbox Precaching is responding to: /
?rel=1565125622272:1 Uncaught SyntaxError: Unexpected token < in JSON at position 0
    at Object.parse (<anonymous>)
    at Object.callback (repl.js:378)
    at goog.net.xpc.CrossPageChannel.goog.messaging.AbstractChannel.deliver (abstractchannel.js:141)
    at goog.net.xpc.CrossPageChannel.xpcDeliver (crosspagechannel.js:734)
    at Function.goog.net.xpc.NativeMessagingTransport.messageReceived_ (nativemessagingtransport.js:321)
    at Object.goog.events.fireListener (events.js:744)
    at goog.events.handleBrowserEvent_ (events.js:870)
    at f (events.js:289)
workbox-core.dev.js:132 workbox Router is responding to: https://fonts.googleapis.com/css?family=Open+Sans
workbox-core.dev.js:132 workbox Using NetworkFirst to respond to 'https://fonts.googleapis.com/css?family=Open+Sans'

gmp26 commented 5 years ago

Although that JSON error looks like somebody is responding with HTML instead of JSON.

And, just to follow up on that, disabling the site's service worker code clears the workbox error, and kaocha-cljs then connects correctly in Chrome. So for me, the kaocha-cljs misconnect is a symptom of a service worker issue.

kommen commented 5 years ago

FYI, since we've upgraded kaocha-cljs (which includes https://github.com/lambdaisland/kaocha-cljs/commit/e04110cb1e07c3529f717b15666888d2fe2c4d56#diff-7a555df1169203c27606112dabd4d6ea) we haven't run into that problem for a while.

plexus commented 5 years ago

There's also some stuff that's still unreleased, meant to make kaocha-cljs compatible with figwheel-repl, but it should also improve things in general. I'll be on holidays the next few weeks, but I do plan to put out a new release after that. It still requires some cleanup first.

gmp26 commented 5 years ago

We are consistently seeing this error when testing under Ubuntu with cljs.repl.browser/repl-env, launching the tests with:

xvfb-run kaocha :unit-cljs

The default browser (Firefox) never runs, so perhaps it's not surprising. But maybe there is a better way to run browser tests on Linux?

ikitommi commented 5 years ago

starting to get these too, only on ci: https://circleci.com/gh/metosin/malli/73

gmp26 commented 5 years ago

We briefly avoided the issue by swapping to the node repl, but it came back to bite us as soon as we included a test that pulled in a namespace that referred to js/localStorage. This causes the cljs compilation to fail with

"Execution error (ReferenceError) at (<cljs repl>:1).\nlocalStorage is not defined\n".

So far I haven't found a way to avoid this error in kaocha while retaining the code for the browser runtime.

plexus commented 4 years ago

@countgizmo @gmp26 @ikitommi could you try running master and report back? There are several improvements which may help to make this issue go away.

kommen commented 4 years ago

@plexus as I commented before, and as we discussed in person yesterday, kaocha-cljs 0.0-40 has been remarkably stable for us with regard to this issue. I have now updated to b5afd11f80b71ac95cdd8e4806fe2f0b807f16d2 and haven't noticed any regressions so far. I'll report back after gathering a bit more data.

gmp26 commented 4 years ago

Master is still failing on Linux/Firefox for us. The browser REPL works on macOS with Chrome, but with master kaocha and kaocha-cljs we still can't get a Linux system to load the default browser for CI. We're using xvfb and the default Firefox browser there, since we can't find a way to configure headless mode. See the log at https://gist.github.com/gmp26/ad48cdf89097dbe0ee046aafa28e3b68.

Maybe we're not driving it correctly, as we don't yet know how to select a headless browser or how to launch a specific browser.

plexus commented 4 years ago

@gmp26 can you add :bindings {kaocha.type.cljs/*debug* true} to your tests.edn and capture the log?

Could you also share your tests.edn, as well as any scripts you're using to set up the environment and launch it?

Thanks!

jin-park-dev commented 4 years ago

Hi, I work with @gmp26

Here is the output of running the tests with xvfb. It looks like an issue with launching the browser. https://gist.github.com/k-oneGene/6c82d9531cc1087ae33db6dca0691ee1

To run, we're doing:

lein deps
npm install
xvfb-run ./bin/kaocha :unit-cljs --no-capture-output

gmp26 commented 4 years ago

We're guessing it might be an issue with missing java.awt.Desktop support on Ubuntu. Maybe we're missing a lib there or using the wrong java (we're on java version "1.8.0_152")?

tests.edn is:

#kaocha/v1
{:tests [{:id            :unit-cljs
          :type          :kaocha.type/cljs
          :source-paths  ["src/cljs"]
          :test-paths    ["test/cljs"]
          :cljs/timeout  15000
          :cljs/repl-env cljs.repl.browser/repl-env}]
 :bindings {kaocha.type.cljs/*debug* true}}

plexus commented 4 years ago

I did a bit of digging @gmp26 / @k-oneGene. The error you're getting

Failed to launch a browser:
 The BROWSE action is not supported on the current platform!

does indeed originate from java.awt.Desktop:

https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/awt/Desktop.java#L192-L194

The actual implementations of this stuff however depend on the java.awt.Toolkit, which can be influenced by setting the awt.toolkit property.

https://github.com/srisatish/openjdk/blob/master/jdk/src/share/classes/java/awt/Toolkit.java#L847-L849

What I would do is see what the output is of:

(class (java.awt.Toolkit/getDefaultToolkit))

It seems the main option on Linux is sun.awt.X11.XToolkit, but I'm guessing for you it will be something else.

https://github.com/JetBrains/jdk8u_jdk/blob/master/src/solaris/classes/sun/awt/X11/XToolkit.java

You should be able to force it with JAVA_OPTS=-Dawt.toolkit=sun.awt.X11.XToolkit. In that case it will eventually bottom out in a native call to gnome_url_open. I'm not sure where it goes from there, but there's gnome-www-browser, which I'm guessing it uses, so you could symlink that to whatever you want it to be (update-alternatives --config gnome-www-browser updates said symlink).

https://github.com/JetBrains/jdk8u_jdk/blob/master/src/solaris/classes/sun/awt/X11/XDesktopPeer.java#L106-L128
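
The toolkit check suggested above can be extended slightly to also ask java.awt.Desktop whether BROWSE is available. The following is a minimal sketch for a Clojure REPL running on the same JVM and under the same environment (for example inside xvfb-run) as the tests. The function names are made up for illustration and are not part of kaocha-cljs.

```clojure
(import '(java.awt Desktop Desktop$Action Toolkit))

(defn default-toolkit-class
  "Returns the class of the AWT toolkit the JVM picked by default."
  []
  (class (Toolkit/getDefaultToolkit)))

(defn browse-supported?
  "True when java.awt.Desktop is available on this platform and reports
  support for the BROWSE action. Desktop/getDesktop is only called when
  isDesktopSupported is true, so this does not throw on headless JVMs."
  []
  (boolean
   (and (Desktop/isDesktopSupported)
        (.isSupported (Desktop/getDesktop) Desktop$Action/BROWSE))))

(println "Default toolkit:" (default-toolkit-class))
(println "BROWSE supported?" (browse-supported?))
```

If the second line prints false, that would be consistent with the "BROWSE action is not supported" error above, and comparing the output with and without the awt.toolkit property set may show whether forcing the toolkit helps.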

plexus commented 4 years ago

I'm going to close this issue, as there are so many things that could be causing this, most of them user errors. General tips for people coming here:

gmp26 commented 4 years ago

@plexus. Many thanks, that's all very helpful.