pipes-digital / pipes

Repository for Pipes
https://pipes.digital
GNU Affero General Public License v3.0

'Execute javascript' is not working for me #95

Closed by benfishbus 2 years ago

benfishbus commented 2 years ago

I am running Pipes CE in a docker container. Everything in the bundle is installed, but the 'Execute javascript' checkbox in the Download block doesn't work.

First this exception is logged:

2022-05-08 22:54:26 - Selenium::WebDriver::Error::WebDriverError - Unable to find chromedriver. Please download the server from
https://chromedriver.storage.googleapis.com/index.html and place it somewhere on your PATH.
More info at https://github.com/SeleniumHQ/selenium/wiki/ChromeDriver.

I went and got chromedriver as instructed, and randomly chose to put it in /usr/local/sbin.

Then this exception is logged:

2022-05-09 08:55:57 - Selenium::WebDriver::Error::UnknownError - unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)

What am I missing? Is there an assumption that selenium has already been set up in some manner?

onli commented 2 years ago

The error message seems to cover many potential errors, from version mismatches to incorrect permissions.

As far as I understand, you need to have Chrome or Chromium installed on your system and then download the chromedriver build for that exact version. This is error-prone. Once both are in place, the setup described at https://github.com/teamcapybara/capybara#setup should work.

The relevant Pipes code is in downloader.rb; it just calls capybara to download the target page.

Edit: Under Ubuntu you'd install the packages chromium-browser and chromium-chromedriver. Even if that installs slow snaps it should work right away. You'd probably need to delete your manually placed chromedriver.
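For anyone checking a manual install, here is a minimal sketch of the version comparison described above. It only compares major versions, which is the minimum that has to agree; the binary names `chromium` and `chromedriver` and both helper names are assumptions for illustration, not anything Pipes ships.

```ruby
require 'open3'

# Extract the major version from output such as "Chromium 101.0.4951.64 ..."
# or "ChromeDriver 101.0.4951.41 (...)". Returns nil if no version is found.
def major_version(version_output)
  match = version_output.match(/(\d+)\.\d+\.\d+\.\d+/)
  match && match[1].to_i
end

# Run "<binary> --version" and return its major version, or nil if the
# binary is missing or exits with an error. Binary names are assumptions.
def installed_major(binary)
  output, status = Open3.capture2e(binary, '--version')
  status.success? ? major_version(output) : nil
rescue Errno::ENOENT
  nil
end

if __FILE__ == $PROGRAM_NAME
  browser = installed_major('chromium')
  driver  = installed_major('chromedriver')
  if browser.nil? || driver.nil?
    puts "Could not determine versions (chromium: #{browser.inspect}, chromedriver: #{driver.inspect})"
  elsif browser == driver
    puts "Major versions match (#{browser})"
  else
    puts "Mismatch: chromium #{browser} vs chromedriver #{driver}"
  end
end
```

Installing both from the distribution's packages, as suggested above, avoids the mismatch in the first place.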

benfishbus commented 2 years ago

That helped. My container is based on the official Ruby container, which runs Debian, so the packages I needed to install were named chromium and chromium-driver. Then, based on a stackoverflow question, I went into /usr/local/bundle/gems/capybara-3.36.0/lib/capybara/registrations/drivers.rb and edited the registration block for :selenium_chrome_headless to include opts.add_argument('--no-sandbox'). (There's probably a better way to accomplish this in code.)
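One possible way to do this in code, rather than editing the installed gem, is to register a separate Capybara driver with the extra flag. This is only a sketch: the driver name `:chrome_headless_no_sandbox`, the constant, and the helper function are hypothetical, and the thread does not show which driver name Pipes' downloader.rb actually requests, so Pipes would still need to be pointed at the new driver.

```ruby
# Flags for headless Chromium running as root in a container.
# '--no-sandbox' is the flag from the comment above; '--disable-dev-shm-usage'
# is an additional assumption that often helps with Docker's small /dev/shm.
CONTAINER_CHROME_ARGS = %w[--headless --no-sandbox --disable-dev-shm-usage].freeze

# Register a Capybara driver with these flags instead of patching drivers.rb.
# Returns the driver name, or nil if capybara/selenium-webdriver are not installed.
def register_no_sandbox_chrome(name = :chrome_headless_no_sandbox)
  require 'capybara'
  require 'selenium-webdriver'

  Capybara.register_driver(name) do |app|
    opts = Selenium::WebDriver::Chrome::Options.new
    CONTAINER_CHROME_ARGS.each { |arg| opts.add_argument(arg) }
    # Depending on the selenium-webdriver version, this keyword may need to
    # be `capabilities:` rather than `options:`.
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: opts)
  end
  name
rescue LoadError
  nil
end
```

Unlike an edit under /usr/local/bundle/gems, a registration like this lives in application code and survives a gem reinstall or container rebuild.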

Now, Pipes loads the page when Execute javascript is checked, but the output is not substantially different from when it is not checked. Sites still say "You need to enable JavaScript", "Please enable JavaScript to view this page properly", etc. However, opening a shell in the container and manually using chromium-browser to dump the DOM also produces this result. So I don't think this is the result of anything Pipes is doing.

I did find a headless Chrome container zenika/alpine-chrome that is able to load one of my test sites https://405d.hhs.gov/resources, but when I use that container as a base for Pipes, Chromium breaks and begins producing the same "no javascript" result as my current setup.

Given this is not really a Pipes problem, I am closing this issue.

onli commented 2 years ago

Maybe you can switch over to Ubuntu LTS? I was really surprised how effortlessly that worked there.

However, opening a shell in the container and manually using chromium-browser to dump the DOM also produces this result. So I don't think this is the result of anything Pipes is doing.

Be careful how you do that. Make sure you are not saving the page, but inspecting the current JS-generated DOM with the dev tools. The latter is also what Pipes is supposed to see.

I hope you find a working solution, comment if I can be of further help.
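To make the distinction above concrete without dev tools: headless Chromium's `--dump-dom` flag prints the serialized DOM after the page has loaded, which is closer to the JS-generated DOM than a saved copy of the raw HTML. A rough sketch follows; the binary name `chromium` and both helper names are assumptions, and content that scripts render well after load may still be missing from the dump.

```ruby
require 'open3'

# Build the command for a headless DOM dump. The binary name 'chromium' is an
# assumption (Debian's package name); '--no-sandbox' is needed when running as root.
def dump_dom_command(url, binary: 'chromium')
  [binary, '--headless', '--no-sandbox', '--dump-dom', url]
end

# Return the serialized DOM as a string, or nil if the browser is missing or fails.
def rendered_dom(url, binary: 'chromium')
  output, status = Open3.capture2(*dump_dom_command(url, binary: binary))
  status.success? ? output : nil
rescue Errno::ENOENT
  nil
end
```

If `rendered_dom` still returns the "enable JavaScript" placeholder, the problem lies with the browser setup rather than with how the page was captured.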

benfishbus commented 2 years ago

Ubuntu chromium-browser requires snap, which does not work under Docker, so I've spun up a VM (sigh) and set up Pipes there. Portier login doesn't work, though. I get the code in my email, enter it, and get the following exception:

OpenSSL::PKey::PKeyError at /_browserid_assert
rsa#set_key= is incompatible with OpenSSL 3.0

    file: browserid.rb location: set_key line: 51 

Full details, including backtrace, attached: OpenSSL PKey PKeyError at __browserid_assert.pdf

I did manage to build a pipes container based on zenika/alpine-chrome, but I've suspended those efforts to try out your Ubuntu suggestion.

onli commented 2 years ago

Looks like https://bugs.launchpad.net/ubuntu/+source/ruby-openid-connect/+bug/1965184 bit you there. My test case was with 20.04 LTS, and I don't have a better solution at hand than to propose using that as your VM base. https://github.com/nov/json-jwt/pull/101 is not merged, so I can't update the portier gem.
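Ubuntu 22.04 ships OpenSSL 3.0 while 20.04 ships 1.1.1, which is why the choice of release matters here. A quick way to see which one a given Ruby is linked against is sketched below; the helper name is made up for illustration, and the check only reports the version, it does not work around the `set_key` incompatibility.

```ruby
require 'openssl'

# True when a number in OpenSSL's OPENSSL_VERSION_NUMBER encoding denotes
# OpenSSL 3.0 or newer (the major version sits in the top hex digit).
def openssl3_or_newer?(version_number = OpenSSL::OPENSSL_VERSION_NUMBER)
  version_number >= 0x30000000
end

puts "#{OpenSSL::OPENSSL_VERSION}: OpenSSL 3.0 or newer? #{openssl3_or_newer?}"
```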

benfishbus commented 2 years ago

My mistake for thinking I could go with 22.04 LTS instead of 20.04...

onli commented 2 years ago

Nah, please don't be discouraged. This is a really unlucky bug to run into, and 22.04 LTS was of course the logical choice.

benfishbus commented 2 years ago

Wow, that's amazing, 20.04 just works. Nothing ever "just works" :-) Maybe someday I'll figure out how to get it done in Docker, but right now I'm just grateful to have it working! Thanks for the suggestion.