philippta / flyscrape

Flyscrape is a command-line web scraping tool designed for those without advanced programming skills.
https://flyscrape.com
Mozilla Public License 2.0

Browser fails to launch on Ubuntu Server 22.04 LTS #65

Open dynabler opened 6 months ago

I cannot get the headless browser to launch on a fresh installation of Ubuntu Server 22.04 LTS. The same script runs without trouble on WSL.

My script:

import urls from "./urls.txt"

export const config = {
  urls: urls.split("\n"),
  browser: true,
  headless: true,
  rate: 30,
  output: {
    file: "twitter_accounts.json",
    format: "json"
  },
  headers: {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36"
  }
};

Terminal output:

$:~/flyscrape/sc/twitter$ flyscrape run twitter.js

[launcher.Browser]2024/04/13 22:35:08 Download: https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/1131657/chrome-linux.zip
[launcher.Browser]2024/04/13 22:35:09 Progress: 00%
[launcher.Browser]2024/04/13 22:35:10 Progress: 05%
[launcher.Browser]2024/04/13 22:35:11 Progress: 12%
[launcher.Browser]2024/04/13 22:35:12 Progress: 18%
[launcher.Browser]2024/04/13 22:35:13 Progress: 26%
[launcher.Browser]2024/04/13 22:35:14 Progress: 32%
[launcher.Browser]2024/04/13 22:35:15 Progress: 38%
[launcher.Browser]2024/04/13 22:35:16 Progress: 44%
[launcher.Browser]2024/04/13 22:35:17 Progress: 50%
[launcher.Browser]2024/04/13 22:35:18 Progress: 57%
[launcher.Browser]2024/04/13 22:35:19 Progress: 64%
[launcher.Browser]2024/04/13 22:35:20 Progress: 71%
[launcher.Browser]2024/04/13 22:35:21 Progress: 76%
[launcher.Browser]2024/04/13 22:35:22 Progress: 82%
[launcher.Browser]2024/04/13 22:35:23 Progress: 89%
[launcher.Browser]2024/04/13 22:35:24 Progress: 96%
[launcher.Browser]2024/04/13 22:35:24 Unzip: /home/dynabler/.cache/rod/browser/chromium-1131657
[launcher.Browser]2024/04/13 22:35:24 Progress: 00%
[launcher.Browser]2024/04/13 22:35:25 Progress: 11%
[launcher.Browser]2024/04/13 22:35:26 Progress: 20%
[launcher.Browser]2024/04/13 22:35:27 Progress: 29%
[launcher.Browser]2024/04/13 22:35:28 Progress: 37%
[launcher.Browser]2024/04/13 22:35:29 Progress: 47%
[launcher.Browser]2024/04/13 22:35:30 Progress: 66%
[launcher.Browser]2024/04/13 22:35:31 Progress: 79%
[launcher.Browser]2024/04/13 22:35:32 Progress: 95%
[launcher.Browser]2024/04/13 22:35:33 Downloaded: /home/dynabler/.cache/rod/browser/chromium-1131657
panic: open /tmp/rod/user-data/40574e0be0ee7a88/Default/Preferences: no such file or directory

goroutine 1 [running]:
github.com/go-rod/rod/lib/utils.glob..func2({0xfbc7c0?, 0xc0040b9b00?})
        /root/go/pkg/mod/github.com/go-rod/rod@v0.114.7/lib/utils/utils.go:68 +0x1d
github.com/go-rod/rod/lib/utils.E(...)
        /root/go/pkg/mod/github.com/go-rod/rod@v0.114.7/lib/utils/utils.go:74
github.com/go-rod/rod/lib/launcher.(*Launcher).setupUserPreferences(0xc0000bc1c0?)
        /root/go/pkg/mod/github.com/go-rod/rod@v0.114.7/lib/launcher/launcher.go:482 +0x1c5
github.com/go-rod/rod/lib/launcher.(*Launcher).Launch(0x798352643ff0?)
        /root/go/pkg/mod/github.com/go-rod/rod@v0.114.7/lib/launcher/launcher.go:407 +0x99
github.com/philippta/flyscrape/modules/browser.newBrowser(0x60?)
        /go/src/github.com/philippta/flyscrape/modules/browser/browser.go:71 +0x35
github.com/philippta/flyscrape/modules/browser.(*Module).AdaptTransport(0xc000012600, {0x14bd200?, 0x1b3f2a0?})
        /go/src/github.com/philippta/flyscrape/modules/browser/browser.go:51 +0x45
github.com/philippta/flyscrape.(*Scraper).Run(0xc00007eb40)
        /go/src/github.com/philippta/flyscrape/scrape.go:95 +0x253
github.com/philippta/flyscrape.Run({0x7ffcb23406f8, 0xa}, 0x0?)
        /go/src/github.com/philippta/flyscrape/flyscrape.go:56 +0x37f
github.com/philippta/flyscrape/cmd.(*RunCommand).Run(0x1d294a0, {0xc000036170, 0x1, 0x1})
        /go/src/github.com/philippta/flyscrape/cmd/run.go:32 +0x20a
github.com/philippta/flyscrape/cmd.(*Main).Run(0x200000003?, {0xc000036160?, 0x40bc8b?, 0x409dcb?})
        /go/src/github.com/philippta/flyscrape/cmd/main.go:40 +0x9d
main.main()
        /go/src/github.com/philippta/flyscrape/cmd/flyscrape/main.go:33 +0x5f

I also get the following error when I run a bash file that does the same thing as flyscrape run twitter.js, this time without any of the output above:

failed to launch browser: [launcher] Failed to get the debug url: [0413/223040.149405:ERROR:zygote_host_impl_linux.cc(100)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
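
That second message says Chromium will not start as root unless --no-sandbox is passed, so the bash file is probably being executed as root (for example via sudo or a root cron job). Below is a minimal sketch of a guard for such a wrapper script. The function names is_root and run_scrape are hypothetical, not part of flyscrape, and the sketch does not address the Preferences panic shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers for a wrapper script around flyscrape.
# Assumption: the scraper should run as a regular user, because Chromium
# refuses to start as root without --no-sandbox.

# is_root [UID]: succeeds when the given numeric user ID is 0 (root).
# With no argument it checks the current effective user via `id -u`.
is_root() {
  uid="${1:-$(id -u)}"
  [ "$uid" -eq 0 ]
}

# run_scrape SCRIPT: run flyscrape on SCRIPT, but bail out early as root.
run_scrape() {
  if is_root; then
    echo "refusing to run as root: Chromium needs --no-sandbox in that case" >&2
    return 1
  fi
  flyscrape run "$1"
}
```

A wrapper would define these functions and then call run_scrape twitter.js. When started as root it stops with a clear message instead of the launcher error.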

Any ideas on how to work around this for now would be much appreciated.