sensepost / gowitness

🔍 gowitness - a golang, web screenshot utility using Chrome Headless
GNU General Public License v3.0
3.16k stars, 344 forks

GoWitness v3.0.3 Missing Output from Writers in CI/CD #237

Closed: mr-pmillz closed this 2 weeks ago

mr-pmillz commented 3 weeks ago

Describe the bug

In a CI/CD pipeline, GoWitness v3.0.3 runs, but the screenshots and the output from the SQLite DB / JSONL writers don't get written. There is also no stdout output.

I just noticed the --write-stdout flag, which says: "Write successful results to stdout (usefull in a shell pipeline". Is this flag required for the other writers' output to get written in addition to stdout?

Would using go-rod instead of chrome driver in a pipeline be better in your opinion, or is this issue more related to a bug than a particular web driver?

To Reproduce

Steps to reproduce the behavior: run the following commands in a CI/CD pipeline (GitLab, GitHub, etc.):

gowitness scan file -f urls.txt \
--screenshot-path screenshots \
--write-db \
--write-db-uri 'sqlite://gowitness.sqlite3' \
--write-jsonl \
--write-jsonl-file results.jsonl \
-t 20 \
--chrome-user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0'

gowitness report generate --db-uri 'sqlite://gowitness.sqlite3' \
--zip-name report.zip \
--screenshot-path screenshots

Expected behavior

I expected the screenshots directory to contain the resulting screenshots, the sqlite db to be populated, and the results.jsonl file to not be empty.

Version Information:

Additional context

When I run these commands in a regular terminal, they work as expected. But the same commands in a CI/CD pipeline do not write any results to the screenshots directory, the SQLite DB, or the results.jsonl file.

leonjza commented 3 weeks ago

I just noticed the --write-stdout flag, which says: "Write successful results to stdout (usefull in a shell pipeline". Is this flag required for the other writers' output to get written in addition to stdout?

No. Writers don't depend on each other; they all work "in addition to" one another. --write-stdout simply writes each successfully scanned URL to stdout. Its use alongside other writers should not influence them in any way.

Would using go-rod instead of chrome driver in a pipeline be better in your opinion, or is this issue more related to a bug than a particular web driver?

No. Drivers populate a result that is then passed off to writers, regardless of which driver generated them.
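To make that concrete, here is a minimal Go sketch of the fan-out pattern described above: a driver (chromedp or go-rod) produces one result, and every enabled writer receives that same result independently. The `Result`, `Writer`, `memWriter` and `fanOut` names are hypothetical and are not gowitness's actual types. Whether gowitness keeps going after one writer fails is not stated in this thread, so the continue-on-error behaviour here is only this sketch's choice.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical, trimmed-down stand-in for what a driver
// (chromedp or go-rod) hands to the writers.
type Result struct {
	URL          string
	ResponseCode int
}

// Writer is a hypothetical interface; gowitness's real writer types may differ.
type Writer interface {
	Write(r *Result) error
}

// memWriter records results in memory, standing in for the JSONL, SQLite
// and stdout writers. fail simulates a writer that errors.
type memWriter struct {
	name  string
	fail  bool
	lines []string
}

func (w *memWriter) Write(r *Result) error {
	if w.fail {
		return fmt.Errorf("%s: simulated write failure", w.name)
	}
	w.lines = append(w.lines, fmt.Sprintf("%s %d", r.URL, r.ResponseCode))
	return nil
}

// fanOut gives the same result to every enabled writer. In this sketch a
// failing writer is reported but does not stop the others from receiving it.
func fanOut(r *Result, writers []Writer) error {
	var errs []error
	for _, w := range writers {
		if err := w.Write(r); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when every writer succeeded
}

func main() {
	jsonl := &memWriter{name: "jsonl"}
	stdout := &memWriter{name: "stdout", fail: true}
	db := &memWriter{name: "db"}
	err := fanOut(&Result{URL: "https://sensepost.com", ResponseCode: 200},
		[]Writer{jsonl, stdout, db})
	fmt.Println(jsonl.lines, db.lines, err)
}
```

Because the writers share nothing but the result they are handed, enabling or disabling one (for example --write-stdout) does not change what the others receive.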

I can't think of a reason right now why the behavior would differ on, say, a GitHub Actions runner versus elsewhere, but to test whether some path-related weirdness is happening, can you try using more specific paths? E.g. --screenshot-path ./screenshots, --write-db-uri sqlite:///gowitness.sqlite3 and --write-jsonl-file ./results.jsonl. FWIW, all of the path-related values you specified are also the defaults (see --help on any of the scan commands), so you only really need to enable the writers with --write-db --write-jsonl.
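One plausible reading of the `sqlite://` URI format, sketched here as an assumption (gowitness's own parsing may differ), is that everything after `sqlite://` is taken as the file path. Under that reading a third slash makes the path absolute, while anything else is resolved against whatever working directory the CI job happens to run in. `sqlitePathFromURI` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sqlitePathFromURI is a hypothetical helper illustrating one plausible reading
// of the URI format: everything after "sqlite://" is the file path, so a third
// slash ("sqlite:///...") makes it absolute, and anything else is resolved
// against the process's working directory.
func sqlitePathFromURI(uri string) (string, error) {
	const prefix = "sqlite://"
	if !strings.HasPrefix(uri, prefix) {
		return "", fmt.Errorf("not a sqlite URI: %q", uri)
	}
	path := strings.TrimPrefix(uri, prefix)
	if path == "" {
		return "", fmt.Errorf("empty database path in %q", uri)
	}
	return path, nil
}

func main() {
	for _, uri := range []string{
		"sqlite://gowitness.sqlite3",                   // relative: depends on the working directory
		"sqlite:///output/gowitness/gowitness.sqlite3", // absolute
	} {
		p, err := sqlitePathFromURI(uri)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s (absolute: %t)\n", uri, p, filepath.IsAbs(p))
	}
}
```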

mr-pmillz commented 3 weeks ago

For brevity, I simplified the output paths in my example for this issue; I was actually using absolute paths like:

gowitness scan file -f /output/httpx/httpx-responsive-in-scope-urls.txt \
--screenshot-path /output/gowitness/screenshots \
--write-db \
--write-db-uri 'sqlite:///output/gowitness/gowitness.sqlite3' \
--write-jsonl \
--write-jsonl-file /output/gowitness/results.jsonl \
-t 20 \
--chrome-user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0'

gowitness report generate --db-uri 'sqlite:///output/gowitness/gowitness.sqlite3' \
--zip-name /output/gowitness/report.zip \
--screenshot-path /output/gowitness/screenshots

These commands work in a regular terminal, but in a GitLab CI/CD pipeline running in the latest Kali Docker image, the output files are empty even though the scan took approximately 5 minutes to run.

Are there any other dependencies that might be required in the Kali Docker image? I have Chromium installed in the Docker image 🤔

I will try it with the --write-stdout option, although I don't think --write-stdout will fix the issue. Anyhow, thank you for taking a look. I'll let you know if I figure out what the problem is/was.

leonjza commented 3 weeks ago

Appreciate the feedback. To confirm, are you using the official Kali image? If so, I can test in that environment too and maybe narrow down what is happening.

mr-pmillz commented 3 weeks ago

No prob, yeah, that'd be much appreciated!

Here's the Dockerfile I'm using to build the image:

FROM kalilinux/kali-rolling:latest

ENV DEBIAN_FRONTEND=noninteractive
ENV TERM=xterm-256color
RUN apt-get update && apt-get install -y -q apt-utils
RUN apt-get install -y -q \
    curl dnsutils ca-certificates python3 virtualenv \
    python3-distutils-extra python3-virtualenv python3-pip \
    python3-setuptools python3-wheel python3-magic python3-venv \
    pipx git sudo whois zip unzip libimage-exiftool-perl recon-ng \
    golang chromium tini sqlite3 build-essential libpcap-dev wget \
    && apt-get autoremove -y \
    && apt-get autoclean -y \
    && rm -rf /var/lib/apt/lists/*

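# install go v1.23.1 since golang apt package is at v1.22.6 as of 9-10-2024
RUN wget https://go.dev/dl/go1.23.1.linux-amd64.tar.gz -O /tmp/go1.23.1.linux-amd64.tar.gz
RUN rm -rf /usr/local/go && tar -C /usr/local -xzf /tmp/go1.23.1.linux-amd64.tar.gz
RUN rm /tmp/go1.23.1.linux-amd64.tar.gz
# put the new toolchain ahead of the apt-installed /usr/bin/go; otherwise later `go` calls still use v1.22.6
ENV PATH="/usr/local/go/bin:${PATH}"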

# Install Tini
ENV TINI_VERSION=v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

WORKDIR /app
COPY . /app
COPY entrypoint.sh deploy.sh /
RUN chmod +x /entrypoint.sh /deploy.sh

RUN if [ ! -d "${HOME}/go" ]; then mkdir "${HOME}/go"; fi

ENV GO111MODULE=on
RUN go mod download
RUN GOOS=linux GOARCH=amd64 go build -v -trimpath -ldflags="-s -w" -o /usr/bin/redacted .

RUN chmod +x /entrypoint.sh /deploy.sh /usr/bin/redacted
RUN mkdir -p "${HOME}/.config/redacted"
RUN cd "$HOME"
RUN rm -rf /app

ENTRYPOINT ["/tini", "--", "/entrypoint.sh"]

mr-pmillz commented 3 weeks ago

I wonder if it's due to installing sqlite via apt, since gowitness uses the self-contained github.com/glebarez/sqlite v1.11.0 module 🤔 I'm also using this github.com/glebarez/sqlite module in my Go binary.

leonjza commented 3 weeks ago

Thanks! No, I don't think the apt install of sqlite will matter for the pure-Go implementation. Last question: how do you install and run gowitness in this container?

mr-pmillz commented 3 weeks ago

I'm running it via this function:

//// gowitness and subfinder have a dependency collision: https://github.com/projectdiscovery/subfinder/issues/1374
//// # github.com/projectdiscovery/utils/update
//// ../../../go/pkg/mod/github.com/projectdiscovery/utils@v0.2.9/update/update.go:97:40: undefined: glamour.ASCIIStyleConfig
//// My PR https://github.com/projectdiscovery/utils/pull/531 for projectdiscovery/utils got merged in, awaiting new release
//// httpx also has older utils indirect dep v0.2.4
// runGoWitness installs the latest version and runs goWitness
func runGoWitness(urlsFile, outputDir, userAgent string) error {
    installedAptPackages, err := localio.NewAptInstalled()
    if err != nil {
        return err
    }
    // check if chromium installed and install it via apt-get if not installed.
    if err = localio.AptInstall(installedAptPackages, "chromium"); err != nil {
        return err
    }

    // TODO: GoWitness v3.X is suitable to be run natively in Go which will be more optimal. problem is there is currently a dependency collision as of 09-20-2024
    localio.InfoLabelWithColorf("GoWitness", "blue", "Installing latest version of GoWitness")
    if err = localio.RunCommandPipeOutput("GO111MODULE=on go install github.com/sensepost/gowitness@master", nil, false, false, 20); err != nil {
        return err
    }
    if gowitness, exists := localio.CommandExists("gowitness"); exists {
        goWitnessScreenshotDir := fmt.Sprintf("%s/gowitness/screenshots", outputDir)
        goWitnessJSONLOutputFile := fmt.Sprintf("%s/gowitness/results.jsonl", outputDir)
        goWitnessDB := fmt.Sprintf("sqlite://%s/gowitness/gowitness.sqlite3", outputDir) // must specify sqlite:/// for absolute database path. New in GoWitness 2.5.0 release https://github.com/sensepost/gowitness/releases/tag/2.5.0
        goWitnessReport := fmt.Sprintf("%s/gowitness/report.zip", outputDir)             // exported report file is a zip that contains all the screenshots and report.html file.

        // GoWitness automatically creates the output dir, but just in case later versions change, we create the dir.
        if err = os.MkdirAll(goWitnessScreenshotDir, 0750); err != nil {
            return err
        }

        if err = localio.RunCommandPipeOutput(fmt.Sprintf("%s scan file -f %s --screenshot-path %s --write-db --write-db-uri '%s' --write-jsonl --write-jsonl-file %s -t 20 --chrome-user-agent '%s'", gowitness, urlsFile, goWitnessScreenshotDir, goWitnessDB, goWitnessJSONLOutputFile, userAgent), nil, true, true, 120); err != nil {
            return err
        }

        if err = localio.RunCommandPipeOutput(fmt.Sprintf("%s report generate --db-uri '%s' --zip-name %s --screenshot-path %s", gowitness, goWitnessDB, goWitnessReport, goWitnessScreenshotDir), nil, true, true, 60); err != nil {
            return err
        }

        localio.PrintInfo("Gowitness", fmt.Sprintf("%s report server --db-uri '%s' --screenshot-path %s", gowitness, goWitnessDB, goWitnessScreenshotDir), "To view gowitness report, unzip report.zip and open report.html in the browser or, run the gowitness server, via the following command:")
    }
    return nil
}

As I'm looking at this code, I see I'm still installing from master instead of latest. Not that it should matter much; it's a remnant of the previous 2.5.X release. My plan is to run GoWitness natively in Go, but I'm awaiting resolution of the dependency collision in go.mod with subfinder & httpx.

There are some helper functions in there, like localio.RunCommandPipeOutput, which is just a wrapper around os/exec.

leonjza commented 3 weeks ago

After some testing, I'm not sure I can replicate this issue. One extra thing I'm testing is not creating the results directory beforehand, in case that causes a problem, but even in that case I'm not reproducing the issue yet.

A minimal Dockerfile I'm using:

FROM kalilinux/kali-rolling:latest

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install \
    ca-certificates golang chromium sqlite3 jq vim \
    -y --no-install-recommends

RUN go install github.com/sensepost/gowitness@latest

CMD ["bash"]

An example invocation:

❯ docker run --rm -it issue237
┌──(root㉿4c10072ae922)-[/]
└─# ls -lah /tmp/
total 0
drwxrwxrwt 1 root root 0 Sep 24 16:39 .
drwxr-xr-x 1 root root 0 Sep 24 16:51 ..

┌──(root㉿4c10072ae922)-[/]
└─# ~/go/bin/gowitness scan single -u https://sensepost.com --write-db --write-db-uri sqlite:///tmp/results/gowitness.sqlite3 --write-jsonl --write-jsonl-file /tmp/results/gowitness.jsonl --screenshot-path /tmp/results/screenshots
2024/09/24 16:51:59 INFO result 🤖 target=https://sensepost.com status-code=200 title=":: Orange Cyberdefense ::" have-screenshot=true

┌──(root㉿4c10072ae922)-[/]
└─# ls -lah /tmp/results/
total 144K
drwxr-xr-x 1 root root   86 Sep 24 16:51 .
drwxrwxrwt 1 root root   72 Sep 24 16:51 ..
-rw-r--r-- 1 root root  32K Sep 24 16:51 gowitness.jsonl
-rw-r--r-- 1 root root 104K Sep 24 16:51 gowitness.sqlite3
drwxr-xr-x 1 root root   46 Sep 24 16:51 screenshots

┌──(root㉿4c10072ae922)-[/]
└─# sqlite3 /tmp/results/gowitness.sqlite3 "select url, probed_at, response_code from results;"
https://sensepost.com|2024-09-24 16:51:54.025305276+00:00|200

┌──(root㉿4c10072ae922)-[/]
└─# cat /tmp/results/gowitness.jsonl | jq -r "[.url,.response_code]|@csv"
"https://sensepost.com",200

┌──(root㉿4c10072ae922)-[/]
└─#

Using scan file shouldn't make a difference, but to test it I added two targets:

┌──(root㉿4c10072ae922)-[/]
└─# vim targets.txt

┌──(root㉿4c10072ae922)-[/]
└─# ~/go/bin/gowitness scan file -f /targets.txt --write-db --write-db-uri sqlite:///tmp/results-file/gowitness.sqlite3 --write-jsonl --write-jsonl-file /tmp/results-file/gowitness.jsonl --screenshot-path /tmp/results-file/screenshots
2024/09/24 16:55:30 INFO result 🤖 target=https://google.com:443 status-code=200 title=Google have-screenshot=true
2024/09/24 16:55:30 INFO result 🤖 target=https://sensepost.com:443 status-code=200 title=":: Orange Cyberdefense ::" have-screenshot=true

┌──(root㉿4c10072ae922)-[/]
└─# sqlite3 /tmp/results-file/gowitness.sqlite3 "select url, probed_at, response_code from results;"
https://google.com:443|2024-09-24 16:55:25.323197622+00:00|200
https://sensepost.com:443|2024-09-24 16:55:25.332050825+00:00|200

┌──(root㉿4c10072ae922)-[/]
└─#

Since you're shelling out from a Go program, maybe add -D to get some debug logging. It will tell you where it's going to write things. FWIW, logging is written to stderr.

~/go/bin/gowitness -D scan file -f /targets.txt --write-db --write-db-uri sqlite:///tmp/results-debug-file/gowitness.sqlite3 --write-jsonl --write-jsonl-file /tmp/results-debug-file/gowitness.jsonl --screenshot-path /tmp/results-debug-file/screenshots
2024/09/24 16:58:15 DEBU <cmd/root.go:28> debug logging enabled
2024/09/24 16:58:15 DEBU <cmd/scan.go:72> scanning driver started driver=chromedp
2024/09/24 16:58:15 DEBU <runner/runner.go:42> final screenshot path screenshot-path=/tmp/results-debug-file/screenshots
2024/09/24 16:58:15 DEBU <cmd/scan_file.go:57> starting file scanning file=/targets.txt
2024/09/24 16:58:15 DEBU <drivers/chromedp.go:122> witnessing 👀 target=https://sensepost.com:80
2024/09/24 16:58:15 DEBU <drivers/chromedp.go:122> witnessing 👀 target=https://google.com:80
2024/09/24 16:58:15 DEBU <drivers/chromedp.go:122> witnessing 👀 target=https://sensepost.com:443
2024/09/24 16:58:15 DEBU <drivers/chromedp.go:122> witnessing 👀 target=https://google.com:443
2024/09/24 16:58:20 INFO <runner/runner.go:146> result 🤖 target=https://google.com:443 status-code=200 title=Google have-screenshot=true
2024/09/24 16:58:21 INFO <runner/runner.go:146> result 🤖 target=https://sensepost.com:443 status-code=200 title=":: Orange Cyberdefense ::" have-screenshot=true
2024/09/24 16:58:21 DEBU <drivers/chromedp.go:462> closing browser allocation context
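Because -D logs go to stderr while --write-stdout results go to stdout, a Go wrapper can capture the two streams separately, so the CI job log keeps the debug output while the URLs stay easy to parse. A minimal sketch using only os/exec; `runSplit` is a hypothetical helper, and `sh -c` is used purely to simulate a process that writes to both streams.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runSplit runs a command with stdout and stderr captured separately. With
// gowitness, -D debug logs go to stderr, while --write-stdout writes the
// successful URLs to stdout, so keeping the streams apart lets a wrapper
// parse results and still keep the logs.
func runSplit(name string, args ...string) (stdout, stderr string, err error) {
	var outBuf, errBuf bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	err = cmd.Run()
	return outBuf.String(), errBuf.String(), err
}

func main() {
	// sh -c only simulates a process that writes to both streams; in practice
	// name would be the gowitness binary and args its -D scan flags.
	out, logs, err := runSplit("sh", "-c",
		"echo https://sensepost.com:443; echo 'DEBU debug logging enabled' 1>&2")
	fmt.Printf("stdout=%q\nstderr=%q\nerr=%v\n", out, logs, err)
}
```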

Now, I know you logged the issue about CI/CD setups specifically, but I wanted to check: is that still what we think the issue is here?

Will keep investigating.

leonjza commented 3 weeks ago

On the CI/CD train, I created a test GitHub Actions workflow to run gowitness, and it also seems to output results: https://github.com/leonjza/gowitness-cicd-example/actions/runs/11018535188/job/30599032224

And some more complete testing:

https://github.com/leonjza/gowitness-cicd-example/actions/runs/11018658036/job/30599435634

mr-pmillz commented 3 weeks ago

Thank you for testing this!!! I'm going to make a couple of modifications and see if that resolves it in GitLab.

  1. Change to latest instead of master when installing via go install.
  2. Add the --write-stdout flag.
  3. Based on this code right here, I don't think creating the dir ahead of time is the issue: https://github.com/sensepost/gowitness/blob/737f59065bd402f92d3e5eec96b39b84b9558d51/pkg/runner/runner.go#L37-L38
  4. Add the debug -D flag to see if something else is causing the issue.

Much appreciated. I will follow up sometime later this week with my findings if I am able to figure out what the culprit was. 💯

mr-pmillz commented 3 weeks ago

So I also did some testing with my Dockerfile in a local environment, and it looks like the main culprit was using 20 threads. Docker doesn't seem to like it when 20 threads are used. Even the default of 6 threads seemed to cause issues inside Docker. I got lots of these errors:

err="error enabling network tracking: context deadline exceeded"

3 threads seems to be the sweet spot from my testing in Docker; it worked as expected with 3 threads. I'll let you know whether this also fixes it in CI/CD the next time a pipeline runs and I can review it.

🍻

leonjza commented 3 weeks ago

Thanks for the feedback. I think there may be something more subtle at play here. Using this Dockerfile:

FROM kalilinux/kali-rolling:latest

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install \
    ca-certificates golang chromium sqlite3 jq vim \
    -y --no-install-recommends

ADD top1mlessporn100.txt /top1mlessporn100.txt

RUN go install github.com/sensepost/gowitness@latest

CMD ["bash"]

I can run through a list of ~100 targets using 20 goroutines and a mediocre internet connection in about 13 minutes, with results:

# wc -l /top1mlessporn100.txt
100 /top1mlessporn100.txt
# time ~/go/bin/gowitness scan file -f /top1mlessporn100.txt -t 20 --write-db --write-jsonl
2024/09/25 08:27:54 INFO result 🤖 target=http://twitter.com:80 status-code=200 title="X. It’s what’s happening / X" have-screenshot=true
2024/09/25 08:28:08 INFO result 🤖 target=http://twitter.com:443 status-code=200 title="X. It’s what’s happening / X" have-screenshot=true
2024/09/25 08:28:19 INFO result 🤖 target=https://twitter.com:443 status-code=200 title="X. It’s what’s happening / X" have-screenshot=true
2024/09/25 08:28:24 INFO result 🤖 target=http://cloudflare.com:80 status-code=200 title="Connect, Protect and Build Everywhere | Cloudflare" have-screenshot=true
2024/09/25 08:28:30 INFO result 🤖 target=http://cloudflare.com:443 status-code=200 title="Connect, Protect and Build Everywhere | Cloudflare" have-screenshot=true
2024/09/25 08:28:37 INFO result 🤖 target=https://cloudflare.com:443 status-code=200 title="Connect, Protect and Build Everywhere | Cloudflare" have-screenshot=true
2024/09/25 08:28:44 INFO result 🤖 target=http://instagram.com:80 status-code=200 title=Instagram have-screenshot=true
2024/09/25 08:28:53 INFO result 🤖 target=http://instagram.com:443 status-code=200 title=Instagram have-screenshot=true
2024/09/25 08:28:59 INFO result 🤖 target=http://linkedin.com:443 status-code=200 title="LinkedIn: Log In or Sign Up" have-screenshot=true
2024/09/25 08:29:07 INFO result 🤖 target=https://linkedin.com:443 status-code=200 title="LinkedIn: Log In or Sign Up" have-screenshot=true
2024/09/25 08:29:23 INFO result 🤖 target=http://live.com:80 status-code=200 title="Microsoft Outlook (formerly Hotmail): Free email and calendar | Microsoft 365" have-screenshot=true
2024/09/25 08:29:43 INFO result 🤖 target=http://live.com:443 status-code=200 title="Microsoft Outlook (formerly Hotmail): Free email and calendar | Microsoft 365" have-screenshot=true
2024/09/25 08:30:01 INFO result 🤖 target=https://live.com:443 status-code=200 title="Microsoft Outlook (formerly Hotmail): Free email and calendar | Microsoft 365" have-screenshot=true
2024/09/25 08:30:05 INFO result 🤖 target=https://googletagmanager.com:443 status-code=404 title="Error 404 (Not Found)!!1" have-screenshot=true
2024/09/25 08:30:15 INFO result 🤖 target=http://fbcdn.net:80 status-code=200 title="Facebook – log in or sign up" have-screenshot=true
2024/09/25 08:30:21 INFO result 🤖 target=http://fbcdn.net:443 status-code=200 title="Facebook – log in or sign up" have-screenshot=true
2024/09/25 08:30:27 INFO result 🤖 target=https://fbcdn.net:443 status-code=200 title="Facebook – log in or sign up" have-screenshot=true
2024/09/25 08:30:34 INFO result 🤖 target=http://amazon.com:80 status-code=200 title=Amazon.com have-screenshot=true
2024/09/25 08:30:40 INFO result 🤖 target=http://amazon.com:443 status-code=200 title=Amazon.com have-screenshot=true
2024/09/25 08:30:47 INFO result 🤖 target=https://amazon.com:443 status-code=200 title=Amazon.com have-screenshot=true
2024/09/25 08:31:02 INFO result 🤖 target=http://fastly.net:80 status-code=200 title="Powering the best of the internet | Fastly" have-screenshot=true
2024/09/25 08:31:04 INFO result 🤖 target=http://fastly.net:443 status-code=200 title="Powering the best of the internet | Fastly" have-screenshot=true
2024/09/25 08:31:09 INFO result 🤖 target=http://googleusercontent.com:80 status-code=404 title="Error 404 (Not Found)!!1" have-screenshot=true
2024/09/25 08:31:14 INFO result 🤖 target=http://googleusercontent.com:443 status-code=404 title="Error 404 (Not Found)!!1" have-screenshot=true
2024/09/25 08:31:15 INFO result 🤖 target=https://fastly.net:443 status-code=200 title="Powering the best of the internet | Fastly" have-screenshot=true
2024/09/25 08:31:19 INFO result 🤖 target=https://googleusercontent.com:443 status-code=404 title="Error 404 (Not Found)!!1" have-screenshot=true
2024/09/25 08:31:23 INFO result 🤖 target=http://googlesyndication.com:80 status-code=200 title=Google have-screenshot=true
2024/09/25 08:31:28 INFO result 🤖 target=http://googlesyndication.com:443 status-code=200 title=Google have-screenshot=true
2024/09/25 08:31:33 INFO result 🤖 target=https://googlesyndication.com:443 status-code=200 title=Google have-screenshot=true
2024/09/25 08:31:37 INFO result 🤖 target=http://wordpress.org:80 status-code=200 title="Blog Tool, Publishing Platform, and CMS – WordPress.org" have-screenshot=true
2024/09/25 08:31:42 INFO result 🤖 target=http://wordpress.org:443 status-code=200 title="Blog Tool, Publishing Platform, and CMS – WordPress.org" have-screenshot=true
2024/09/25 08:31:46 INFO result 🤖 target=https://wordpress.org:443 status-code=200 title="Blog Tool, Publishing Platform, and CMS – WordPress.org" have-screenshot=true
2024/09/25 08:31:53 INFO result 🤖 target=http://icloud.com:80 status-code=200 title=iCloud have-screenshot=true
2024/09/25 08:31:57 INFO result 🤖 target=http://sharepoint.com:80 status-code=200 title="Microsoft SharePoint Online - Collaboration Software | Microsoft 365" have-screenshot=true
2024/09/25 08:31:58 INFO result 🤖 target=http://icloud.com:443 status-code=200 title=iCloud have-screenshot=true
2024/09/25 08:32:02 INFO result 🤖 target=https://icloud.com:443 status-code=200 title=iCloud have-screenshot=true
2024/09/25 08:32:05 INFO result 🤖 target=http://pinterest.com:80 status-code=200 title=Pinterest have-screenshot=true
2024/09/25 08:32:10 INFO result 🤖 target=http://pinterest.com:443 status-code=200 title=Pinterest have-screenshot=true
2024/09/25 08:32:14 INFO result 🤖 target=https://pinterest.com:443 status-code=200 title=Pinterest have-screenshot=true
2024/09/25 08:32:20 INFO result 🤖 target=http://yahoo.com:80 status-code=200 title="Yahoo is part of the Yahoo family of brands" have-screenshot=true
2024/09/25 08:32:25 INFO result 🤖 target=http://yahoo.com:443 status-code=200 title="Yahoo is part of the Yahoo family of brands" have-screenshot=true
2024/09/25 08:32:31 INFO result 🤖 target=https://yahoo.com:443 status-code=200 title="Yahoo is part of the Yahoo family of brands" have-screenshot=true
2024/09/25 08:32:32 INFO result 🤖 target=http://whatsapp.net:80 status-code=200 title="WhatsApp | Secure and Reliable Free Private Messaging and Calling" have-screenshot=true
2024/09/25 08:32:37 INFO result 🤖 target=http://whatsapp.net:443 status-code=200 title="WhatsApp | Secure and Reliable Free Private Messaging and Calling" have-screenshot=true
2024/09/25 08:32:40 INFO result 🤖 target=https://whatsapp.net:443 status-code=200 title="WhatsApp | Secure and Reliable Free Private Messaging and Calling" have-screenshot=true
2024/09/25 08:33:04 INFO result 🤖 target=http://mail.ru:80 status-code=200 title="Mail: Почта, Облако, Календарь, Заметки, Покупки — сервисы для работы и жизни" have-screenshot=true
2024/09/25 08:33:05 INFO result 🤖 target=http://mail.ru:443 status-code=200 title="Mail: Почта, Облако, Календарь, Заметки, Покупки — сервисы для работы и жизни" have-screenshot=true
2024/09/25 08:33:26 INFO result 🤖 target=http://digicert.com:443 status-code=200 title="TLS/SSL Certificate Authority | Leader in Digital Trust | DigiCert" have-screenshot=true
2024/09/25 08:33:27 INFO result 🤖 target=http://digicert.com:80 status-code=200 title="TLS/SSL Certificate Authority | Leader in Digital Trust | DigiCert" have-screenshot=true
2024/09/25 08:33:46 INFO result 🤖 target=https://digicert.com:443 status-code=200 title="TLS/SSL Certificate Authority | Leader in Digital Trust | DigiCert" have-screenshot=true
2024/09/25 08:33:46 INFO result 🤖 target=https://digicert.com:80 status-code=200 title="TLS/SSL Certificate Authority | Leader in Digital Trust | DigiCert" have-screenshot=true
2024/09/25 08:33:51 INFO result 🤖 target=https://tiktokv.com:443 status-code=404 title="404 Not Found" have-screenshot=true
2024/09/25 08:34:04 INFO result 🤖 target=http://msn.com:80 status-code=200 title="MSN South Africa | Latest News, Results, Celebrity, Hotmail & Outlook" have-screenshot=true
2024/09/25 08:34:16 INFO result 🤖 target=http://msn.com:443 status-code=200 title="MSN South Africa | Latest News, Results, Celebrity, Hotmail & Outlook" have-screenshot=true
2024/09/25 08:34:39 INFO result 🤖 target=https://msn.com:443 status-code=200 title="MSN South Africa | Latest News, Results, Celebrity, Hotmail & Outlook" have-screenshot=true
2024/09/25 08:34:58 INFO result 🤖 target=http://office365.com:80 status-code=200 title="Microsoft 365 - Subscription for Productivity Apps | Microsoft 365" have-screenshot=true
2024/09/25 08:35:14 INFO result 🤖 target=http://yandex.net:80 status-code=200 title="Are you not a robot?" have-screenshot=true
2024/09/25 08:35:24 INFO result 🤖 target=http://yandex.net:443 status-code=400 title=400 have-screenshot=true
2024/09/25 08:36:31 INFO result 🤖 target=https://wordpress.com:443 status-code=200 title="WordPress.com: Build a Site, Sell Your Stuff, Start a Blog & More" have-screenshot=true
2024/09/25 08:36:42 INFO result 🤖 target=http://zoom.us:80 status-code=200 title="One platform to connect | Zoom" have-screenshot=true
2024/09/25 08:36:53 INFO result 🤖 target=http://zoom.us:443 status-code=200 title="One platform to connect | Zoom" have-screenshot=true
2024/09/25 08:36:54 INFO result 🤖 target=http://whatsapp.com:80 status-code=200 title="WhatsApp | Secure and Reliable Free Private Messaging and Calling" have-screenshot=true
2024/09/25 08:36:57 INFO result 🤖 target=https://zoom.us:443 status-code=200 title="One platform to connect | Zoom" have-screenshot=true
2024/09/25 08:36:58 INFO result 🤖 target=https://cloudflare.net:443 status-code=403 title="Just a moment..." have-screenshot=true
2024/09/25 08:37:19 INFO result 🤖 target=http://qq.com:80 status-code=200 title=腾讯网 have-screenshot=true
2024/09/25 08:37:23 INFO result 🤖 target=http://qq.com:443 status-code=200 title=腾讯网 have-screenshot=true
2024/09/25 08:37:24 INFO result 🤖 target=https://qq.com:443 status-code=200 title=腾讯网 have-screenshot=true
2024/09/25 08:37:31 INFO result 🤖 target=http://google-analytics.com:80 status-code=200 title="Analytics Tools & Solutions for Your Business - Google Analytics" have-screenshot=true
2024/09/25 08:37:34 INFO result 🤖 target=http://google-analytics.com:443 status-code=200 title="Analytics Tools & Solutions for Your Business - Google Analytics" have-screenshot=true
2024/09/25 08:37:38 INFO result 🤖 target=https://google-analytics.com:443 status-code=200 title="Analytics Tools & Solutions for Your Business - Google Analytics" have-screenshot=true
2024/09/25 08:37:43 INFO result 🤖 target=http://tiktok.com:80 status-code=200 title="Explore - Find your favourite videos on TikTok" have-screenshot=true
2024/09/25 08:37:45 INFO result 🤖 target=http://tiktok.com:443 status-code=200 title="Explore - Find your favourite videos on TikTok" have-screenshot=true
2024/09/25 08:37:52 INFO result 🤖 target=https://tiktok.com:443 status-code=200 title="Explore - Find your favourite videos on TikTok" have-screenshot=true
2024/09/25 08:38:03 INFO result 🤖 target=http://blogspot.com:80 status-code=200 title="Blogger.com - Create a unique and beautiful blog easily." have-screenshot=true
2024/09/25 08:38:16 INFO result 🤖 target=https://blogspot.com:443 status-code=200 title="Blogger.com - Create a unique and beautiful blog easily." have-screenshot=true
2024/09/25 08:38:24 INFO result 🤖 target=http://reddit.com:80 status-code=200 title="Reddit - Dive into anything" have-screenshot=true
2024/09/25 08:38:27 INFO result 🤖 target=http://reddit.com:443 status-code=200 title="Reddit - Dive into anything" have-screenshot=true
2024/09/25 08:38:35 INFO result 🤖 target=https://reddit.com:443 status-code=200 title="Reddit - Dive into anything" have-screenshot=true
2024/09/25 08:38:36 INFO result 🤖 target=http://opera.com:80 status-code=200 title="Opera Web Browser | Faster, Safer, Smarter | Opera" have-screenshot=true
2024/09/25 08:38:44 INFO result 🤖 target=http://opera.com:443 status-code=200 title="Opera Web Browser | Faster, Safer, Smarter | Opera" have-screenshot=true
2024/09/25 08:38:47 INFO result 🤖 target=https://opera.com:443 status-code=200 title="Opera Web Browser | Faster, Safer, Smarter | Opera" have-screenshot=true
2024/09/25 08:38:52 INFO result 🤖 target=https://googleadservices.com:443 status-code=404 title="Error 404 (Not Found)!!1" have-screenshot=true
2024/09/25 08:38:52 INFO result 🤖 target=http://unity3d.com:80 status-code=200 title="Unity Real-Time Development Platform | 3D, 2D, VR & AR Engine" have-screenshot=true
2024/09/25 08:38:54 INFO result 🤖 target=http://snapchat.com:80 status-code=200 title="Less social media. More Snapchat." have-screenshot=true
2024/09/25 08:38:57 INFO result 🤖 target=http://snapchat.com:443 status-code=200 title="Less social media. More Snapchat." have-screenshot=true
2024/09/25 08:38:59 INFO result 🤖 target=https://snapchat.com:443 status-code=200 title="Less social media. More Snapchat." have-screenshot=true
2024/09/25 08:39:01 INFO result 🤖 target=http://trbcdn.net:80 status-code=403 title="403 Forbidden" have-screenshot=true
2024/09/25 08:39:02 INFO result 🤖 target=http://trbcdn.net:443 status-code=403 title="403 Forbidden" have-screenshot=true
2024/09/25 08:39:05 INFO result 🤖 target=https://trbcdn.net:443 status-code=403 title="403 Forbidden" have-screenshot=true

real    13m2.123s
user    178m59.817s
sys 2m13.824s

# ls
gowitness.jsonl  gowitness.sqlite3  screenshots

# ls -lah
total 92M
drwxr-xr-x 1 root root   86 Sep 25 08:39 .
drwx------ 1 root root   36 Sep 25 08:26 ..
-rw-r--r-- 1 root root  34M Sep 25 08:39 gowitness.jsonl
-rw-r--r-- 1 root root  58M Sep 25 08:39 gowitness.sqlite3
drwxr-xr-x 1 root root 4.2K Sep 25 08:39 screenshots

# sqlite3 gowitness.sqlite3 "select url,response_code from results"
http://twitter.com:80|200
http://twitter.com:443|200
https://twitter.com:443|200
http://cloudflare.com:80|200
http://cloudflare.com:443|200
https://cloudflare.com:443|200
http://instagram.com:80|200
http://instagram.com:443|200
http://linkedin.com:443|200
https://linkedin.com:443|200
http://live.com:80|200
http://live.com:443|200
https://live.com:443|200
https://googletagmanager.com:443|404
http://fbcdn.net:80|200
...

The next thing I'll test is wrapping it all in a simple Go program that shells out to invoke gowitness.

mr-pmillz commented 3 weeks ago

Following up.

I got it working in GitLab CI/CD using 3 threads and the --write-stdout option.

However, the gowitness report generate command doesn't produce a usable static report in CI/CD: it runs, but the report contains no results.

[INFO] /root/go/bin/gowitness -D scan file -f /output/httpx/urls.txt --screenshot-path /output/gowitness/screenshots --write-db --write-db-uri sqlite:///output/gowitness/gowitness.sqlite3 --write-jsonl --write-jsonl-file /output/gowitness/results.jsonl --write-stdout -t 3 --log-scan-errors --chrome-user-agent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:126.0) Gecko/20100101 Firefox/126.0' 
<snippet/>
took: 19m16.840064119s
[INFO]  Command=/root/go/bin/gowitness report generate --db-uri sqlite:///output/gowitness/gowitness.sqlite3 --zip-name /output/gowitness/report.zip --screenshot-path /output/gowitness/screenshots
2024/09/26 02:44:07 INFO generating HTML report for results count=0
2024/09/26 02:44:08 INFO report zip file generated successfully path=/output/gowitness/report.zip
took: 359.786308ms

Not sure why it reports results count=0 🤔. I haven't tried report generate in Docker yet.

boozezela commented 2 weeks ago

Not sure why the results count=0 🤔

Hi, I was experimenting with gowitness today, and it looks like the --db-uri and --json-file flags are ignored altogether when generating reports.

For instance, this is what happens when I save data to a JSONL file and then try to generate a report. Similarly, saving a SQLite DB under the default or a custom name works, but loading that DB from any directory other than the current one fails when generating a report.

$ gowitness scan single -u https://google.com --driver gorod --write-jsonl
2024/09/30 23:39:35 INFO result 🤖 target=https://google.com status-code=200 title=Google have-screenshot=true

$ ls -l
total 680
-rw-r--r--  1 user  staff  345752 30 Sep 23:39 gowitness.jsonl
drwxr-xr-x  3 user  staff      96 30 Sep 23:39 screenshots

$ gowitness report list --json-file ./gowitness.jsonl
╭─────────────────┬────────┬──────┬────────────────────┬────────┬───────┬─────┬─────┬────────┬────────╮
│ When            │ Failed │ Code │ Input URL          │ Title  │ ~Size │ Net │ Con │ Header │ Cookie │
├─────────────────┼────────┼──────┼────────────────────┼────────┼───────┼─────┼─────┼────────┼────────┤
│ Sep 30 23:39:27 │ false  │ 200  │ https://google.com │ Google │ 1kb   │ 38  │ 0   │ 17     │ 2      │
╰─────────────────┴────────┴──────┴────────────────────┴────────┴───────┴─────┴─────┴────────┴────────╯

$ gowitness report generate --json-file ./gowitness.jsonl
2024/09/30 23:40:05 INFO generating HTML report for results count=0
2024/09/30 23:40:05 INFO report zip file generated successfully path=gowitness-report.zip

$ ls -l
total 1304
-rw-r--r--  1 user  staff  234722 30 Sep 23:45 gowitness-report.zip
-rw-r--r--  1 user  staff  356597 30 Sep 23:45 gowitness.jsonl
-rw-r--r--  1 user  staff   69632 30 Sep 23:45 gowitness.sqlite3
drwxr-xr-x  3 user  staff      96 30 Sep 23:45 screenshots

Note the empty gowitness.sqlite3 that gets created after gowitness report generate.

As a workaround, I let gowitness write its default db file within the current directory, then generate a report based on that.

$ gowitness scan single -u https://google.com --driver gorod --write-db
2024/09/30 23:47:55 INFO result 🤖 target=https://google.com status-code=200 title=Google have-screenshot=true
$ gowitness report list
╭─────────────────┬────────┬──────┬────────────────────┬────────┬───────┬─────┬─────┬────────┬────────╮
│ When            │ Failed │ Code │ Input URL          │ Title  │ ~Size │ Net │ Con │ Header │ Cookie │
├─────────────────┼────────┼──────┼────────────────────┼────────┼───────┼─────┼─────┼────────┼────────┤
│ Sep 30 23:47:50 │ false  │ 200  │ https://google.com │ Google │ 1kb   │ 38  │ 0   │ 17     │ 2      │
╰─────────────────┴────────┴──────┴────────────────────┴────────┴───────┴─────┴─────┴────────┴────────╯
$ gowitness report generate
2024/09/30 23:48:08 INFO generating HTML report for results count=1
2024/09/30 23:48:08 INFO report zip file generated successfully path=gowitness-report.zip

leonjza commented 2 weeks ago

Created a new issue to track the report generation problems. Let's keep this one for the concurrency-related issue.

leonjza commented 2 weeks ago

Made some progress testing this. Using hosted GitLab, I wrote a pipeline that downloads a domain list, then installs and runs gowitness both directly and via another Go program that wraps gowitness. In both cases it seems to work fine.

https://gitlab.com/leonjza/gowitness-cicd/-/pipelines/1477509973

Are there any obvious differences here to what you are doing @mr-pmillz ?

mr-pmillz commented 2 weeks ago

Very nice!

Looks good. In my case I'm using a self-hosted runner on a self-hosted GitLab instance, but either way I think this is sufficient to mark this issue as resolved. The thread limit I ran into in CI/CD is most likely down to the self-hosted runner's resources.

I believe the main remaining issue is report generation from a custom SQLite DB or JSONL file path. I see you've opened a separate issue for that, https://github.com/sensepost/gowitness/issues/240, which will hopefully be a quick and easy fix. I haven't had time to dig into the code yet, but I'm sure you know exactly where to look.

Thanks again! ๐Ÿป