iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0

Should we include a Dockerfile? #10

Open ibnesayeed opened 5 years ago

ibnesayeed commented 5 years ago

I am wondering whether it would be helpful to add a Dockerfile to the repo that includes Chromium/Google Chrome and the other runtime requirements, so that all the tools function as expected.

machawk1 commented 5 years ago

Squidwarc's Dockerfile might be useful here as a loose basis for including Chromium.

ato commented 5 years ago

Sure. I don't use Docker much myself, so I'm happy to take your advice on that.

ibnesayeed commented 5 years ago

@ato can you list/describe everything needed (i.e., dependencies and requirements) to put it together? If you were to set it up on a bare-bones Ubuntu machine (for example), what steps would be needed? Roughly speaking, we will need some version of the JRE, Chromium/Google Chrome, some specific environment variables, and any other considerations?

ato commented 5 years ago

At compile time: openjdk-8-jdk, maven. At runtime: openjdk-8-jre, chromium/chrome.

Java 11 should also be fine, whichever is available.

Set the env var BROWSER to the chromium/chrome executable. It may also be good to cap the Java heap size by running it like this:

java -Xmx128m -jar jwarc.jar <args>

Otherwise, Java defaults to a percentage of the system's total RAM, which can be unreasonably high on a large-memory system.
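To sanity-check a container or a fresh machine against the requirements above, a small standalone program along these lines could help. This is only a sketch and not part of jwarc: the class name `RuntimeCheck` and its helper methods are hypothetical, and it assumes BROWSER holds a path to the executable rather than a command name resolved via PATH.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of jwarc: reports whether the runtime
// environment matches the requirements described in this thread.
class RuntimeCheck {

    /**
     * Returns true if the given BROWSER value names an existing, executable file.
     * Assumption: the value is a filesystem path, not a command looked up on PATH.
     */
    static boolean isUsableBrowser(String browser) {
        if (browser == null || browser.isEmpty()) {
            return false;
        }
        Path path = Paths.get(browser);
        return Files.isRegularFile(path) && Files.isExecutable(path);
    }

    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String browser = System.getenv("BROWSER");
        System.out.println("BROWSER=" + browser
                + " usable=" + isUsableBrowser(browser));

        // maxMemory() reflects the effective heap cap, whether it comes from
        // -Xmx or from the JVM's default sizing.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("max heap MiB=" + toMiB(maxHeap));

        System.out.println("java.version=" + System.getProperty("java.version"));
    }
}
```

Compiling it with `javac RuntimeCheck.java` and running `java -Xmx128m RuntimeCheck` should report a max heap close to 128 MiB (the JVM may report slightly less); running it without `-Xmx` shows the default the JVM would otherwise pick on that machine.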