highcharts / node-export-server

Highcharts Node.js export server

Fetch Highcharts from CDN during docker build #415

Open danielbecroft opened 1 year ago

danielbecroft commented 1 year ago

I'm currently experimenting with the enhancement/puppeteer branch, trying to move from a Windows installation to running the export server inside a container.

I've started with something basic, based on other issues that have been reported:

FROM node:alpine

WORKDIR /home/highchart-export-server 

ENV ACCEPT_HIGHCHARTS_LICENSE=1
ENV HIGHCHARTS_VERSION=11.1.0
ENV HIGHCHARTS_USE_MAPS=0
ENV HIGHCHARTS_USE_GANTT=0
ENV HIGHCHARTS_CDN=npm

RUN apk update \
    && apk upgrade \
    && apk add --no-cache git patch

RUN npm install highcharts-export-server@3.0.0-beta.1

EXPOSE 7801

ENTRYPOINT ["node", "node_modules/highcharts-export-server/bin/cli.js", "--enableServer", "1", "--port", "7801", "--logLevel", "3"]

The issue I have is that starting the export server on container startup triggers the script download from the CDN. To avoid this in production, and to ensure we run exactly the same image in each instance, what's the best approach to get these files downloaded and baked into the container image?

I've tried running cli.js directly in the docker build command, to no avail.

Is it possible to pre-fetch the scripts etc. during the build phase, or is this limited to startup of the server itself?

jszuminski commented 1 year ago

Thanks for reporting!

If I understand you correctly, you would like there to be an equivalent to the node build.js which was available in the Phantom-based version of the Export Server.

We got rid of the build.js file and the separate build step, but once all the prioritized issues related to server health are taken care of, we'll consider adding this option back.

One last thing: could you please explain why you need to fetch all the scripts in a separate process? Do you plan to run node ./bin/cli.js --enableServer 1 ... often? If you plan to run it only once, isn't a node build.js step unnecessary?

davidseverwright commented 1 year ago

I currently start the export server every hour to send some reports, and then stop it again. The files are re-downloaded every time. The frequency of the container starting is irrelevant though, it's not about the bandwidth.

jszuminski commented 1 year ago

Absolutely agreed, and thanks for your thorough explanation!

We'll definitely add it to our backlog. I'll keep you posted here.

danielbecroft commented 1 year ago

Thanks @jakubSzuminski, my thoughts are the same as @davidseverwright's: undesirable firewall changes in a production environment, reproducibility of builds and deployments, and a dependency on an external resource for container startup.

cvasseng commented 9 months ago

Revisiting this: We can't bundle the actual library due to licensing, and because different users need different versions of it (some may only have a license for v9, for instance, or some may want to lock to a particular version for one reason or another).

That said, I understand your use case specifically, and why this poses a challenge there.

There are a couple of potential solutions we could implement fairly quickly:

1) We could allow for overriding the CDN URL, so that the files could be hosted somewhere else (e.g. on an internal CDN)

2) We could add a config option that loads the library cache from the filesystem instead of through a CDN, along with a simple bake tool (as part of the CLI, for instance) that performs the current CDN fetch into an arbitrary filesystem location given as input. You could then store a pre-fetched cache alongside your Dockerfile, extract it into the container during the build, and pass a configuration flag at export-server startup pointing to the extracted cache.

Would either approach be suitable for you?

It's also possible that we could add Highcharts as an optional peer dependency, so that you could install Highcharts itself through NPM in your Dockerfile and lock it to the version you require. However, we'd need to do some testing to confirm that approach (which IMO is arguably the best of the three, though it would take more time to get up and running, provided it's feasible at all).
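For illustration, option 1 roughly corresponds to the `highcharts.cdnURL` key that appears in the v3 config (the internal mirror URL below is a placeholder, not a real endpoint):

```javascript
// Sketch of option 1: point the export server at an internal mirror
// instead of the public Highcharts CDN. Only cdnURL changes; the
// fetch logic and cache layout stay the same.
const config = {
  highcharts: {
    version: "11.1.0",
    // Placeholder: replace with the base URL of your internal mirror
    cdnURL: "https://cdn.internal.example.com/highcharts/",
  },
};

module.exports = config;
```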

danielbecroft commented 9 months ago

Hi @cvasseng, I think either approach would work fine for our scenario (option 2 or the npm peer dependency would be preferred). Our approach for option 2 would be a multi-stage build. Something like (untested):

FROM node AS base
ENV ACCEPT_HIGHCHARTS_LICENSE=1
ENV HIGHCHARTS_VERSION=11.1.0
ENV HIGHCHARTS_USE_MAPS=0
ENV HIGHCHARTS_USE_GANTT=0

RUN npm install highcharts-export-server@3.1.1

FROM base AS installer
RUN node node_modules/highcharts-export-server/bin/cli.js --download --cache-dir ./cache

FROM base
COPY --from=installer ./cache ./cache
ENTRYPOINT .....

noxify commented 8 months ago

It's also possible that we could add Highcharts as an optional peer dependency, so that you could install Highcharts itself through NPM in your dockerfile and lock that to the version you require.

I have used this approach together with a custom CDN.

In our case, we use the built-in express instance to create a new route to simulate the "CDN". The endpoint reads the files from the node_modules/highcharts.

// package.json
{
  "name": "chart-exporter",
  "type": "module",
  "scripts": {
    "dev": "node --watch src/server.js",
    "format": "prettier --check .",
    "format:fix": "npm run format -- --write",
    "lint": "eslint .",
    "lint:fix": "npm run lint -- --fix"
  },
  "dependencies": {
    "highcharts": "11.4.0",
    "highcharts-export-server": "3.1.1"
  },
  "devDependencies": {
    "@ianvs/prettier-plugin-sort-imports": "4.1.1",
    "@types/node": "^20.11.25",
    "eslint": "8.57.0",
    "eslint-plugin-import": "2.29.1",
    "prettier": "3.2.5"
  }
}
// src/server.js

import { readFileSync } from "fs"
import path from "path"
import exporter from "highcharts-export-server"

// https://github.com/highcharts/node-export-server?tab=readme-ov-file#default-json-config
const config = {
  puppeteer: {
    args: [],
  },
  highcharts: {
    version: "11.3.0",
    cdnURL: "http://localhost:8080/cdn/",
    forceFetch: false,
    coreScripts: ["highcharts"],
    modules: [
      "parallel-coordinates",
      "data",
      "static-scale",
      "broken-axis",
      "item-series",
      "pattern-fill",
      "series-label",
      "no-data-to-display",
    ],
    indicators: [],
    scripts: [],
  },
  export: {
    // your export options
  },
  customCode: {
    allowCodeExecution: false,
    allowFileResources: true,
    customCode: false,
    callback: false,
    resources: false,
    loadConfig: false,
    createConfig: false,
  },
  server: {
    // ... server config
  },
  pool: {
   // ... pool config
  },
  logging: {
    level: 2,
    file: "highcharts-export-server.log",
    dest: "log/",
  },
  ui: {
    enable: true,
    route: "/",
  },
  other: {
    noLogo: true,
  },
}

const main = async () => {
  exporter.server.get("/cdn/:version/:filename", (req, res) => {
    const filePath = path.join(
      path.resolve(),
      "node_modules/highcharts/",
      req.params.filename,
    )

    res.status(200).send(readFileSync(filePath))
  })

  // some modules are inside the `modules` directory
  // haven't found a way to solve this in one route
  exporter.server.get("/cdn/:version/modules/:filename", (req, res) => {
    const filePath = path.join(
      path.resolve(),
      "node_modules/highcharts/modules/",
      req.params.filename,
    )

    res.status(200).send(readFileSync(filePath))
  })

  exporter.setOptions(config, [])

  // we have to start the server before we initialize the pool
  // otherwise the local CDN endpoint isn't available 
  await exporter.startServer(config.server)

  await exporter.initPool(config)
}

void main()

We use this Dockerfile:


FROM node:20-alpine
ENV ACCEPT_HIGHCHARTS_LICENSE="YES"
ENV HIGHCHARTS_VERSION="11.3.0"
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

USER root
WORKDIR /app
COPY . .
RUN rm -rf node_modules/ \
    && rm -rf log/ \
    && rm -rf tmp/
RUN apk add --no-cache chromium nss freetype harfbuzz ca-certificates ttf-freefont dbus
RUN npm ci
RUN mkdir /var/run/dbus/ \
    && chmod -R 777 /var/run/dbus/ 

RUN chgrp -R 0 /app && \
    chmod -R g=u /app

EXPOSE 8080

USER 1000
HEALTHCHECK CMD /bin/true

CMD ["node", "src/server.js"]

erlichmen commented 7 months ago

I wrote a small hack to handle this for now: a simple script that I run during docker build. The script lives at ./scripts/preapreCache.js, and in the Dockerfile I run:

RUN node ./scripts/preapreCache.js

// Load the export server's internal cache module and trigger the same
// CDN fetch the server performs at startup, so the cache is baked into
// the image at build time instead of on container start.
const fs = require("fs");
import("../node_modules/highcharts-export-server/lib/cache.js").then(({ checkCache }) => {
  const config = JSON.parse(fs.readFileSync("config.json").toString());
  checkCache(config.highcharts).catch((err) => {
    console.error(err);
  });
});
bamarch commented 5 months ago

Another workaround is to start the server as part of the Docker build process and wait for the files to be downloaded into the cache

# build / install / configure the server here

RUN ./scripts/runAndStopServerToPopulateCache.sh

# define ENTRYPOINT or CMD here

#!/bin/sh
highcharts-export-server --enableServer "1" &
pid=$!
until test -f /path/to/manifest.json && test -f /path/to/sources.js; do
  sleep 1
done
kill $pid

This has a slight advantage in that the CLI interface is more stable than the JS internals.

level420 commented 3 months ago

@bamarch I've tried your workaround, but the earlier Puppeteer run during image build leaves the Chromium profile locked. See:

Wed Jul 31 2024 16:24:45 GMT+0000 [error] - [browser] Failed to launch a browser instance. 
 Error: Failed to launch the browser process! undefined
[16:16:0731/162445.179268:ERROR:process_singleton_posix.cc(353)] The profile appears 
to be in use by another Chromium process (20) on another computer (buildkitsandbox). 
Chromium has locked the profile so that it doesn't get corrupted. If you are sure no other 
processes are using this profile, you can unlock the profile and relaunch Chromium.

Did you find a solution to this?

level420 commented 3 months ago


The solution is to run @bamarch's script as another user, not the one that runs node-export-server. The locked profile is then created for that user and is no longer a problem for the user that is active when the container runs. In my Dockerfile I run the script as root, chown the complete node-export-server directory to user node, and afterwards set the user via USER node.
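A minimal sketch of that Dockerfile ordering (the working directory and script path are illustrative, matching the earlier comments in this thread):

```dockerfile
# Still root at this point: the cache-priming run creates Chromium's
# profile lock under root's profile directory, not the runtime user's.
RUN ./scripts/runAndStopServerToPopulateCache.sh

# Hand the install (including the populated cache) over to the runtime user.
RUN chown -R node:node /home/highchart-export-server

# Switch users: the stale root-owned profile lock no longer applies.
USER node
```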

level420 commented 3 months ago

@cvasseng one more obstacle to creating a completely self-contained docker image arises when using older Highcharts versions with node-export-server, because older versions do not offer all the modules that manifest.json expects to be available. See https://github.com/highcharts/node-export-server/blob/c671403f3c6d4dd2e914dd7dba5632323a845310/lib/cache.js#L351

I've successfully integrated @bamarch's script in my Dockerfile, and all sources needed for that specific old Highcharts version are downloaded during image creation. But when the container starts, the export server refetches all sources again because of the mismatch between the modules available in the cache and those listed in the manifest.

In this situation we'd need a command-line switch that is the opposite of HIGHCHARTS_FORCE_FETCH, e.g. HIGHCHARTS_PREVENT_FETCH or similar, which completely disables fetching and re-fetching.

level420 commented 3 months ago

ATM I'm overriding lib/cache.js with my own modified version, where I forcibly stop the cache update by setting

requestUpdate = false;

before https://github.com/highcharts/node-export-server/blob/c671403f3c6d4dd2e914dd7dba5632323a845310/lib/cache.js#L370
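One way to apply such an override in a Dockerfile (the patched file name is illustrative; you would maintain your own modified copy of lib/cache.js alongside the Dockerfile) is to copy it over the installed module after the npm install step:

```dockerfile
# After npm install / npm ci: overwrite the installed cache module with
# a patched copy that sets requestUpdate = false before the update check,
# so the pre-baked cache is never refetched at container startup.
COPY patched-cache.js node_modules/highcharts-export-server/lib/cache.js
```

Note this ties the image to the installed export-server version: the patched file has to be re-derived whenever highcharts-export-server is upgraded.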