openzim / node-libzim

Libzim binding for Node.js: read/write ZIM files in Javascript
https://www.npmjs.com/package/@openzim/libzim
GNU General Public License v3.0

Usage question #110

Closed by j0hnm4r5 1 year ago

j0hnm4r5 commented 1 year ago

This isn't an issue, but a question on usage of this tool and ZIM files in general.

While using a Wikipedia ZIM archive and this library, if I try to serve the HTML of an article with something like this (via Fastify), I don't get any of the JS scripts, CSS, or images included:

server.get("/w/:path", async (request, reply) => {
  const { path } = request.params as { path: string };
  const outFile = "./example.zim";
  const archive = new Archive(outFile);

  const entry = archive.getEntryByPath(`A/${path}`);

  reply.header("Content-Type", "text/html");

  return entry.item.data.data.toString("utf-8");
});

How do I serve all of those extras? Are they part of the ZIM archive? I'm not seeing scripts or CSS when I list all of the entries in the archive, just articles and images.

kelvinhammond commented 1 year ago

Hello, I hacked this together as an example, but I'm not sure about the exact format of the paths (URLs) inside the ZIM file.

import { Archive } from '@openzim/libzim';

const zimFile = 'wikibooks_en_all_maxi_2021-03.zim';
const archive = new Archive(zimFile);
const paths = [ 
  '../-/style.css',
  '../I/C_sharp.svg.png.webp',
  'Wikibooks',
  'Using_Wikibooks',
  'Wikibooks%3AWelcome',
  'Wikibooks_Stacks/Departments',
];
//console.log(archive.mainEntry.item.data.data.toString());
for(const path of paths) {
  const url = new URL(path, 'http://localhost');
  console.log("pathname:", url.pathname);
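  // Decode once so the existence check and the lookup use the same string,
  // then try the path with its leading slash before falling back to without it.
  const pathname = decodeURIComponent(url.pathname);
  const entry = archive.hasEntryByPath(pathname)
    ? archive.getEntryByPath(pathname)
    : archive.getEntryByPath(pathname.slice(1));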
  console.log({
    isRedirect: entry?.isRedirect,
    title: entry?.isRedirect ? entry?.redirect?.title : entry?.item?.title,
    path: entry?.isRedirect ? entry?.redirect?.path : entry?.item?.path,
  }); 
  //const entry = archive.getEntryByPath(path);
  //const entry = archive.getEntryByPath('/-/style.css');
  //const entry = archive.getEntryByPath('I/C_sharp.svg.png.webp');
  //const entry = archive.getEntryByPath('Using_Wikibooks');
  //console.log(entry?.item?.data?.data?.toString());
  console.log();
}
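
The lookup logic in the loop above can be pulled into a small helper. This is only a sketch: `candidateEntryPaths`, `findExistingPath`, and the `PathLookup` interface are hypothetical names, and the only library call assumed is `hasEntryByPath`, which already appears in the example. It resolves a link the way a browser would, decodes it, and returns the first candidate (with or without the leading slash) that the archive reports as present.

```typescript
// Hypothetical minimal view of the archive: only the call used in the thread.
interface PathLookup {
  hasEntryByPath(path: string): boolean;
}

// Turn a link found in an article into candidate ZIM entry paths.
function candidateEntryPaths(href: string, base = "http://localhost"): string[] {
  // Resolve "../" segments against the base, as a browser would.
  const { pathname } = new URL(href, base);
  let decoded = pathname;
  try {
    decoded = decodeURIComponent(pathname);
  } catch {
    // Malformed percent-encoding: fall back to the raw pathname.
  }
  const withoutSlash = decoded.replace(/^\/+/, "");
  // Try the path with its leading slash first, then without it.
  return decoded === withoutSlash ? [decoded] : [decoded, withoutSlash];
}

// Return the first candidate the archive actually contains, if any.
function findExistingPath(archive: PathLookup, href: string): string | undefined {
  return candidateEntryPaths(href).find((p) => archive.hasEntryByPath(p));
}
```

Checking with `hasEntryByPath` before calling `getEntryByPath` keeps a missing entry from surfacing as an error in the middle of a request.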

Also, opening the ZIM file can be slow at times. You may want to open it once and keep the Archive in a module-level variable instead of reopening it on every request.
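
One way to do that is a small cache keyed by file name. This is a sketch rather than anything the library provides: `createArchiveCache` is a hypothetical helper, and the `open` callback stands in for whatever constructs the archive.

```typescript
// Hypothetical helper: open each file at most once and reuse the result.
function createArchiveCache<T>(open: (file: string) => T): (file: string) => T {
  const cache = new Map<string, T>();
  return (file: string): T => {
    if (!cache.has(file)) {
      // First request for this file: pay the opening cost once.
      cache.set(file, open(file));
    }
    return cache.get(file) as T;
  };
}
```

With the binding, this would be wired up as something like `const getArchive = createArchiveCache((file) => new Archive(file));` at module level, with each route handler calling `getArchive("./example.zim")`.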

j0hnm4r5 commented 1 year ago

Thank you! That was a big help in figuring out what's going on.

TL;DR: my terminal cut off the top of the console output when I listed every entry in the archive, so I didn't realize the styles and scripts were included. Once I realized that, I made a wildcard route that serves those files:

server.get("/-/*", async (request, reply) => {
  // Fastify exposes the wildcard match under the "*" key.
  const params = request.params as { "*": string };
  const path = params["*"];

  const entry = archive.getEntryByPath(`-/${path}`);

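  // Styles and scripts are not HTML, so send the item's own MIME type
  // (assumes the binding exposes it as `item.mimetype`).
  reply.header("Content-Type", entry.item.mimetype);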

  return entry.item.data.data.toString("utf-8");
});
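
Since the archive holds articles, styles, scripts, and images side by side, the response type has to follow the entry rather than the route. Below is a minimal, framework-independent sketch of that idea. The `ZimArchiveLike` interfaces and `buildZimResponse` are hypothetical; `hasEntryByPath`, `getEntryByPath`, and `item.data.data` mirror calls used in this thread, while the `mimetype` field is an assumption about the binding. Redirect entries are not handled here.

```typescript
// Hypothetical minimal shapes, modeled on the calls used in this thread.
interface ZimItemLike {
  mimetype: string; // assumption: the item reports its own MIME type
  data: { data: Uint8Array };
}
interface ZimEntryLike {
  item: ZimItemLike;
}
interface ZimArchiveLike {
  hasEntryByPath(path: string): boolean;
  getEntryByPath(path: string): ZimEntryLike;
}

interface ZimResponse {
  status: number;
  contentType: string;
  body: Uint8Array;
}

// Build a response for any entry: article, stylesheet, script, or image.
function buildZimResponse(archive: ZimArchiveLike, entryPath: string): ZimResponse {
  if (!archive.hasEntryByPath(entryPath)) {
    return { status: 404, contentType: "text/plain", body: new Uint8Array() };
  }
  const { item } = archive.getEntryByPath(entryPath);
  // Return raw bytes so binary content such as images is not mangled
  // by a UTF-8 string conversion.
  return { status: 200, contentType: item.mimetype, body: item.data.data };
}
```

A single wildcard route could then pass its matched path to this function and copy `status`, `contentType`, and `body` onto the reply, instead of keeping one route per namespace.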