@pdf/pdftext is a lightweight module for extracting text from PDFs, built on PDF.js. It can be used in several environments, including Deno, the browser, and the command line.
import { pdfText } from 'jsr:@pdf/pdftext';
const pdfBuffer: ArrayBuffer = Deno.readFileSync('./path/to/pdf');
const page: { [pageno: number]: string } = await pdfText(pdfBuffer);
// To get page 1
console.log(`Page 1 text: ${page[1]}`);
// To get all pages
// console.log(`All page text: \n${page[0]}`);
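Because the result is a plain object keyed by page number, it can be post-processed like any other record. The following is a minimal sketch of finding which pages mention a term. It assumes, as the comments in the usage example suggest, that key 0 holds the combined text of all pages and keys 1 and up hold individual pages. `PageMap` and `findPages` are hypothetical names introduced here for illustration, not part of @pdf/pdftext, and the sample data is a stand-in for real `pdfText()` output.

```typescript
// Hypothetical helper, not part of @pdf/pdftext.
// The type mirrors the shape used in the usage example: page number -> text.
type PageMap = { [pageno: number]: string };

// Return the page numbers (1 and up) whose text contains `term`, ignoring case.
function findPages(pages: PageMap, term: string): number[] {
  const needle = term.toLowerCase();
  return Object.keys(pages)
    .map(Number)
    // Assumption: key 0 is the combined text of all pages, so skip it.
    .filter((n) => n > 0 && pages[n].toLowerCase().includes(needle))
    .sort((a, b) => a - b);
}

// Stand-in data shaped like a pdfText() result; real output will differ.
const sample: PageMap = {
  0: "Invoice 42\nTerms and conditions",
  1: "Invoice 42",
  2: "Terms and conditions",
};

console.log(findPages(sample, "terms"));
```

With a real PDF, the `page` object from the usage example could be passed in place of `sample`.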
Download the pdftext.js module using curl or a similar utility:
curl -L -O -C- https://jsr.io/@pdf/pdftext/1.2.8/src/pdftext.js
<script type="module">
import { pdfText } from './pdftext.js';
document.getElementById('file-input').addEventListener('change', async (event) => {
const file = event.target.files[0];
const pdfBuffer = await file.arrayBuffer();
const page = await pdfText(pdfBuffer);
// To get page 1
console.log(`Page 1: ${page[1]}`);
// To get all pages
// console.log(`All pages: \n${page[0]}`);
});
</script>
<input type="file" id="file-input" />
deno install -frgA jsr:@pdf/pdftext/pdftxt
pdftxt sample.pdf
deno uninstall pdftxt
.\pdftxt.exe sample.pdf
./pdftxt sample.pdf
TypeScript code for testing purposes:
import { test } from "jsr:@pdf/pdftext";
console.log(await test());
This project is licensed under the MIT License.
Some dependencies of this project may be licensed under different terms. In particular, PDF.js is licensed under the Apache License 2.0.