Open quintesse opened 1 year ago
If you are using Next.js, another workaround suggested in https://github.com/opengovsg/pdf2md/issues/69#issuecomment-1884786365 also works: in brief, specify @opendocsg/pdf2md
and pdfjs-dist
under experimental.serverComponentsExternalPackages
in your Next config.
is providing font directories even useful? Would it perhaps be possible to do without?
I might be wrong, but pdfjs may be using the fonts to better recognise text within the PDF documents. I would be happy for you (or someone else) to see if removing the font directory would have any impact on the conversion process! If it doesn't, I'll gladly accept a PR.
If you are using Next.js, another workaround suggested in #69 (comment) also works: in brief, specify
@opendocsg/pdf2md
andpdfjs-dist
underexperimental.serverComponentsExternalPackages
in your Next config.
Can confirm that this solution seems to work as well.
Please try this with v0.2.0 of this package, which removes the dependency on node-specific libs, including path
.
@LoneRifle I haven't been able to take a good look at it, but this is the error I get by simply updating to the latest version and removing the work-around:
Failed to compile.
./node_modules/unpdf/dist/index.cjs
Module not found: Package path ./pdfjs is not exported from package C:\Projects\test\node_modules\unpdf (see exports field in C:\Projects\test\node_modules\unpdf\package.json)
https://nextjs.org/docs/messages/module-not-found
Import trace for requested module:
./node_modules/@opendocsg/pdf2md/lib/util/pdf.js
./node_modules/@opendocsg/pdf2md/lib/pdf2md.js
...
> Build failed because of webpack errors
If I re-add the work-around (the one where you add serverComponentsExternalPackages
to the next.config.js file) it at least compiles. I haven't had time yet to test if it still works.
Is your feature request related to a problem? Please describe.
Because of the use of the function
require.resolve()
in this code: https://github.com/opengovsg/pdf2md/blob/master/lib/util/pdf.js#L19 the use of pdf2md in my project results in errors in production builds at runtime. The project uses WebPack to bundle the code and that particular function cannot be used.An explanation of the problem can be found in this Webpack issue: https://github.com/webpack/webpack/issues/13931
(It's basically because once a project has been bundled, asking a module for it's filename just doesn't make sense anymore)
Describe the solution you'd like
I'd like the code to either not use that method by using an alternative or provide some kind of option/flag that avoids its use. Also, given the fact that this project is about converting a PDF file to Markdown, which is text-only, is providing font directories even useful? Would it perhaps be possible to do without?
Describe alternatives you've considered
The work-around mentioned in the Webpack issue fortunately works. It's a completely incidental fix that reduces runtime code efficiency, so I'd prefer not to use it.