Searchable pdf - Githubissues

Dzarda7 commented 1 year ago

Hello,

thank you for this project. I would like to ask whether it is possible to make the PDF generated from Gerber files searchable. It would be great, but I do not even know if it is possible. I tried some online PDF OCR web pages, but they only work partially and the PDF is no longer vector afterwards.

Thanks.

jpt13653903 commented 1 year ago

Although that would be really awesome, the Gerber file format does not do text. It does the glyphs as raw vector images. I don't even have them as letter names, in which case I could have done text extraction.

This said - many CAD tools (like KiCAD and Altium) can export to PDF directly, in which case it is searchable.

Dzarda7 commented 1 year ago

But my CAD tool does not support exporting into a single PDF. Would it at least be possible to take PDFs as input? Thanks.

jpt13653903 commented 1 year ago

You'll probably be better off using a PDF editor in that case. Something along the lines of Foxit. I have not used it myself, but you might even be able to script it as part of the "build" workflow.

This said, I don't think it can combine layers like Gerber2PDF can. I don't believe I'll be able to take a PDF input without significant development effort -- the PDF standard is too generic, with just too many edge cases. And the transparency model is truly terrible to work with -- you can't simply combine pages onto one page with a given transparency per page: you have to wrap them in transparency groups, etc.

I might be able to write something that combines PDF pages, without changing colour or transparency, by simply placing the PDF page contents into form XObjects and then drawing those onto the same (new) page, but I don't think I'll get the time to do something like that in at least the next year or two -- my day job is keeping me really busy.

But -- Come to think of it -- I'll probably be able to whip up a LaTeX document for you that does this -- can you send me a set of example PDFs with a brief description of the output you expect so I can give it a shot? There might even be a Python library that can be used for this -- I'll have a look.

Best of luck. I believe your use-case is that you'd like to search for something like "R123" in the PDF output that includes the silk layer? And your CAD tool cannot combine layers for you?

Dzarda7 commented 1 year ago

Thank you for everything. I thought that you might have done something like that, so I thought I would ask. I really do not want to bother you with it. I can work with Python myself, so I will probably try to do it sometime.

You are right, that is exactly what I want to do with my PDFs. KiCAD has an awesome plugin called board2pdf, but I think it would be quite hard to port it to the Mentor CAD software.

I will try to look at LaTeX; that is something I want to get more familiar with. I did not know that something like that was possible, so thank you for that.

jpt13653903 commented 1 year ago

In LaTeX, you essentially need to set the paper size to the output you want, with all margins set to zero and "text" width and height equal to the page size.

You then include the PDFs as images -- something along the lines of taking the following 3x PDFs:

Which look like this:

[Image: 3xPDFs]

Writing some LaTeX:

\documentclass{article}

\usepackage{graphicx}

\setlength{\paperwidth} {100mm}
\setlength{\paperheight}{100mm}

\pagestyle{empty}
\setlength{\hoffset}       {-1in}
\setlength{\voffset}       {-1in}
\setlength{\footskip}      {0mm}
\setlength{\headheight}    {0mm}
\setlength{\headsep}       {0mm}
\setlength{\marginparsep}  {0mm}
\setlength{\marginparwidth}{0mm}
\setlength{\textheight}    {\paperheight}
\setlength{\textwidth}     {\paperwidth}
\setlength{\oddsidemargin} {0mm}
\setlength{\topmargin}     {0mm}

\setlength{\parindent}{0mm}
\setlength{\parskip}  {0mm}
\setlength{\lineskip} {0mm}

\begin{document}
  \includegraphics[height=\textheight]{Red}\par%
  \vspace{-\textheight}%
  \includegraphics[height=\textheight]{Green}\par%
  \vspace{-\textheight}%
  \includegraphics[height=\textheight]{Blue}\par%
\end{document}

Then compile using pdfLaTeX (vanilla LaTeX compiles to DVI and cannot import PDFs, so it is rarely used anymore).

There are, of course, various packages that can help you with this, such as geometry for setting up the paper size and layout, but I like doing things vanilla wherever possible and/or practical.

The result is the PDF below:

Example.pdf

jpt13653903 commented 1 year ago

Another option you can try is running the PDFs through InkScape. The newer versions have really powerful command-line scripting, but, depending on what you want to achieve, I'd stick with LaTeX or Python.
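For the LaTeX route, the .tex file itself can be generated by a script, so the layer list does not have to be edited by hand. What follows is only a minimal Python sketch and is not part of Gerber2PDF. The function name overlay_tex and its parameters are made up for illustration. The preamble is the one from the LaTeX example earlier in this thread. The sketch assumes that all layer PDFs share the same page size and that their file names contain no spaces or LaTeX special characters.

```python
def overlay_tex(layers, width="100mm", height="100mm"):
    """Return LaTeX source that draws every PDF in `layers` on one page.

    `layers` is a list of PDF file names without extension (hypothetical
    names, chosen by the caller). They are drawn in order, so the last
    entry is painted on top.
    """
    if not layers:
        raise ValueError("at least one layer is required")

    # Preamble from the hand-written example: zero margins and a text
    # area equal to the page size. Only the paper size is a parameter.
    preamble = [
        r"\documentclass{article}",
        "",
        r"\usepackage{graphicx}",
        "",
        r"\setlength{\paperwidth} {" + width + "}",
        r"\setlength{\paperheight}{" + height + "}",
        "",
        r"\pagestyle{empty}",
        r"\setlength{\hoffset}       {-1in}",
        r"\setlength{\voffset}       {-1in}",
        r"\setlength{\footskip}      {0mm}",
        r"\setlength{\headheight}    {0mm}",
        r"\setlength{\headsep}       {0mm}",
        r"\setlength{\marginparsep}  {0mm}",
        r"\setlength{\marginparwidth}{0mm}",
        r"\setlength{\textheight}    {\paperheight}",
        r"\setlength{\textwidth}     {\paperwidth}",
        r"\setlength{\oddsidemargin} {0mm}",
        r"\setlength{\topmargin}     {0mm}",
        "",
        r"\setlength{\parindent}{0mm}",
        r"\setlength{\parskip}  {0mm}",
        r"\setlength{\lineskip} {0mm}",
        "",
    ]

    body = []
    for name in layers:
        body.append(r"  \includegraphics[height=\textheight]{" + name + r"}\par%")
        # Step back up by one page height so the next layer overlaps this one.
        body.append(r"  \vspace{-\textheight}%")
    body.pop()  # the last layer needs no step back

    lines = preamble + [r"\begin{document}"] + body + [r"\end{document}"]
    return "\n".join(lines) + "\n"
```

Calling overlay_tex(["Red", "Green", "Blue"]) should reproduce the document body from the hand-written example. Write the returned string to a file (Combined.tex, say, which is again a made-up name) and run pdflatex on it. How the layer list is collected, for instance from a directory of exported PDFs, is left open here.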

jpt13653903 commented 1 year ago

I just checked... LaTeX preserves transparency correctly (as long as the original PDFs have transparency).

If you'd like to play with the three circles, the original SVG is below. I used InkScape to create the file and exported the 3 layers to 3 individual PDFs.

RGB

Dzarda7 commented 1 year ago

You are awesome, thank you so much. That is it. We were doing it in Inkscape, but it took about a day to do everything. I will try to automate it with the tools you proposed. Once again, thank you so much, you are awesome.

jpt13653903 / Gerber2PDF

Searchable pdf #25