josephlei / ca-jobs-schema

Exploring ALL California Job Classification Schemas

#need to figure out a way to convert pdf to txt before grep #3

Open vr00n opened 7 years ago

vr00n commented 7 years ago

Try using pdfgrep. With it, I was able to get the schema PDF into a fairly structured text format.

From there you can potentially use grep's context options "-A n" and "-B n" to include n lines after or before a pattern match, respectively.

Here are my results from a simple pdfgrep command:

pdfgrep " " schema_alphabetic.pdf | uniq | more
State of California
Civil Service Pay Scale - Alpha by Class Title
  Schem Class
          Code   Full Class Title
                           Compensation              SISA Footnotes         AR Crit  MCR Prob. Mo. WWG NT   CBID
  CU70     1733  ACCOUNT CLERK II
                      $2,471.00 - $3,097.00           SISA                             1        6   2       R 04
  ME10     4915  ACCOUNT MANAGER, CALIFORNIA EXPOSITION AND STATE FAIR
                      $5,553.00 - $6,901.00                01 43                       1       12   E       S 01
  JL32     4177  ACCOUNTANT I (SPECIALIST)
                 A    $3,000.00 - $3,757.00                                285         1        6   2       R 01
                 L    $3,000.00 - $3,757.00                                285         1        6   2       R 01
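
To illustrate the context options on output like this, here is a minimal sketch. It assumes a few lines of the output above have been saved to a plain-text file; the temporary file and the `sample` variable are hypothetical names used only for this example, and the search patterns are just illustrations.

```shell
# Save a few lines of the pdfgrep output to a temporary text file.
# ("sample" is a hypothetical variable name for this sketch.)
sample=$(mktemp)
cat > "$sample" <<'EOF'
  CU70     1733  ACCOUNT CLERK II
                      $2,471.00 - $3,097.00           SISA                             1        6   2       R 04
  ME10     4915  ACCOUNT MANAGER, CALIFORNIA EXPOSITION AND STATE FAIR
                      $5,553.00 - $6,901.00                01 43                       1       12   E       S 01
  JL32     4177  ACCOUNTANT I (SPECIALIST)
                 A    $3,000.00 - $3,757.00                                285         1        6   2       R 01
                 L    $3,000.00 - $3,757.00                                285         1        6   2       R 01
EOF

# -A 2 prints each matching line plus the 2 lines after it,
# which picks up both salary ranges (A and L) for this class.
grep -A 2 'ACCOUNTANT I' "$sample"

# -B 1 prints each matching line plus the 1 line before it,
# which recovers the class title that a salary figure belongs to.
grep -B 1 '5,553.00' "$sample"
```

Because the class title and its compensation sit on separate lines, the context options are what tie the two together. The pdfgrep man page also lists -A, -B and -C options, so it may be possible to skip the intermediate text file, though I have not verified that on this PDF.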

josephlei commented 7 years ago

Thank you for the recommendation, I will definitely check this out!

From our discussions, this data will soon be published openly by the source system in a machine-readable format, possibly with an API. I wasn't aware of this package, and I think it will be useful in other applications in the future as well. Thanks again.