modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Characters in field name replaced with underscore #88

Open schang933 opened 7 years ago

schang933 commented 7 years ago

It seems the closest correlation to a field name in the parser output is the "id.Id" attribute. However, I'm seeing some characters (my guess is non-alphanumeric) being replaced with underscores.
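
To make the guess above concrete, here is a minimal Python sketch of the kind of replacement that would produce this behaviour. The function name `guessed_sanitize` and the regular expression are assumptions for illustration only, not pdf2json's actual code. It also shows why the original name cannot be recovered from the sanitized one.

```python
import re


def guessed_sanitize(field_name):
    """Hypothetical rule: replace every non-alphanumeric character with "_".

    This only mirrors the guess in the comment above; it is not taken
    from pdf2json's source.
    """
    return re.sub(r"[^A-Za-z0-9]", "_", field_name)


# Two different field names collapse to the same id, so the mapping
# is lossy and cannot be reversed from the output alone.
print(guessed_sanitize("Hello World"))  # Hello_World
print(guessed_sanitize("Hello-World"))  # Hello_World
```

If the real rule is anything like this, a separate attribute carrying the raw name would be the only reliable way to get it back.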

Is it possible to provide a field that is the pure field name? I think it would be really useful as tools like PDFFiller and PDFTK don't give nearly as much detail about form fields as pdf2json.

Note: moved details to follow up comment.

schang933 commented 7 years ago

(second comment with details)

I attached a file, "example.pdf". In it, the field named "Hello World" comes out as "Hello_World" in the parser output. It would be great if the output also included the unmodified field name.

      "Fields": [{
        "style": 48,
        "T": {
          "Name": "alpha",
          "TypeInfo": {}
        },
        "id": {
          "Id": "Hello_World",
          "EN": 0
        },
        "TI": 0,
        "AM": 0,
        "x": 5.291,
        "y": 6.792,
        "w": 26.066,
        "h": 10.128
      }],
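
For anyone wanting to check their own output, a small Python sketch that pulls the "id.Id" values out of a fragment like the one above. The fragment is wrapped in braces so that it parses as standalone JSON; that wrapping and the helper name `field_ids` are assumptions for this example, and where "Fields" sits in the full parser output is not shown here.

```python
import json

# The fragment above, wrapped in braces so it is valid standalone JSON.
fragment = """
{
  "Fields": [{
    "style": 48,
    "T": {"Name": "alpha", "TypeInfo": {}},
    "id": {"Id": "Hello_World", "EN": 0},
    "TI": 0,
    "AM": 0,
    "x": 5.291,
    "y": 6.792,
    "w": 26.066,
    "h": 10.128
  }]
}
"""


def field_ids(parsed):
    """Return the id.Id value of every entry under "Fields"."""
    return [field["id"]["Id"] for field in parsed.get("Fields", [])]


ids = field_ids(json.loads(fragment))
print(ids)  # ['Hello_World']
```

The only name-like value that comes back is the underscored one; the original "Hello World" does not appear anywhere in the fragment.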

example.pdf

jelizarovas commented 2 years ago

Did this ever get resolved?