revolunet / pypdftk

Python module to drive the awesome pdftk binary.

dump_data_fields does not support multi-line FieldValues #43

Open mogli91 opened 3 years ago

mogli91 commented 3 years ago

While debugging issue #37, I noticed that dump_data_fields does not currently support multi-line FieldValues. This is due to the logic in run_command, where the result of check_output is split into individual lines. Since text fields can contain newline and carriage return characters, this logic should be adapted.
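A minimal sketch of the failure mode, assuming pdftk output shaped like the sample below (the sample text and field names are made up for illustration, and this is not the actual run_command code). Once the output is split into lines, the second line of the value no longer carries a key:

```python
# Illustrative sample of dump_data_fields-style output (an assumption,
# not captured from a real PDF). The FieldValue spans two lines.
sample = (
    "---\n"
    "FieldType: Text\n"
    "FieldName: notes\n"
    "FieldValue: first line\n"
    "second line\n"
    "FieldJustification: Left\n"
)

# Splitting into individual lines, as described for run_command.
lines = sample.splitlines()

# Any line that is neither a record separator nor a "Field..." key line
# has lost its association with the FieldValue it belongs to.
orphans = [
    line for line in lines
    if not line.startswith("---") and not line.startswith("Field")
]
print(orphans)
```

A parser that treats every line as an independent "Key: value" pair has nothing sensible to do with the orphaned line.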

dtorres-sf commented 3 years ago

I had a similar need and have a function on my fork called get_fields. I wrote it some time ago, and it now looks very similar to what dump_data_fields has evolved into. However, it does handle multi-line fields correctly. Here is how I solve that:

https://github.com/dustingtorres/pypdftk/blob/0c3b58ec088ba1d6d795df7615c462e130e4344f/pypdftk.py#L284

It basically keeps reading lines as part of the current value until it finds a line that starts with "Field" or "---", and then moves on to the next key or record. It needs to use the _utf8 version of dump_data_fields, so this also relates to #37.
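The approach described above can be sketched roughly as follows. This is a simplified illustration, not the code from the linked fork: the function name parse_fields and the sample text are hypothetical, and it assumes the raw output is already available as a single decoded string. Two known limitations: a continuation line that itself starts with "Field" or "---" would be misread as a new key or record, and repeated keys within one record overwrite each other.

```python
def parse_fields(text):
    """Parse dump_data_fields-style text, keeping multi-line values.

    Hypothetical helper for illustration only; not part of pypdftk.
    """
    fields = []
    current = None   # dict for the record being built
    last_key = None  # key that continuation lines get appended to
    # Note: splitlines() also breaks on "\r"; continuation lines are
    # rejoined with "\n", so line endings inside values are normalized.
    for line in text.splitlines():
        if line.startswith("---"):
            # Record separator: store the finished record, start a new one.
            if current:
                fields.append(current)
            current = {}
            last_key = None
        elif line.startswith("Field") and ":" in line:
            if current is None:
                current = {}
            key, _, value = line.partition(":")
            # Drop the single space that follows the colon, if present.
            if value.startswith(" "):
                value = value[1:]
            current[key] = value
            last_key = key
        elif current is not None and last_key is not None:
            # Anything else continues the previous value.
            current[last_key] += "\n" + line
    if current:
        fields.append(current)
    return fields


sample = (
    "---\n"
    "FieldType: Text\n"
    "FieldName: notes\n"
    "FieldValue: first line\n"
    "second line\n"
    "FieldJustification: Left\n"
    "---\n"
    "FieldType: Text\n"
    "FieldName: name\n"
    "FieldValue: Bob\n"
)
result = parse_fields(sample)
print(result[0]["FieldValue"])
```

The key design choice is that parsing has to work on the whole output rather than on a pre-split list of independent lines, so that a line can be attributed to the key that precedes it.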