jgillula / paperless-ngx-postprocessor

A powerful and customizable postprocessing script for paperless-ngx
GNU Affero General Public License v3.0

[Discussion] Custom Fields #19

Open hasechris opened 5 months ago

hasechris commented 5 months ago

Hi,

I wanted to ask about custom fields, because I could not find any mention of them in the postprocessor (whether or not they are supported in any way).

My vision: I write two dates on my documents by hand. The first is when the document reached me; the second, for invoices I have to pay manually, is when I paid it.

The OCR can already recognize my handwriting. My idea was to write a postprocessor rule that detects this text and copies it into custom fields. Sadly, I could not find any documentation on whether that last part is supported or how to do it.

jgillula commented 5 months ago

Hi there! Custom fields aren't currently supported, although I think it's a brilliant idea. (This project was started long before Paperless-ngx had support for custom fields, which is why they're not supported.)

I'm a little too busy to code this up myself right now, but I'd be happy to accept pull requests if anyone wanted to try for themselves.

Tomb01 commented 3 months ago

I found this new discussion in the paperless-ngx repository: https://github.com/paperless-ngx/paperless-ngx/discussions/5482

hasechris commented 3 months ago

Hi @Tomb01,

Yeah, that's not what I'm looking for, sorry :laughing: The PR just adds the possibility to define custom fields while uploading a new document. I want to set custom fields in this postprocessor based on matching content in the document. Also, I'm already working on it; sadly I had to pause the work on this FR because of life and other things. But now I'm back on track and will upload a PR in the next 1-2 weeks.

hasechris commented 2 months ago

Hey @jgillula (or anyone who has the time to help me),

I now have a kind-of-working version; see my repo https://github.com/hasechris/paperless-ngx-postprocessor (main branch).

Sadly I can't continue because I have a problem where I just can't find my mistake; I'm still taking baby steps in Python. When I debug my code, everything works for custom fields until the value change at https://github.com/hasechris/paperless-ngx-postprocessor/blob/main/paperlessngx_postprocessor/postprocessor.py#L313.

The old metadata is in the variable metadata_in_filename_format, and on line 313 the new variable new_metadata_in_filename_format should get filled with the changed metadata. Sadly, in my branch the old metadata variable is changed as well, and I can't find out why.

State after line 309 was executed: [image]

State after line 313 was executed: [image]

Note that the date has also changed in the upper variable. It seems there is a link/pointer to the custom_fields object somewhere, but I can't find it.

Diving in further, to line 232 in the same file, here is my code: [image] This screenshot shows the state after line 232 was executed. This code is just copied from your code, and sadly I don't understand why it works for the other metadata parameters but not for the custom field.

Could you take a look at this?

My example filter is the last one in the file rulesets.d/example.yml. I also have a document on my paperless server containing the string "eingegangen 30.01.2024", which matches the regex.
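For readers following along, here is a minimal sketch of the kind of match described above. The pattern, the group names, and the function `extract_received_date` are all hypothetical and written only for illustration; they are not copied from rulesets.d/example.yml, and the real rule may look different.

```python
import re
from datetime import date

# Hypothetical pattern: the word "eingegangen" followed by a DD.MM.YYYY date.
# This is an assumption for illustration, not the regex from example.yml.
RECEIVED_PATTERN = re.compile(
    r"eingegangen\s+(?P<day>\d{2})\.(?P<month>\d{2})\.(?P<year>\d{4})"
)


def extract_received_date(text):
    """Return the handwritten 'received' date found in OCR text, or None."""
    match = RECEIVED_PATTERN.search(text)
    if match is None:
        return None
    try:
        return date(
            int(match.group("year")),
            int(match.group("month")),
            int(match.group("day")),
        )
    except ValueError:
        # The digits matched but do not form a real calendar date (e.g. 31.02.2024).
        return None


ocr_text = "Rechnung Nr. 123\neingegangen 30.01.2024\nBetrag: 42,00 EUR"
received = extract_received_date(ocr_text)
```

Named groups keep the day, month, and year separate, so the value can be reformatted before it is written into a custom field.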

jgillula commented 2 months ago

Apologies for the delay in responding. I have a hunch that the issue is the copy on line 285.

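copy only makes a shallow copy: the new dict gets its own top-level keys but shares the value objects with the original. Since the other objects in metadata_in_filename_format are simple immutable objects (like strings or dates), they can't be modified in place, only replaced, so the original doesn't change. But since the custom fields are dicts themselves, modifying them in place changes them in both the copy and the original. (More info at https://docs.python.org/3/library/copy.html)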

I think the solution is to change that copy to a deepcopy. (And TBH, I think this was a bug you found and even if you weren't making your changes, it should probably have been a deepcopy from the beginning.)
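Here's a minimal sketch of the difference, in case it helps. The `metadata` dict and its keys are made up for illustration (an assumption, not the actual structure used in postprocessor.py); only `copy.copy` and `copy.deepcopy` are the real standard-library calls.

```python
import copy
from datetime import date

# Hypothetical metadata, loosely modeled on this thread; the real
# structure in postprocessor.py may differ.
metadata = {
    "title": "Invoice",
    "created": date(2024, 1, 30),
    "custom_fields": {"received": "2024-01-30"},
}

# Shallow copy: a new top-level dict, but the nested dict is shared.
shallow = copy.copy(metadata)
shallow["title"] = "Changed"                          # rebinds a key in the copy only
shallow["custom_fields"]["received"] = "2024-02-15"   # mutates the shared nested dict

title_leaked = metadata["title"] != "Invoice"
shallow_nested_leaked = metadata["custom_fields"]["received"] != "2024-01-30"

# Reset the original, then repeat the nested change on a deep copy.
metadata["custom_fields"]["received"] = "2024-01-30"
deep = copy.deepcopy(metadata)
deep["custom_fields"]["received"] = "2024-02-15"      # only the copy's nested dict changes

deep_nested_leaked = metadata["custom_fields"]["received"] != "2024-01-30"

print(title_leaked, shallow_nested_leaked, deep_nested_leaked)
```

With the shallow copy, the nested change shows up in the original too, which matches the symptom in the screenshots; with the deep copy it doesn't.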

I don't have a test setup handy to check myself; could you try making that change and see if it fixes your problem? (Also you'll probably need to change line 143)