Open dbffm opened 3 years ago
Update: I did another test with
- folders: /Documents/Scanner/action
  subfolders: true
  filters:
    - extension: pdf
    - filecontent: '(?P<all>.*)'
  actions:
    - echo: "{filecontent.all}"
Now I can see in the log file why the rule was not executed: the extracted text has two whitespace characters between each word. When I manually copy and paste the text out of my PDF reader (Foxit Reader on Windows), there is only one whitespace character between the words.
So changing the regex to
- filecontent:
    - \bBeispiel\s+\bLebensversicherung\s+\bAG
is the solution for me.
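A minimal sketch of why the `\s+` version matches while a pattern with literal single spaces does not. The `extracted` string is a made-up stand-in for the text seen in the log (two spaces between words), not actual textract output, and the snippet uses Python's `re` module directly rather than the tool itself.

```python
import re

# Hypothetical stand-in for the extracted PDF text: two spaces between words.
extracted = "Beispiel  Lebensversicherung  AG"

# Pattern with exactly one literal space between the words.
single_space = re.compile(r"Beispiel Lebensversicherung AG")

# Pattern from the post: \s+ accepts one or more whitespace characters.
flexible = re.compile(r"\bBeispiel\s+\bLebensversicherung\s+\bAG")

single_space_matches = single_space.search(extracted) is not None
flexible_matches = flexible.search(extracted) is not None

print(single_space_matches)  # False
print(flexible_matches)      # True
```

Because `\s` also matches tabs and line breaks, the same pattern would still match if the extraction wrapped the company name across two lines.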
So I still have two open questions: Why does textract read two whitespace characters between words where my PDF reader shows only one? And how do I add regex flags in the config.yaml file?
Hi all,
I don't know whether this is a bug or whether I am using it wrong. I am new to regular expressions, but I did a lot of research and some exercises on https://regex101.com/
I have a searchable PDF file (created with synoOCR) and I am searching for the string "Beispiel Lebensversicherung AG".
My config.yaml looks like this:
Unfortunately, the rule is not working. I found that it does work when I search for only one word of the string. For example:
works, but when I combine it with the other two words the rule is not executed. I tested the expression on regex101: https://regex101.com/r/7bFIu8/1
Do I have to change the filecontent line in the config file? I am also wondering how to add the regex options (flags) in the YAML file.
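On the flags question, one possible approach is to put the flags inside the pattern itself. This sketch assumes the filecontent pattern is handed unchanged to Python's `re` module, which is not confirmed in this thread. Python's `re` accepts inline flags such as `(?i)` (ignore case) at the start of a pattern, so no separate option in the YAML file would be needed. The sample `text` is invented for illustration.

```python
import re

# Invented sample text: different letter case, a double space, and a line break.
text = "BEISPIEL  lebensversicherung\nag"

# Without flags the match fails because the case differs.
plain = re.compile(r"\bBeispiel\s+Lebensversicherung\s+AG")

# (?i) at the start of the pattern turns on case-insensitive matching.
with_inline_flag = re.compile(r"(?i)\bBeispiel\s+Lebensversicherung\s+AG")

plain_matches = plain.search(text) is not None
inline_flag_matches = with_inline_flag.search(text) is not None

print(plain_matches)        # False
print(inline_flag_matches)  # True
```

Under the same assumption, the config line would carry the prefix directly, for example `- filecontent: '(?i)\bBeispiel\s+Lebensversicherung\s+AG'`. Single quotes keep the backslashes literal in YAML.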