Describe the change
It is useful to have the full extracted OCR text available as a single string for searching, alongside the text extracted as an array. This PR adds that string under a new field, "string_text", so that existing parsing rules that rely on the array field are not affected. The scan_ocr test case is updated to reflect this change.
Additionally, fixed the formatting of several scanner files so that they pass the code style check.
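To illustrate why a single string helps with searching, here is a minimal sketch. It is not the scanner's actual code: the helper `build_ocr_event` and its input are hypothetical, and it assumes the string is built by joining the tokens with single spaces, which may differ from how the scanner handles line breaks.

```python
def build_ocr_event(raw_ocr: bytes) -> dict:
    """Hypothetical helper: derive both fields from raw OCR output."""
    # Split on any whitespace to get the per-word array (the existing field).
    tokens = raw_ocr.split()
    return {
        "text": tokens,
        # Assumption: the string form is the tokens joined by single spaces.
        "string_text": b" ".join(tokens),
    }


event = build_ocr_event(b"Lorem ipsum dolor\nsit amet, consectetur")
phrase = b"dolor sit amet"

# A multi-word phrase can be matched against the string field...
print(phrase in event["string_text"])
# ...but not against the array, whose elements are single words.
print(phrase in event["text"])
```

Keeping the array untouched and adding the string as a separate field means rules written against individual words continue to work, while phrase-level rules can target the new field.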
Describe testing procedures
Tested locally with the scan_ocr test cases, which were slightly modified to reflect the changes to the text fields.
Sample output
{'elapsed': 19.439179,
'flags': [],
+ 'full_text': b'Lorem Ipsum Lorem ipsum dolor sit amet, consectetur adipisci'
+ b'ng elit. Cras lobortis sem dui. Morbi at magna quis ligula f'
+ b'aucibusconsectetur feugiat at purus. Sed nec lorem nibh. Nam'
+ b' vel libero odio. Vivamus tempus non enim egestas pretium.Ve'
+ b'stibulum turpis arcu, maximus nec libero quis, imperdiet sus'
+ b'cipit purus. Vestibulum blandit quis lacus nonsollicitudin. '
+ b'Nullam non convallis dui, et aliquet risus. Sed accumsan ull'
+ b'amcorper vehicula. Proin non urna facilisis,condimentum eros'
+ b' quis, suscipit purus. Morbi euismod imperdiet neque ferment'
+ b'um dictum. Integer aliquam, erat sitamet fringilla tempus, m'
+ b'auris ligula blandit sapien, et varius sem mauris eu diam. S'
+ b'ed fringilla neque est, in laoreetfelis tristique in. Donec '
+ b'luctus velit a posuere posuere. Suspendisse sodales pellente'
+ b'sque quam.',
'text': [b'Lorem',
b'Ipsum',
b'Lorem',
b'ipsum',
b'dolor',
b'sit',
b'amet,',
b'consectetur',
b'adipiscing',
b'elit.',
b'Cras',
b'lobortis',
b'sem',
b'dui.',
b'Morbi',
b'at',
b'magna',
b'quis',
b'ligula',
b'faucibus',
b'consectetur',
b'feugiat',
b'at',
b'purus.',
b'Sed',
b'nec',
b'lorem',
b'nibh.',
b'Nam',
b'vel',
b'libero',
b'odio.',
b'Vivamus',
b'tempus',
b'non',
b'enim',
b'egestas',
b'pretium.',
b'Vestibulum',
b'turpis',
b'arcu,',
b'maximus',
b'nec',
b'libero',
b'quis,',
b'imperdiet',
b'suscipit',
b'purus.',
b'Vestibulum',
b'blandit',
b'quis',
b'lacus',
b'non',
b'sollicitudin.',
b'Nullam',
b'non',
b'convallis',
b'dui,',
b'et',
b'aliquet',
b'risus.',
b'Sed',
b'accumsan',
b'ullamcorper',
b'vehicula.',
b'Proin',
b'non',
b'urna',
b'facilisis,',
b'condimentum',
b'eros',
b'quis,',
b'suscipit',
b'purus.',
b'Morbi',
b'euismod',
b'imperdiet',
b'neque',
b'fermentum',
b'dictum.',
b'Integer',
b'aliquam,',
b'erat',
b'sit',
b'amet',
b'fringilla',
b'tempus,',
b'mauris',
b'ligula',
b'blandit',
b'sapien,',
b'et',
b'varius',
b'sem',
b'mauris',
b'eu',
b'diam.',
b'Sed',
b'fringilla',
b'neque',
b'est,',
b'in',
b'laoreet',
b'felis',
b'tristique',
b'in.',
b'Donec',
b'luctus',
b'velit',
b'a',
b'posuere',
b'posuere.',
b'Suspendisse',
b'sodales',
b'pellentesque',
b'quam.']}
Checklist
[X] My code follows the style guidelines of this project
[X] I have performed a self-review of and tested my code
[X] I have commented my code, particularly in hard-to-understand areas
[X] I have made corresponding changes to the documentation