qurator-spk / eynollah

Document Layout Analysis
Apache License 2.0

How to run the CLI #118

Closed emchateau closed 4 months ago

emchateau commented 9 months ago

The documentation suggests using the command line, but I don't understand how to run the program from there. Is it necessary to run OCR-D first? Isn't it possible to run the program directly in Python? Many thanks for your help.

vahidrezanezhad commented 9 months ago

Dear @emchateau ,

Certainly, you can run it as a standalone tool without using OCR-D. First, download the models from the following link: https://qurator-data.de/eynollah/2022-04-05/ (you will need the 'models_eynollah_renamed_savedmodel' file). Then, install the tool using the command "pip install .". Finally, to execute it from the command line, use a command similar to the following:

eynollah -m 'path to the directory containing the models' -i 'path to the input image' -o 'path to the directory for the output XML file' -light
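For readers who would rather drive the tool from Python, one cautious option is to wrap the same command with `subprocess`. This is only a sketch: it uses nothing beyond the flags shown in the command above (`-m`, `-i`, `-o`, `-light`), the helper names `build_eynollah_command` and `run_eynollah` are hypothetical, and it assumes the `eynollah` executable is on `PATH` after `pip install .`.

```python
import shutil
import subprocess


def build_eynollah_command(models_dir, input_path, output_path, light=True):
    """Assemble the argument list for the eynollah CLI.

    Only flags that appear in this thread are used; the function name
    and parameter names are hypothetical.
    """
    cmd = [
        "eynollah",
        "-m", str(models_dir),
        "-i", str(input_path),
        "-o", str(output_path),
    ]
    if light:
        cmd.append("-light")
    return cmd


def run_eynollah(models_dir, input_path, output_path, light=True):
    """Run the CLI as a child process and raise if it exits with an error."""
    if shutil.which("eynollah") is None:
        raise FileNotFoundError("eynollah was not found on PATH; is it installed?")
    cmd = build_eynollah_command(models_dir, input_path, output_path, light)
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces.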

emchateau commented 9 months ago

Thank you very much for your prompt reply. I followed all the steps indicated, and the installation worked without any problem. I also tried installing with the Makefile, again without difficulty. But for the eynollah command to work, it seems to me that it has to be declared somewhere. Which program should it point to in order to run?
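On where the command is declared: tools installed with pip usually expose their commands through a `console_scripts` entry point, and pip writes a small launcher script into the environment's scripts directory. The sketch below shows one way to inspect this from Python. The helper name `find_console_script` is hypothetical, and what it reports for eynollah depends on the installed version and environment.

```python
import shutil
from importlib import metadata


def find_console_script(name):
    """Return the 'module:function' target of a console script, or None."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: selectable entry points.
        candidates = eps.select(group="console_scripts")
    else:
        # Python 3.8 and 3.9: a plain dict keyed by group name.
        candidates = eps.get("console_scripts", [])
    for ep in candidates:
        if ep.name == name:
            return ep.value
    return None


if __name__ == "__main__":
    # Location of the launcher script on PATH (None if not found).
    print("launcher:", shutil.which("eynollah"))
    # Python function the launcher calls (None if not installed).
    print("target:", find_console_script("eynollah"))
```

If the first line prints `None`, the scripts directory of the Python environment is probably not on `PATH`, or a different environment is active than the one the package was installed into.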

ferropasha commented 9 months ago

I am also trying to run some basic tests from cmd but, unlike emchateau, without any luck. The installation appears to have succeeded, because eynollah --help works as expected and lists the available CLI options. But a test command like this, using a page of Kant's book from your repository:

eynollah -i "d:\3\eynollah\kant_aufklaerung_1784_0020.tif" -o "d:\3\eynollah" -m "d:\3\eynollah\models_eynollah_renamed_savedmodel" -light

fails with the following errors (apparently the offending function is run_enhancement, though running the command with the -noae option didn't change anything). Can you advise on what's going wrong?

Traceback (most recent call last):
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\Scripts\eynollah-script.py", line 33, in <module>
    sys.exit(load_entry_point('eynollah', 'console_scripts', 'eynollah')())
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\cli.py", line 201, in main
    eynollah.run()
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\eynollah.py", line 2842, in run
    img_res, is_image_enhanced, num_col_classifier, num_column_is_classified = self.run_enhancement(self.light_version)
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\eynollah.py", line 2525, in run_enhancement
    is_image_enhanced, img_org, img_res, num_col_classifier, num_column_is_classified, img_bin = self.resize_and_enhance_image_with_column_classifier(light_version)
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\eynollah.py", line 542, in resize_and_enhance_image_with_column_classifier
    _, page_coord = self.early_page_for_num_of_column_classification(img_bin)
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\eynollah.py", line 1019, in early_page_for_num_of_column_classification
    model_page, session_page = self.start_new_session_and_model(self.model_page_dir)
  File "D:\3\eynollah\eynollah-main\qurator\eynollah\eynollah.py", line 671, in start_new_session_and_model
    model = load_model(model_dir , compile=False,custom_objects = {"PatchEncoder": PatchEncoder, "Patches": Patches})
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\saving\saving_api.py", line 262, in load_model
    return legacy_sm_saving_lib.load_model(
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\src\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py", line 45, in error_translator
    raise errors_impl.OpError(None, None, error_message, errors_impl.UNKNOWN)
tensorflow.python.framework.errors_impl.OpError

cneud commented 9 months ago

Hi @ferropasha, thank you for your interest in Eynollah and for reporting the issue with the error log, which is helpful.

Unfortunately I see that you are using Windows, which I am afraid we do not currently support (and likely won't be able to in the near future).

However, some users have reported in the past that they managed to install and run Eynollah successfully through the Windows Subsystem for Linux (WSL2) with Ubuntu. Perhaps this could be an option for you?
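A quick way to confirm which of these environments a Python session is actually running in is sketched below. It relies on a common heuristic, not an official API: WSL kernels typically report a release string containing "microsoft". The function name `describe_platform` and its return labels are hypothetical.

```python
import platform


def describe_platform(system=None, release=None):
    """Classify the runtime as 'windows', 'wsl', 'linux' or 'other'.

    Heuristic only: WSL is detected by looking for 'microsoft' in the
    kernel release string. Arguments default to the current machine.
    """
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if system == "Windows":
        return "windows"
    if system == "Linux" and "microsoft" in release.lower():
        return "wsl"
    if system == "Linux":
        return "linux"
    return "other"


if __name__ == "__main__":
    print(describe_platform())
```

A result of `windows` means the session is running natively on Windows, the setup described above as unsupported; `wsl` or `linux` corresponds to the environments where installation has been reported to work.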

cneud commented 9 months ago

Hi @emchateau: are you using Linux or Windows? On the command line (in a terminal), Eynollah should work as described in the readme or in the comment from @vahidrezanezhad above, once installed. Do you get any error message?

ferropasha commented 9 months ago

> Windows Subsystem for Linux

Thanks for your prompt reply. Possibly I can manage it. As an alternative, do you know of any working Colab notebook with Eynollah? Right now I just have an assignment to run some very basic tests on images representative of our scanned materials, to see how good a result we can get with your pretrained models. Investing a lot of time in installation may be a bit premature for us at this point.

cneud commented 9 months ago

> As an alternative do you know of any colab with eynollah working?

Not that I am aware of. If the number of images to test is manageable and they can be shared, perhaps we can process them for you. In that case, please reach out to clemens dot neudecker at sbb dot spk-berlin.de

ferropasha commented 8 months ago

> > As an alternative do you know of any colab with eynollah working?
>
> Not that I am aware of. If the amount of images to test is somewhat manageable, and they can be shared, perhaps we can process them for you. In that case please reach out to clemens dot neudecker at sbb dot spk-berlin.de

Thanks a lot for your kind offer. I will consult my superiors about it. Meanwhile, I managed to set up a Colab of my own and again attempted to run a test analysis on the page from Kant's book that I found in your repository: https://github.com/qurator-spk/eynollah/blob/main/tests/resources/kant_aufklaerung_1784_0020.tif

My command looked like this:

eynollah -i "/content/sample_data/kant_aufklaerung_1784_0020.tif" -o "/content/sample_data" -m "/content/models_eynollah_renamed_savedmodel" -light

It worked, but the result of the analysis was a bit strange. Below is a visual representation I got with the page-xml-draw utility. As you can see, while all the text blocks were identified entirely correctly, there is also a strange blob-like object in the upper part of the page, which is described in the page.xml as a picture (pc:ImageRegion). Is anything wrong with the command I ran?

[Image: out]

vahidrezanezhad commented 8 months ago

> As you can see while all the text blocks had been identified entirely correctly there is also a strange blob-like object at the upper part of the page, which in the page.xml had been described as a picture (pc:ImageRegion). Is anything wrong with the command I run?

Your command is correct. The spurious region comes from a malfunction in our models, and it can occur with any document, because our highest priority is to identify text regions accurately. We appreciate your feedback and will work towards addressing these malfunctions in future updates :)
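When checking several test pages for spurious regions like this one, it can help to count the region types in each output file rather than inspect every rendering. The following is a minimal sketch using only the Python standard library. The function name `count_regions` is hypothetical, and the sample XML is hand-written for illustration (it is not Eynollah output; its namespace URI is the 2019 PAGE schema, which may differ from what a given Eynollah version writes). The code ignores namespaces for that reason.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_regions(page_xml_text):
    """Count elements whose local tag name ends in 'Region'."""
    root = ET.fromstring(page_xml_text)
    counts = Counter()
    for elem in root.iter():
        # Strip a leading '{namespace}' prefix, if present.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name.endswith("Region"):
            counts[local_name] += 1
    return counts


# Hand-written sample for illustration only.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.tif" imageWidth="100" imageHeight="100">
    <TextRegion id="r1"/>
    <TextRegion id="r2"/>
    <ImageRegion id="r3"/>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for name, number in sorted(count_regions(SAMPLE).items()):
        print(name, number)
```

For a real output file, the text can be read with `open(path, encoding="utf-8").read()` and passed to the same function; an unexpected `ImageRegion` count on a text-only page would flag the kind of false detection discussed here.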

cneud commented 4 months ago

Since the question of how to run the CLI has been resolved, I am closing this issue to tidy up. Feel free to open another issue about performance at any time.