Closed: emchateau closed this issue 4 months ago
Dear @emchateau ,
Certainly, you can run it as a standalone tool without using OCR-D. First, download the models from the following link: https://qurator-data.de/eynollah/2022-04-05/ (you will need 'models_eynollah_renamed_savedmodel'). Then, install the tool from the cloned repository with the command "pip install .". Finally, to execute it from the command line, use a command similar to the following:
eynollah -m 'path to the directory containing the models' -i 'path to the input image' -o 'path to the directory where the output XML file is written' -light
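If it helps to script the same invocation, the CLI can also be driven from Python via the standard library's subprocess module. This is only a sketch: the flags (-m, -i, -o, -light) are the ones shown above, while build_eynollah_cmd, run_eynollah, and all paths are placeholder names of my own.

```python
import subprocess
from pathlib import Path

def build_eynollah_cmd(model_dir, image_path, out_dir, light=True):
    """Assemble the eynollah CLI invocation described above.

    The flag names are taken from this thread; the paths are
    placeholders supplied by the caller.
    """
    cmd = ["eynollah",
           "-m", str(Path(model_dir)),
           "-i", str(Path(image_path)),
           "-o", str(Path(out_dir))]
    if light:
        cmd.append("-light")
    return cmd

def run_eynollah(model_dir, image_path, out_dir, light=True):
    # check=True raises CalledProcessError if eynollah exits non-zero
    return subprocess.run(
        build_eynollah_cmd(model_dir, image_path, out_dir, light),
        check=True,
    )
```

This still requires eynollah to be installed (pip install .) in the same environment, since subprocess simply looks the command up on PATH.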
Thank you very much for your prompt reply. I have followed all the steps indicated, and it worked without any problem. I also tried installing with the Makefile, without difficulty. But for the eynollah command to work, it seems to me that it has to be declared somewhere. Which program should it point to in order to run?
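For context on the question above: no separate declaration is needed, because "pip install ." registers eynollah as a console-script entry point in the environment's bin (or Scripts) directory, so the command works wherever that directory is on PATH. A small sketch using only the standard library shows how to check where a command resolves (locate_command is my own helper name; it is demonstrated here with the python interpreter itself):

```python
import shutil

def locate_command(name):
    """Return the full path a console command resolves to, or None.

    pip registers 'eynollah' as a console-script entry point, so if
    locate_command("eynollah") returns None, the environment where
    'pip install .' ran is probably not active or not on PATH.
    """
    return shutil.which(name)

# Illustrated with a command that ships with Python itself:
print(locate_command("python3") or locate_command("python"))
```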
I am also trying to run some basic tests with cmd but, unlike @emchateau, am not having any luck. I have installed the tool with apparent success, because the command eynollah --help works as expected and shows a list of available CLI options.
But a test command like this
eynollah -i "d:\3\eynollah\kant_aufklaerung_1784_0020.tif" -o "d:\3\eynollah" -m "d:\3\eynollah\models_eynollah_renamed_savedmodel" -light
with a page of Kant's book from your repository fails with the following errors (the offending function appears to be run_enhancement, though running the command with the -noae option didn't change anything). Can you advise on what's going wrong?
Traceback (most recent call last):
File "C:\Users\ecd4a\AppData\Local\Programs\Python\Python39\Scripts\eynollah-script.py", line 33, in
Hi @ferropasha, thank you for your interest in Eynollah and for reporting the issue with the error log, which is helpful.
Unfortunately I see that you are using Windows, which I am afraid we do not currently support (and likely won't be able to in the near future).
There have, however, been reports from users in the past who managed to successfully install and run Eynollah through the Windows Subsystem for Linux (WSL2) with Ubuntu. Perhaps this could be an option for you?
Hi @emchateau: are you using Linux or Windows? On the command line (in a terminal), Eynollah should work as described in the readme or in the comment from @vahidrezanezhad above, once installed. Do you get any error message?
Thanks for your prompt reply. Possibly I can manage it. As an alternative, do you know of any Colab with eynollah working? Right now I just have an assignment to run some very basic tests on images representative of our scanned materials, to see how good a result we can get with your pretrained models. Investing a lot of time in installation may be a bit premature for us at this point.
As an alternative do you know of any colab with eynollah working?
Not that I am aware of. If the number of images to test is manageable and they can be shared, perhaps we can process them for you. In that case, please reach out to clemens dot neudecker at sbb dot spk-berlin.de
Thanks a lot for your kind offer. I will consult my superiors about it. Meanwhile, I managed to set up a Colab of my own and again attempted to run a test analysis on the page from Kant's book that I found in your repository here: https://github.com/qurator-spk/eynollah/blob/main/tests/resources/kant_aufklaerung_1784_0020.tif
My command looked like this
eynollah -i "/content/sample_data/kant_aufklaerung_1784_0020.tif" -o "/content/sample_data" -m "/content/models_eynollah_renamed_savedmodel" -light
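After a run like this, the result should be a PAGE-XML file in the directory given to -o (the thread below refers to it as page.xml). A quick way to confirm from Python that output was actually written (list_page_xml is my own helper name; the path is a placeholder):

```python
from pathlib import Path

def list_page_xml(out_dir):
    """Return any XML files found in eynollah's output directory."""
    return sorted(Path(out_dir).glob("*.xml"))

# e.g. list_page_xml("/content/sample_data") after the run above
```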
It worked, but the result of the analysis was a bit strange. Below is a visual representation I got with the page-xml-draw utility. As you can see, while all the text blocks were identified entirely correctly, there is also a strange blob-like object at the upper part of the page, which in the page.xml was described as a picture (pc:ImageRegion). Is anything wrong with the command I ran?
As you can see, while all the text blocks were identified entirely correctly, there is also a strange blob-like object at the upper part of the page, which in the page.xml was described as a picture (pc:ImageRegion). Is anything wrong with the command I ran?
Your command is working correctly. This problem stems from a shortcoming of our models and can occur with any document, since our highest priority is to identify text regions accurately. We appreciate your feedback, and we will work on addressing these malfunctions in future updates :)
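Until the models are updated, one workaround is to strip the spurious region in post-processing. Below is a minimal sketch using only the standard library; strip_image_regions is my own name, and the namespace shown is the 2019-07-15 PAGE schema, so check the xmlns attribute on your file's root element, as the version may differ.

```python
import xml.etree.ElementTree as ET

# PAGE namespace; verify against the xmlns of your own page.xml
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

def strip_image_regions(xml_text):
    """Remove every pc:ImageRegion element from a PAGE-XML document.

    Text regions and all other content are left untouched.
    """
    root = ET.fromstring(xml_text)
    for page in root.iter(f"{{{NS}}}Page"):
        # ImageRegion elements are direct children of Page in PAGE-XML
        for region in page.findall(f"{{{NS}}}ImageRegion"):
            page.remove(region)
    return ET.tostring(root, encoding="unicode")
```

Note that this discards any genuine illustrations along with the false positives, so it only makes sense for pages known to contain text alone.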
Since the question of how to run the CLI has been resolved, I will be closing this to tidy up. Feel free to open another issue about performance at any time.
The documentation suggests using the command line, but I don't understand how to run the program from the command line. Is it necessary to run OCR-D first? Isn't it possible to run the program directly in python? Many thanks for your help.