tshrinivasan / OCR4wikisource

OCR for WikiSource using Google Drive OCR
GNU General Public License v2.0

Packages not available - libevdocument1, libevview1 #3

Closed commonssibi closed 8 years ago

commonssibi commented 8 years ago

[Image: screenshot from 2015-12-24 09 44 58]

tshrinivasan commented 8 years ago

Remove libevdocument1 and libevview1 from the sudo apt-get install line and try again.

Check the latest INSTALL doc for details. The code has changed, so pull again and install the latest version.

ravidreams commented 8 years ago

Reopening as we are not able to process djvu files. Bengali Wikisource is full of such files.

tshrinivasan commented 8 years ago

Do you have ddjvu installed?

Run

sudo apt-get install ddjvu

Then run ddjvu on the command line as

ddjvu filename.djvu filename.pdf

and share the results.

ravidreams commented 8 years ago

ravi@ravi-desktop:~$ sudo apt-get install ddjvu
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package ddjvu

tshrinivasan commented 8 years ago

Run

sudo apt-get update
sudo apt-get install ddjvu

Share the results.

ravidreams commented 8 years ago

Same error.

omshivaprakash commented 8 years ago

This might help:

$ sudo aptitude install djvulibre-bin
The following NEW packages will be installed:
djvulibre-bin libgraphicsmagick++11{a} libgraphicsmagick3{a} pdf2djvu{a}

Here are the other recommended packages: libdjvulibre21 okular-extra-backends evince libevdocument3 libevview3 libtiff-tools
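As a rough illustration of why `apt-get install ddjvu` fails above: `ddjvu` is a command, not a package name, and on Debian/Ubuntu it is shipped in `djvulibre-bin`. The sketch below checks whether the command is on the PATH and prints an install hint. The helper names `missing_tools` and `install_hint` and the `PACKAGE_HINTS` table are hypothetical and not part of OCR4wikisource.

```python
import shutil

# Assumption: on Debian/Ubuntu the ddjvu command comes from djvulibre-bin.
PACKAGE_HINTS = {"ddjvu": "djvulibre-bin"}


def missing_tools(tools=("ddjvu",)):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def install_hint(missing):
    """Build one hint line per missing tool, naming the package to install."""
    lines = []
    for tool in missing:
        package = PACKAGE_HINTS.get(tool, tool)
        lines.append(f"{tool} not found; try: sudo apt-get install {package}")
    return "\n".join(lines)


if __name__ == "__main__":
    missing = missing_tools()
    print(install_hint(missing) if missing else "All required tools found.")
```

Running a check like this before starting the OCR pipeline would surface a missing converter early instead of halfway through a run.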

ravidreams commented 8 years ago

Shrini,

When I run

ddjvu filename.djvu filename.pdf

the djvu file is converted and a file named filename.pdf is written. But that file is only an image, not a real PDF.

file test.pdf

gives the following info:

test.pdf: Netpbm PPM "rawbits" image data, size = 1654 x 2338

Example file - https://upload.wikimedia.org/wikipedia/commons/d/de/%E0%A6%86%E0%A6%B0%E0%A7%8B%E0%A6%97%E0%A7%8D%E0%A6%AF-%E0%A6%B0%E0%A6%AC%E0%A7%80%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0%E0%A6%A8%E0%A6%BE%E0%A6%A5_%E0%A6%A0%E0%A6%BE%E0%A6%95%E0%A7%81%E0%A6%B0.djvu
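The same check that `file` performs above can be approximated in Python by looking at the first bytes of the output instead of trusting the `.pdf` extension. This is a minimal sketch with a hypothetical helper name, `sniff_format`; it only recognises the three formats relevant to this thread and is not a replacement for `file`.

```python
def sniff_format(path):
    """Guess the file type from its leading bytes, ignoring the extension."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Raw Netpbm images start with P4 (PBM), P5 (PGM) or P6 (PPM).
    if head[:2] in (b"P4", b"P5", b"P6"):
        return "netpbm"
    # DjVu files start with the bytes "AT&T" followed by an IFF "FORM" chunk.
    if head.startswith(b"AT&TFORM"):
        return "djvu"
    return "unknown"
```

A result of "netpbm" for a file named test.pdf would match the symptom reported here: ddjvu wrote its default image output rather than a PDF.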

ravidreams commented 8 years ago

Om, I tried all the steps you suggested and also installed everything related to djvu found at

http://packages.ubuntu.com/search?keywords=djvu&searchon=names&suite=trusty&section=all

No luck yet. Try the example Bengali file given above. Note that even once you get past the djvu issue, it will give an encoding error :)

tshrinivasan commented 8 years ago

We have to find a way to convert djvu files to PDF files from the command line on Linux.

I tried

ddjvu -format=pdf test.djvu test.pdf

which gives a good PDF file.

Source: http://en.proft.me/2014/02/12/how-convert-djvu-pdf-linux/

I also tried djvu2pdf from http://0x2a.at/s/projects/djvu2pdf, which works fine too.

Can anyone try these two with various djvu files and share the results?
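For anyone testing this from a script rather than by hand, the `ddjvu -format=pdf` call above could be wrapped roughly as follows. This is a sketch, not the code in do_OCR.py: the function names are hypothetical, and the only ddjvu option used is the `-format=pdf` flag quoted in this comment. Passing the arguments as a list keeps the input and output paths as separate items, which avoids shell quoting problems with spaces or non-ASCII file names.

```python
import subprocess


def build_ddjvu_command(djvu_path, pdf_path):
    """Return the ddjvu argument list; input and output stay separate items."""
    return ["ddjvu", "-format=pdf", djvu_path, pdf_path]


def convert_djvu_to_pdf(djvu_path, pdf_path):
    """Run ddjvu and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ddjvu_command(djvu_path, pdf_path), check=True)
    return pdf_path
```

With `check=True`, a failed conversion stops the script at this step instead of letting later steps run against a PDF that was never created.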

ravidreams commented 8 years ago

ddjvu -format=pdf test.djvu test.pdf gives a good PDF conversion.

ravi@ravi-Satellite-L50-B ~/Downloads $ file test.pdf
test.pdf: PDF document, version 1.2

But after running do_OCR.py again, I am stuck at https://github.com/tshrinivasan/OCR4wikisource/issues/10 :(

ravidreams commented 8 years ago

Actually, I just realized that the PDF conversion was not successful:

Downloading the file আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu

[################################] 3176/3176 - 00:00:04

Found a djvu file. Converting to PDF file.

ddjvu: [1-11711] Failed to open '%E0%A6%86%E0%A6%B0%E0%A7%8B%E0%A6%97%E0%A7%8D%E0%A6%AF-%E0%A6%B0%E0%A6%AC%E0%A7%80%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0%E0%A6%A8%E0%A6%BE%E0%A6%A5%E0%A6%A0%E0%A6%BE%E0%A6%95%E0%A7%81%E0%A6%B0.djvu%E0%A6%86%E0%A6%B0%E0%A7%8B%E0%A6%97%E0%A7%8D%E0%A6%AF-%E0%A6%B0%E0%A6%AC%E0%A7%80%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0%E0%A6%A8%E0%A6%BE%E0%A6%A5%E0%A6%A0%E0%A6%BE%E0%A6%95%E0%A7%81%E0%A6%B0.pdf': No such file or directory.
ddjvu: 'ByteStream.cpp:699'
ddjvu: Cannot open djvu document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvuআরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'.

Aligining the Pages of PDF file.

error: cannot open আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf
error: cannot load document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'
uncaught exception: cannot load document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'

Spliting the PDF into single pages.

Error: Unable to find file.
Error: Failed to open PDF file: currentfile.pdf
Done. Input errors, so no output created.

Joining the PDF files ...

Creating a folder in Google Drive to upload files

Folder Name : OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18

id: 0B9amQol3ByIjWDBja0FWcWpBclE
drive view: https://drive.google.com/drive/folders/0B9amQol3ByIjWDBja0FWcWpBclE
folder view: https://docs.google.com/folderview?id=0B9amQol3ByIjWDBja0FWcWpBclE&usp=drivesdk

Moving all temp files to OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18

mv: target ‘OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18’ is not a directory

Deleting the Temp folder in Google Drive OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18
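Reading the ddjvu error in this log, two things stand out: the input and output names appear run together as a single path, and the name ddjvu tried to open is still percent-encoded while the downloaded file has the decoded Bengali name. The sketch below shows one way the local file names could be derived from the download URL so that both paths are decoded and kept separate. The helper `local_names_from_url` is hypothetical, and the actual fix in the repository may differ.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def local_names_from_url(url):
    """Return (djvu_name, pdf_name) decoded from a percent-encoded URL."""
    # urlsplit keeps the path percent-encoded; take its last component.
    encoded_name = posixpath.basename(urlsplit(url).path)
    # Decode %XX escapes (UTF-8 by default) to get the on-disk file name.
    djvu_name = unquote(encoded_name)
    stem, _ext = posixpath.splitext(djvu_name)
    return djvu_name, stem + ".pdf"
```

The two returned names can then be passed to the converter as two separate arguments.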

tshrinivasan commented 8 years ago

This issue has already been fixed.

Check with the latest version.

Closing this. Reopen if you get the same issue.