Closed commonssibi closed 8 years ago
Delete libevdocument1 and libevview1 from that sudo apt-get install line and try again.
Check the latest INSTALL doc for the details. The code has changed, so pull again and install the latest version.
Reopening as we are not able to process djvu files. Bengali Wikisource is full of such files.
Do you have ddjvu installed?
Run
sudo apt-get install ddjvu
Run ddjvu in command line as
ddjvu filename.djvu filename.pdf
and share the results.
ravi@ravi-desktop:~$ sudo apt-get install ddjvu
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package ddjvu
Run
sudo apt-get update
sudo apt-get install ddjvu
Share the results.
Same error.
This might help:
$ sudo aptitude install djvulibre-bin
The following NEW packages will be installed:
djvulibre-bin libgraphicsmagick++11{a} libgraphicsmagick3{a} pdf2djvu{a}
Here are the other recommended packages: libdjvulibre21 okular-extra-backends evince libevdocument3 libevview3 libtiff-tools
Shrini,
When I run
ddjvu filename.djvu filename.pdf
the djvu file is converted and the output is named .pdf, but it is only an image file and not a real PDF file.
file test.pdf
gives following info:
test.pdf: Netpbm PPM "rawbits" image data, size = 1654 x 2338
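The same check that `file` performs here can be sketched in Python by looking at the first bytes of the output. This is a minimal illustration, not part of do_OCR.py; the function name `sniff_format` is made up for the example.

```python
def sniff_format(path):
    """Return 'pdf', 'netpbm', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(5)
    # Real PDF files begin with the marker "%PDF-".
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Netpbm files begin with 'P' followed by a digit 1-6 (P6 is raw PPM).
    if len(head) >= 2 and head[:1] == b"P" and head[1:2] in b"123456":
        return "netpbm"
    return "unknown"
```

Calling `sniff_format("test.pdf")` on the file above would be expected to report a Netpbm image rather than a PDF, whatever the extension says.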
Om, I tried all the steps you suggested and also installed everything related to djvu found at
http://packages.ubuntu.com/search?keywords=djvu&searchon=names&suite=trusty&section=all
No luck yet. Try the example Bengali file given above. Note that it will give an encoding error once you get past the djvu issue :)
We have to find a way to convert djvu files to PDF files from the command line on Linux.
I tried
ddjvu -format=pdf test.djvu test.pdf
which gives a good PDF file.
source : http://en.proft.me/2014/02/12/how-convert-djvu-pdf-linux/
I tried djvu2pdf from http://0x2a.at/s/projects/djvu2pdf which also works fine.
Can anyone try these two with various djvu files and share the results?
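For trying the ddjvu variant on many files at once, a small helper along these lines may be useful. It is only a sketch: `build_ddjvu_command` and `convert_all` are hypothetical names, it assumes `ddjvu` is on the PATH, and it judges success by the "%PDF-" marker at the start of the output file.

```python
import subprocess
from pathlib import Path


def build_ddjvu_command(source, target):
    # Input and output are separate list items, so no shell quoting is involved.
    return ["ddjvu", "-format=pdf", str(source), str(target)]


def convert_all(directory, run=subprocess.run):
    """Convert every .djvu file in `directory`; return {filename: 'ok' or 'failed'}."""
    results = {}
    for source in sorted(Path(directory).glob("*.djvu")):
        target = source.with_suffix(".pdf")
        completed = run(build_ddjvu_command(source, target))
        is_pdf = False
        if completed.returncode == 0 and target.exists():
            # A real PDF starts with the "%PDF-" marker.
            with open(target, "rb") as handle:
                is_pdf = handle.read(5) == b"%PDF-"
        results[source.name] = "ok" if is_pdf else "failed"
    return results
```

Running `convert_all("some_folder")` and posting the returned dictionary would give a quick per-file summary. The `run` parameter exists only so the conversion step can be swapped out when ddjvu is not installed.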
ddjvu -format=pdf test.djvu test.pdf gives good pdf conversion.
ravi@ravi-Satellite-L50-B ~/Downloads $ file test.pdf
test.pdf: PDF document, version 1.2
But after running do_OCR.py again, I am stuck at https://github.com/tshrinivasan/OCR4wikisource/issues/10 :(
Actually, just now realized pdf conversion was not successful:
Downloading the file আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu
[################################] 3176/3176 - 00:00:04
Found a djvu file. Converting to PDF file.
ddjvu: [1-11711] Failed to open '%E0%A6%86%E0%A6%B0%E0%A7%8B%E0%A6%97%E0%A7%8D%E0%A6%AF-%E0%A6%B0%E0%A6%AC%E0%A7%80%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0%E0%A6%A8%E0%A6%BE%E0%A6%A5%E0%A6%A0%E0%A6%BE%E0%A6%95%E0%A7%81%E0%A6%B0.djvu%E0%A6%86%E0%A6%B0%E0%A7%8B%E0%A6%97%E0%A7%8D%E0%A6%AF-%E0%A6%B0%E0%A6%AC%E0%A7%80%E0%A6%A8%E0%A7%8D%E0%A6%A6%E0%A7%8D%E0%A6%B0%E0%A6%A8%E0%A6%BE%E0%A6%A5%E0%A6%A0%E0%A6%BE%E0%A6%95%E0%A7%81%E0%A6%B0.pdf': No such file or directory.
ddjvu: 'ByteStream.cpp:699'
ddjvu: Cannot open djvu document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvuআরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'.
Aligining the Pages of PDF file.
error: cannot open আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf
error: cannot load document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'
uncaught exception: cannot load document 'আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.pdf'
Spliting the PDF into single pages.
Error: Unable to find file.
Error: Failed to open PDF file: currentfile.pdf
Done. Input errors, so no output created.
Joining the PDF files ...
Creating a folder in Google Drive to upload files
Folder Name : OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18
id: 0B9amQol3ByIjWDBja0FWcWpBclE
drive view: https://drive.google.com/drive/folders/0B9amQol3ByIjWDBja0FWcWpBclE
folder view: https://docs.google.com/folderview?id=0B9amQol3ByIjWDBja0FWcWpBclE&usp=drivesdk
Moving all temp files to OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18
mv: target ‘OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18’ is not a directory
Deleting the Temp folder in Google Drive OCR-আরোগ্য-রবীন্দ্রনাথ_ঠাকুর.djvu-temp-2016-01-04-21-11-18
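In the ddjvu error in this log, the path it failed to open is the input name immediately followed by the output name, as if both had reached ddjvu as a single argument. That reading is a guess from the log alone and has not been checked against do_OCR.py. The sketch below shows how the percent-encoded path in the log can be decoded for inspection, and how the two names can be kept as separate arguments; `decode_logged_path` and `conversion_args` are hypothetical helper names.

```python
from urllib.parse import unquote


def decode_logged_path(logged):
    """Turn a percent-encoded path from a log line back into readable text."""
    return unquote(logged)


def conversion_args(djvu_name):
    """Return the input and output names as two separate list items."""
    suffix = ".djvu"
    stem = djvu_name[:-len(suffix)] if djvu_name.endswith(suffix) else djvu_name
    return [djvu_name, stem + ".pdf"]
```

Passing `["ddjvu", "-format=pdf"] + conversion_args(name)` as a list to `subprocess.run` keeps the two names apart even when they contain non-ASCII characters or spaces.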
This issue is already fixed.
Check with the latest version.
Closing this. Reopen if you get the same issue.