keensoft / alfresco-simple-ocr

Simple OCR action for Alfresco

How to dynamically set the ocr.extra.commands option based on some logic? #58

Open DEEPAK-KESWANI opened 6 years ago

DEEPAK-KESWANI commented 6 years ago

Hi,

I want to enable auto rotation only for the first version of a document and disable it for all later versions.

I'm using the OCRmyPDF tool with the following property in the alfresco-global.properties file.

# OCRmyPDF
ocr.extra.commands=--verbose 1 --force-ocr -l eng --output-type pdf

The setting above should apply to every document version except 1.0.

I'm trying to enable auto rotation for version 1.0 by adding the code below, which puts the option into the properties Map, but it is not taken into account. Any help is appreciated.

For version 1.0, the ocr.extra.commands property should be:

ocr.extra.commands=--verbose 1 --force-ocr -l eng --output-type pdf --rotate-pages --rotate-pages-threshold 1

Thanks.

### OCRTransformWorker.java
public final void transform(ContentReader reader, ContentWriter writer, TransformationOptions options)
            throws Exception {

        File sourceFile = null;
        File targetFile = null;
        try {

            String sourceMimetype = getMimetype(reader);
            String sourceExtension = mimetypeService.getExtension(sourceMimetype);
            sourceFile = TempFileProvider.createTempFile(getClass().getSimpleName() + "_source_",
                    "." + sourceExtension);
            reader.getContent(sourceFile);

            String path = sourceFile.getAbsolutePath();
            String targetPath = path.substring(0, path.toLowerCase().lastIndexOf(".")) + "_ocr.pdf";

            Map<String, String> properties = new HashMap<String, String>(1);

            properties.put(VAR_SOURCE, sourceFile.getAbsolutePath());
            properties.put(VAR_TARGET, targetPath);

            // Custom code STARTS: enable auto rotation.
            // OCRExtractAction.java passes non-null options for version 1.0
            // and null for versions greater than 1.0.
            if (options != null) {
                properties.put("ocr.extra.commands",
                        "--verbose 1 --force-ocr -l eng --output-type pdf --rotate-pages --rotate-pages-threshold 1");
            }
            // Custom code ENDS: enable auto rotation.

            RuntimeExec.ExecutionResult result = obtainExecuter(properties);

            if (verbose) {
                logger.info("EXIT VALUE: " + result.getExitValue());
                logger.info("STDOUT: " + result.getStdOut());
                logger.info("STDERR: " + result.getStdErr());
            }

            if (result.getExitValue() == 143) {
                logger.warn(result.getStdErr());
            } else if (result.getExitValue() != 0 && result.getStdErr() != null && result.getStdErr().length() > 0) {
                throw new ContentIOException("Failed to perform OCR transformation: \n" + result);
            }

            targetFile = new File(targetPath);
            writer.putContent(targetFile);

        } catch (Throwable t) {
            throw new RuntimeException(t);
        } finally {

        }
    }

angelborroy-ks commented 6 years ago

It would probably be better to declare a new OCR transformer bean with your extra options, similar to the one at

https://github.com/keensoft/alfresco-simple-ocr/blob/master/simple-ocr-repo/src/main/resources/alfresco/module/simple-ocr-repo/context/service-context.xml#L16

and then inject this new bean into your TransformWorker at

https://github.com/keensoft/alfresco-simple-ocr/blob/master/simple-ocr-repo/src/main/resources/alfresco/module/simple-ocr-repo/context/service-context.xml#L6

Then decide which of the two to use in your Java code.
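
For readers following this suggestion, here is a minimal, self-contained sketch of the "decide which to use" step. It is not taken from the module's source. The class `OcrExecuterSelector`, its methods, and the use of `Function` as a stand-in for the two injected executer beans are all hypothetical, chosen so that the example runs without Alfresco on the classpath. It assumes the worker can see the version label, or an equivalent flag such as the non-null `options` used in the question.

```java
import java.util.Map;
import java.util.function.Function;

/**
 * Illustrative only: picks one of two pre-configured OCR executers by version label.
 * Function is a stand-in for the two injected executer beans; the class and method
 * names are hypothetical and do not exist in alfresco-simple-ocr.
 */
final class OcrExecuterSelector {

    // Executer configured with the default extra options.
    private final Function<Map<String, String>, String> defaultExecuter;
    // Executer configured with the additional rotation options.
    private final Function<Map<String, String>, String> rotatingExecuter;

    OcrExecuterSelector(Function<Map<String, String>, String> defaultExecuter,
                        Function<Map<String, String>, String> rotatingExecuter) {
        this.defaultExecuter = defaultExecuter;
        this.rotatingExecuter = rotatingExecuter;
    }

    /** Returns the rotating executer for version 1.0 and the default one otherwise. */
    Function<Map<String, String>, String> select(String versionLabel) {
        return "1.0".equals(versionLabel) ? rotatingExecuter : defaultExecuter;
    }

    /** Runs the selected executer with the per-call substitution variables. */
    String run(String versionLabel, Map<String, String> properties) {
        return select(versionLabel).apply(properties);
    }
}
```

The point of the design is that each executer carries its own fixed option string from the Spring configuration, so the Java code only chooses between them and keeps passing the per-call source and target variables through the properties Map.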

DEEPAK-KESWANI commented 6 years ago

Thanks a lot for your quick response and inputs. It worked for me. :)