Read texts in images with computer vision provides different results.

ghost commented 6 years ago

Hello,

I tried to demonstrate the Computer Vision "Read text in images" feature with the Bot Framework, but I am getting different results from the demo on the website: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

I don't know if I did something wrong in my code.

Can anybody help me, please? Here is an example of the difference between the two: the left side is the result from the website and the right side is my result, which is very bad.


'use strict';
const builder = require('botbuilder');
const restify = require('restify');
const utils = require('./utils.js');
const request = require('request');

const subscriptionKey = '02*****';
const uriBase = 'https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/ocr';
//--others
//-
// Create chat connector for communicating with the Bot Framework Service
const connector = new builder.ChatConnector({
    appId: process.env.MICROSOFT_APP_ID,
    appPassword: process.env.MICROSOFT_APP_PASSWORD
});
// Setup Restify Server
const server = restify.createServer();
server.listen(process.env.port || process.env.PORT || 3978, () => {
    console.log(`${ server.name } listening to ${ server.url }`);
});
// Listen for messages from users
server.post('/api/messages', connector.listen());
const bot = new builder.UniversalBot(connector);
// default dialog
bot.dialog('/', function (session) {
    if (utils.hasImageAttachment(session)) {
        //--others
        var stream = utils.getImageStreamFromMessage(session.message);
        const params = {
            'language': 'unk',
            'detectOrientation': 'true',
        };
        const options = {
            uri: uriBase,
            qs: params,
            headers: {
                'Content-Type': 'application/octet-stream',
                'Ocp-Apim-Subscription-Key': subscriptionKey
            }
        };
        // Pipe the image stream into the request rather than passing it as
        // `body`, which request expects to be a Buffer or string.
        stream.pipe(request.post(options, (error, response, body) => {
            if (error) {
                console.log('Error: ', error);
                return;
            }
            if (response.statusCode !== 200) {
                // e.g. a wrong key or region; the body describes the problem
                session.send(`OCR request failed (${response.statusCode}): ${body}`);
                return;
            }
            const obj = JSON.parse(body);

            //session.send(body);
            //------------ get the texts from json as string
            if (!obj.regions || obj.regions.length === 0) {
                session.send('OOOOPS I CANNOT READ ANYTHING IN THIS IMAGE :(');
            } else {
                // MSOCR nests regions -> lines -> words; rebuild each line by
                // joining its words with spaces, one line per row of output.
                const lines = [];
                obj.regions.forEach(region => {
                    (region.lines || []).forEach(line => {
                        lines.push((line.words || []).map(word => word.text).join(' '));
                    });
                });
                session.send(lines.join('\n'));
            }
        }));
        //--others

        // temporary code to see if we've actually received the image or not
        // this will be used later on for sending the image stream to the Prediction API of Custom Vision

    } else {
        session.send('I did not receive any image');
    }
});

ghost commented 6 years ago

@v-kydela Can you please help me with this issue? I would be very thankful :)

v-kydela commented 6 years ago

@Schindar May I ask why you closed this issue?

ghost commented 6 years ago

I don't know, I thought nobody would answer me. Do you have any solution, please? I am new to this topic (Node.js, Microsoft Azure).

v-kydela commented 6 years ago

I think it would be a good idea to leave issues open if they remain unresolved and you could still use help with them. You can also consider asking questions on Stack Overflow.

About your question, the first thing you need to understand is that Computer Vision provides three types of OCR (optical character recognition):

MSOCR: Traditional, scanned document-centric printed text OCR. Supports 26 languages. Synchronous, typically less than 100 ms per image. Called with https://westus.api.cognitive.microsoft.com/vision/v2.0/ocr. Only requires a POST.
OneOCR: State-of-the-art printed text OCR, especially for “OCR in the wild.” English-tuned. Asynchronous, typically a few seconds per image. Called with https://westus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Printed. Requires both a POST and a GET.
HDW OCR: Cursive handwritten text OCR. English-tuned. Asynchronous, typically a few seconds per image. Called with https://westus.api.cognitive.microsoft.com/vision/v2.0/recognizeText?mode=Handwritten. Requires both a POST and a GET.
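The three options differ in path, query string, and whether a follow-up GET is needed. A minimal sketch that records those differences in one place; `ocrRequestFor` and the `'MSOCR'`/`'OneOCR'`/`'HDW'` labels are hypothetical names, and the endpoint is assumed to be the regional v2.0 base URL (for example `https://westus.api.cognitive.microsoft.com/vision/v2.0`) without a trailing slash:

```javascript
// Hypothetical helper: maps each OCR flavour listed above to the request
// settings it needs. Not part of any SDK.
function ocrRequestFor(endpoint, kind) {
    switch (kind) {
        case 'MSOCR':
            // Synchronous: a single POST returns the recognized text.
            return { url: endpoint + '/ocr', qs: {}, async: false };
        case 'OneOCR':
            // Asynchronous: POST starts the job, GET(s) fetch the result.
            return { url: endpoint + '/recognizeText', qs: { mode: 'Printed' }, async: true };
        case 'HDW':
            // Same endpoint as OneOCR, handwriting mode.
            return { url: endpoint + '/recognizeText', qs: { mode: 'Handwritten' }, async: true };
        default:
            throw new Error('Unknown OCR kind: ' + kind);
    }
}
```

The `async` flag is what decides whether the calling code must poll for results after the POST.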

As you might have guessed by now, you are using MSOCR and the demo is using OneOCR. MSOCR is a sort of quick and dirty option for when you have images that are easy to read. If you want to accurately read a more difficult image like the sample you're using, you'll want OneOCR. Note that OneOCR is more complicated because it is asynchronous, i.e. you'll need to use a POST to start the operation and then use a GET to retrieve the results after a few seconds.
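Switching services also changes the shape of the JSON that comes back, so the parsing code has to change along with the URL. A sketch of a hypothetical `extractOcrText` helper that handles both shapes, assuming the field names used by the code in this thread (`regions`/`lines`/`words` for `/ocr`, `recognitionResult.lines` for `/recognizeText`):

```javascript
// Hypothetical helper: returns plain text from either response shape.
//   MSOCR  (/ocr):           { regions: [{ lines: [{ words: [{ text }] }] }] }
//   OneOCR (/recognizeText): { status, recognitionResult: { lines: [{ text }] } }
function extractOcrText(body) {
    if (body && Array.isArray(body.regions)) {
        // MSOCR: rebuild each line by joining its words with spaces.
        const lines = [];
        for (const region of body.regions) {
            for (const line of region.lines || []) {
                lines.push((line.words || []).map(w => w.text).join(' '));
            }
        }
        return lines.join('\n');
    }
    if (body && body.recognitionResult && Array.isArray(body.recognitionResult.lines)) {
        // OneOCR / handwriting: each line already carries its full text.
        return body.recognitionResult.lines.map(l => l.text).join('\n');
    }
    // Unknown or empty response.
    return '';
}
```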

You can have a look at the documentation here: https://docs.microsoft.com/en-us/azure/cognitive-services/Computer-vision/

ghost commented 6 years ago

@v-kydela So that means that to get the same result as the demo, I should put the OneOCR link into the app.js file? Can you kindly write this code for me, please (i.e. use a POST to start the operation and then a GET to retrieve the results after a few seconds)? I am a beginner and in a rush; I bought the product and I need it to work like the demo for my job. It would be very helpful, thanks in advance.

v-kydela commented 6 years ago

@Schindar It's not a simple matter of replacing the URI if that's what you're asking. Did you read the documentation or have a look at the API Reference I linked you to?

ghost commented 6 years ago

Yes, I did, but I don't know how to use it, or how to change my code so that it uses a POST to start the operation and then a GET to retrieve the results after a few seconds :( All I need is the same result as the demo; that's why I bought the product.

v-kydela commented 6 years ago

When you POST to recognizeText, the response carries a URL in its Operation-Location header. That's the URL you'll use for your GET. Since you won't know how long the operation will take, you have to keep trying until you see results; setInterval can be used for this. Here's a bot I managed to scrape together, based on the Image Caption sample. Just make sure you use a .env file like before.

/*-----------------------------------------------------------------------------
A Recognize Text bot for the Microsoft Bot Framework. 
-----------------------------------------------------------------------------*/

// This loads the environment variables from the .env file
require('dotenv-extended').load();

var builder = require('botbuilder'),
    request = require('request').defaults({ encoding: null }),
    needle = require('needle'),
    restify = require('restify'),
    validUrl = require('valid-url');

// You will need to define MICROSOFT_VISION_API_ENDPOINT and 
// MICROSOFT_VISION_API_KEY in your .env file

var apiUrl = process.env.MICROSOFT_VISION_API_ENDPOINT + '/recognizeText';

//=========================================================
// Bot Setup
//=========================================================

// Setup Restify Server
var server = restify.createServer();
server.listen(process.env.port || process.env.PORT || 3978, function () {
    console.log('%s listening to %s', server.name, server.url);
});

// Create chat bot
var connector = new builder.ChatConnector({
    appId: process.env.MICROSOFT_APP_ID,
    appPassword: process.env.MICROSOFT_APP_PASSWORD
});

server.post('/api/messages', connector.listen());

// Bot Storage: Here we register the state storage for your bot. 
// Default store: volatile in-memory store - Only for prototyping!
// We provide adapters for Azure Table, CosmosDb, SQL Azure, or you can implement your own!
// For samples and documentation, see: https://github.com/Microsoft/BotBuilder-Azure
var inMemoryStorage = new builder.MemoryBotStorage();

const params = {
    mode: "Printed"
};

const ocrHeaders = {
    "Ocp-Apim-Subscription-Key": process.env.MICROSOFT_VISION_API_KEY,
    "content-type": "application/json",
};

// Runs OCR on the image by checking how it was sent (attachment stream vs. URL) and calling recognizeText accordingly.
var bot = new builder.UniversalBot(connector, function (session) {
    if (hasImageAttachment(session)) {
        // If the user attaches an image
        var stream = getImageStreamFromMessage(session.message);

        const postOptions = {
            url: apiUrl,
            qs: params,
            encoding: 'binary',
            json: true,
            headers: {
                'Ocp-Apim-Subscription-Key': process.env.MICROSOFT_VISION_API_KEY,
                'content-type': 'application/octet-stream',
            },
        };

        stream.pipe(postRecognizeText(session, postOptions));
    } else {
        // If the user types something
        var imageUrl = parseAnchorTag(session.message.text) || (validUrl.isUri(session.message.text) ? session.message.text : null);

        if (imageUrl) {
            // If the user types a valid image URL
            const postOptions = {
                url: apiUrl,
                qs: params,
                json: { 'url': imageUrl },
                headers: ocrHeaders,
            };

            postRecognizeText(session, postOptions);
        } else {
            session.send('Did you upload an image? I\'m more of a visual person. Try sending me an image or an image URL');
        }
    }
}).set('storage', inMemoryStorage); // Register in memory storage

/**
 * This contains all the OCR logic
 * @param {Session} session 
 * @param {CoreOptions} postOptions 
 */
function postRecognizeText(session, postOptions) {
    session.send("Working on it...");

    // This is the first API call: it starts the operation
    return request.post(postOptions, (error, response, body) => {
        if (error) {
            session.send(`Error from POST: ${error}`);
            return;
        }

        // The URL to poll for the result comes back in the Operation-Location header
        const opLocation = response.headers["operation-location"];

        if (!opLocation) {
            // e.g. a bad key or an unsupported image; the body explains why
            session.send(`POST failed with status ${response.statusCode}: ${JSON.stringify(body)}`);
            return;
        }

        let getOptions = {
            uri: opLocation,
            headers: ocrHeaders
        };

        let count = 0;
        let done = false; // stops late responses from reporting a second time

        // The only way to know if the operation is finished is to try retrieving the data.
        // So we're going to try a maximum of 20 times over 10 seconds.
        let interval = setInterval(() => {
            const attempt = ++count;
            if (attempt >= 20) {
                clearInterval(interval);
            }

            // This is the second API call
            request.get(getOptions, (error, response, body) => {
                if (done) {
                    return;
                }
                if (error) {
                    session.send(`Error from GET ${attempt}: ${error}`);
                    return;
                }

                let bodyObject = JSON.parse(body);

                if (bodyObject.status == "Succeeded") {
                    done = true;
                    clearInterval(interval);

                    let result = "";

                    bodyObject.recognitionResult.lines.forEach(line => {
                        result += `${line.text}\n`;
                    });

                    session.send(result);
                } else if (bodyObject.status == "Failed") {
                    done = true;
                    clearInterval(interval);
                    session.send(`Status: ${bodyObject.status}`);
                } else if (attempt >= 20) {
                    // Still not started or still running after the last attempt
                    done = true;
                    session.send("The operation timed out.");
                }
            });
        }, 500);
    });
}

//=========================================================
// Utilities
//=========================================================
function hasImageAttachment(session) {
    return session.message.attachments.length > 0 &&
        session.message.attachments[0].contentType.indexOf('image') !== -1;
}

var PassThrough = require('stream').PassThrough;

function getImageStreamFromMessage(message) {
    var headers = {};
    var attachment = message.attachments[0];
    if (checkRequiresToken(message)) {
        // The Skype attachment URLs are secured by JwtToken,
        // you should set the JwtToken of your bot as the authorization header for the GET request your bot initiates to fetch the image.
        // https://github.com/Microsoft/BotBuilder/issues/662
        // getAccessToken is asynchronous, so hand back a PassThrough stream now
        // and pipe the authenticated download into it once the token arrives.
        var output = new PassThrough();
        connector.getAccessToken(function (error, token) {
            if (error) {
                console.log('Error getting access token: ', error);
                output.end();
                return;
            }
            headers['Authorization'] = 'Bearer ' + token;
            headers['Content-Type'] = 'application/octet-stream';
            needle.get(attachment.contentUrl, { headers: headers }).pipe(output);
        });
        return output;
    }

    headers['Content-Type'] = attachment.contentType;
    return needle.get(attachment.contentUrl, { headers: headers });
}

function checkRequiresToken(message) {
    return message.source === 'skype' || message.source === 'msteams';
}

/**
 * Gets the href value in an anchor element.
 * Skype transforms raw urls to html. Here we extract the href value from the url
 * @param {string} input Anchor Tag
 * @return {string} Url matched or null
 */
function parseAnchorTag(input) {
    var match = input.match('^<a href=\"([^\"]*)\">[^<]*</a>$');
    if (match && match[1]) {
        return match[1];
    }

    return null;
}
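The setInterval loop in postRecognizeText fires a new GET every 500 ms whether or not the previous one has answered. A sketch of a sequential alternative, assuming a Promise-based wrapper around the GET: `pollOperation` is a hypothetical name, and `getStatus` is assumed to resolve with the parsed GET body carrying a `status` field as in the recognizeText responses above.

```javascript
// Hypothetical helper, not part of botbuilder, request, or the Computer Vision API.
async function pollOperation(getStatus, { intervalMs = 500, maxAttempts = 20 } = {}) {
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // Wait for this GET to answer before deciding whether to send another,
        // so requests never overlap and the result is delivered only once.
        const body = await getStatus();
        if (body.status === 'Succeeded') {
            return body;
        }
        if (body.status === 'Failed') {
            throw new Error('Operation failed');
        }
        // Any other status (not started yet, running): wait, then retry.
        if (attempt < maxAttempts) {
            await sleep(intervalMs);
        }
    }
    throw new Error(`Operation timed out after ${maxAttempts} attempts`);
}
```

In the bot, `getStatus` could wrap `request.get(getOptions, ...)` in a Promise that resolves with `JSON.parse(body)`, and the caller would send the recognized lines in `.then(...)` and the error message in `.catch(...)`.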

ghost commented 6 years ago

Thanks, that is working. I can close the issue now.

ghost commented 6 years ago

@v-kydela Sorry that I reopened the issue, but I have a question: I am trying to add a welcome text with an image from my desktop like this:

var bot = new builder.UniversalBot(connector, function (session) {

// greeting message with a pic

bot.on('conversationUpdate', function (message) {
    if (message.membersAdded) {
        message.membersAdded.forEach(function (identity) {
            if (identity.id === message.address.bot.id) {
                var  filePath =  "C:\Users\z003sahb\Desktop\builder\ex.png";
              // var uri = new System.Uri(filePath);
                var reply = new builder.Message()
                    .address(message.address)
                    .addAttachment({
                        contentUrl:'C:\Users\admin\Desktop\builder',
                        contentType: 'image/png',
                        name: 'ex.png'
                    })
                    .text('Hi, please send a screenshot for the error like this : ');

                bot.send(reply);
            }
        });
    }
}
);

//

    if (hasImageAttachment(session)) { ..... the rest is like above

But it does not work. Can you please correct my code? Thank you so much.

v-kydela commented 6 years ago

Hi @Schindar. There are a few things to mention about your code.

Strings in JavaScript use escape characters. You need to make sure you escape your backslashes.
Remember that your Node.js code is meant to run on a remote server. It doesn't make a lot of sense for it to access a file on your local hard drive.
Your contentUrl needs to be a full path that includes the filename and not just the directory.
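On the escaping point: written unescaped, the literal `'C:\Users\admin\Desktop\builder'` loses every backslash, and its `\b` becomes a backspace character. Taken together, the three points suggest shipping the image with the bot's code and building the full path in code. A sketch under those assumptions: `inlineImageAttachment` is a hypothetical helper, and embedding the image as a base64 data URI in `contentUrl` is one common approach whose rendering should be checked on each target channel (a publicly hosted URL is the alternative).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds an attachment for an image file deployed
// together with the bot (not one on the user's desktop).
// path.join produces the full path, directory plus file name, so no
// backslashes have to be escaped by hand.
function inlineImageAttachment(fileName, contentType = 'image/png', baseDir = __dirname) {
    const filePath = path.join(baseDir, fileName);
    // Read the bytes and embed them as a base64 data URI (assumption: the
    // target channel renders data URIs; otherwise host the file publicly).
    const base64 = fs.readFileSync(filePath).toString('base64');
    return {
        contentUrl: `data:${contentType};base64,${base64}`,
        contentType: contentType,
        name: fileName
    };
}
```

In the bot itself, `bot.on('conversationUpdate', ...)` would be registered once at the top level rather than inside the UniversalBot callback (there it is added again on every incoming message, and only after the user has already written), and the reply would call `.addAttachment(inlineImageAttachment('ex.png'))`.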

You can learn more about sending attachments here and even more here. Also consider asking these types of questions on Stack Overflow. GitHub is mostly meant for issues that relate to this repository specifically.

microsoft / BotBuilder-Samples

Read texts in images with computer vision provides different results. #287