sararob / ml-talk-demos

Code for the demos from my Google Next '17 talk: https://youtu.be/w1xNTLH1zlA and I/O '17 talk: https://www.youtube.com/watch?v=ETeeSYMGZn0
Apache License 2.0
299 stars, 124 forks

Questions about the Vision API #1

Open AndroidDeveloperLB opened 7 years ago

AndroidDeveloperLB commented 7 years ago

About this part of the lecture: https://youtu.be/w1xNTLH1zlA?t=462

I have a few questions:

  1. About "crop hints": what exactly is the expected result of suggesting a crop for the image? Does it try to crop to people's faces? To their whole bodies? What's the logic behind it?
  2. For any of the APIs mentioned there (I'm most curious about "crop hints" and "web annotations"), is there an Android example project that uses them? This repo seems to have a Python one, probably in the "vision-speech-nl-translate" sample.
  3. I tried to look up prices for anything related to the Vision API, but I can't find them. My guess is that it's either not free or free only up to a certain limit. Can anyone point me to an explanation? And if I do want to try it, where do I start?
AndroidDeveloperLB commented 7 years ago

BTW, the OCR feature is far from perfect. I tried it on one of your own images: https://cloud.google.com/images/products/artwork/insight-text.png (found here: https://cloud.google.com/vision/).

The result:

+Page 1
  +Block 1
    +Paragraph 1
      3 C A R S
  +Block 2
    +Paragraph 1
      1 o F L O W E R S
  +Block 3
    +Paragraph 1
      5 R A B B I T S 2 M O U N T A I N S
  +Block 4
    +Paragraph 1
      B I R D S

So the "0" became "o", the "7" is missing, and two separate lines (rabbits and mountains) were merged into one paragraph.
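For anyone who wants to reproduce an outline like the one above, here is a minimal Python sketch that walks the page/block/paragraph/word/symbol tree of a document text detection result. It assumes the response has already been parsed into a dict with the `fullTextAnnotation` layout (`pages`, `blocks`, `paragraphs`, `words`, `symbols`, each symbol carrying `text`) as I understand the REST JSON to look. The function name `outline_full_text` and the sample data are made up for illustration and are not real API output.

```python
def outline_full_text(annotation):
    """Return indented lines that mirror the page/block/paragraph tree.

    `annotation` is assumed to be the parsed `fullTextAnnotation` dict.
    """
    lines = []
    for p, page in enumerate(annotation.get("pages", []), start=1):
        lines.append(f"+Page {p}")
        for b, block in enumerate(page.get("blocks", []), start=1):
            lines.append(f"  +Block {b}")
            for g, paragraph in enumerate(block.get("paragraphs", []), start=1):
                lines.append(f"    +Paragraph {g}")
                # Each word is a list of symbols; join the symbols back into words.
                words = [
                    "".join(s.get("text", "") for s in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                lines.append("      " + " ".join(words))
    return lines


# Hand-written sample in the assumed shape (not real API output).
sample = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"symbols": [{"text": "3"}]},
                    {"symbols": [{"text": c} for c in "CARS"]},
                ]
            }]
        }]
    }]
}

if __name__ == "__main__":
    print("\n".join(outline_full_text(sample)))
```

Printing the tree this way makes it easier to see where lines were merged into a single paragraph or where a symbol was dropped.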

sararob commented 7 years ago

@AndroidDeveloperLB to answer your questions:

  1. Crop hints returns the coordinates of a suggested crop region around the dominant object or face in an image. You can find code samples for it in a few languages here.

  2. Here are some Android samples for the NL and Speech APIs. There are also Java samples for each of the APIs.

  3. Each of the APIs has a free tier (for Vision it is 1000 requests / month). Details can be found on the pricing page for each API (Vision here).
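To make the crop hints answer more concrete, here is a rough Python sketch of what a request body and a response might look like. It assumes the REST `images:annotate` JSON layout as I understand it: a `CROP_HINTS` feature, optional `cropHintsParams.aspectRatios`, and a `cropHintsAnnotation.cropHints[].boundingPoly.vertices` list in the response. The helper names, the image URI, and the sample response are hypothetical, and no network call is made.

```python
def build_crop_hints_request(image_uri, aspect_ratios):
    """Build a JSON-serializable body asking for crop hints (assumed layout)."""
    return {
        "requests": [{
            "image": {"source": {"imageUri": image_uri}},
            "features": [{"type": "CROP_HINTS"}],
            # Desired width/height ratios for the suggested crop.
            "imageContext": {
                "cropHintsParams": {"aspectRatios": list(aspect_ratios)}
            },
        }]
    }


def crop_box(response):
    """Return (left, top, right, bottom) of the first crop hint, or None."""
    first = response["responses"][0]
    hints = first.get("cropHintsAnnotation", {}).get("cropHints", [])
    if not hints:
        return None
    vertices = hints[0]["boundingPoly"]["vertices"]
    # A coordinate equal to 0 may be omitted from the JSON, so default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical response in the assumed shape (not real API output).
sample_response = {
    "responses": [{
        "cropHintsAnnotation": {
            "cropHints": [{
                "boundingPoly": {
                    "vertices": [{}, {"x": 400}, {"x": 400, "y": 300}, {"y": 300}]
                },
                "confidence": 0.8,
            }]
        }
    }]
}

if __name__ == "__main__":
    print(build_crop_hints_request("gs://my-bucket/photo.jpg", [1.33]))
    print(crop_box(sample_response))
```

The resulting box can be passed to any image library's crop function. The API only suggests the region and does not return a cropped image.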

AndroidDeveloperLB commented 7 years ago
  1. What about Android? Isn't there a sample for it?

  2. Neither sample builds. The "Speech" sample fails with this error:

Error:All flavors must now belong to a named flavor dimension. The flavor 'prod' is not assigned to a flavor dimension. Learn more at https://d.android.com/r/tools/flavorDimensions-missing-error-message.html

And the "NL" sample (what exactly is NL?) fails with this error:

D:\android\Android studio Projects\android-docs-samples\nl\Language\app\src\main\java\com\google\cloud\android\language\AccessTokenLoader.java
Error:(70, 81) error: cannot find symbol variable raw
Error:Execution failed for task ':app:compileDebugJavaWithJavac'.
Compilation failed; see the compiler error output for details.

  3. What's a "unit"? A single image being sent? A single user of the service? Something else? If it's a single image, it seems quite expensive, no? $1.5-3.5 per image... No number of users could cover these expenses, even if they paid...
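For a rough sense of scale, the arithmetic behind this question can be sketched in Python. It rests on two assumptions that are not confirmed anywhere in this thread and should be checked against the pricing page: that a unit is one feature applied to one image, and that the listed price applies per 1,000 units beyond the free tier rather than per image. The function name and the pro-rata billing are also my own simplifications.

```python
def estimate_monthly_cost(units, price_per_1000, free_units=1000):
    """Estimate a monthly bill in dollars under the stated assumptions.

    units          -- feature requests sent in the month (assumed: one
                      feature applied to one image counts as one unit)
    price_per_1000 -- assumed price per 1,000 billable units
    free_units     -- free tier size (1000 requests / month per the thread)
    """
    billable = max(0, units - free_units)
    # Pro-rata billing is a simplification; real billing may round differently.
    return billable / 1000 * price_per_1000


if __name__ == "__main__":
    for n in (500, 3000, 100000):
        print(n, estimate_monthly_cost(n, 1.5))
```

Under these assumptions the cost per image would be a fraction of a cent, not dollars, but the pricing page remains the authority on what a unit is and how it is billed.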