Performance and flexibility improvement by Single Shot MultiBox Detector

NozomiIto commented 5 years ago

As I understand it, the current implementation retrieves all elements matching the XPath "//*[not(child::*)]" (that is, all elements without children?) and predicts the category of each element's image in parallel. I think this approach is simple and stable, but we could improve it further by using SSD (Single Shot MultiBox Detector). SSD detects the classes and locations of multiple objects in an image in a single pass (like this).
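To make the XPath concrete, here is a minimal Python sketch (not the plugin's actual code) that selects the same kind of leaf elements from a page source. The XML snippet and the `leaf_elements` helper are hypothetical, and the standard library's ElementTree is used with a plain child count because it does not support the `not(child::*)` predicate directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page source; a real Appium source is much larger.
PAGE_SOURCE = """
<hierarchy>
  <android.widget.FrameLayout>
    <android.widget.LinearLayout>
      <android.widget.ImageView resource-id="cart"/>
      <android.widget.TextView text="Checkout"/>
    </android.widget.LinearLayout>
    <android.widget.Button text="OK"/>
  </android.widget.FrameLayout>
</hierarchy>
"""


def leaf_elements(xml_source):
    """Return every element that has no child elements, in document order.

    This mirrors what //*[not(child::*)] selects: only the bottommost nodes.
    """
    root = ET.fromstring(xml_source.strip())
    return [el for el in root.iter() if len(el) == 0]


leaves = leaf_elements(PAGE_SOURCE)
leaf_tags = [el.tag for el in leaves]
print(leaf_tags)
```

Only the ImageView, TextView, and Button are selected here; the layout containers are skipped because they have children. Each selected element would then be cropped from the screenshot and classified.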

I'm afraid the current implementation cannot recognize an object that has no corresponding Appium element. An SSD model, on the other hand, can detect any object in an image even if it has no corresponding Appium element. This is quite useful for Unity apps, WebViews without permission, and so on, which Appium often cannot handle well.

I found that TensorFlow.js has a sample SSD model, and I think it could be used for this AI locator.

The problems we would need to overcome are:

Precision: I'm actually doing the same thing as this locator with SSD in our commercial test tool, but achieving good precision with SSD is sometimes more difficult than with a simple image-classification CNN model such as MobileNet, which this project uses.
Prediction time: Currently, an SSD scan of the whole image takes 0.1-0.3 sec with a GPU and 3-5 sec with a CPU. Sometimes this is faster than iterating over all bottommost elements by XPath, but sometimes it is slower.
If the result of the SSD AI locator must be exactly the same element as a usual Appium element, we need to identify the corresponding Appium element for the x/y/w/h region calculated by SSD. In that case, having Appium provide an elementForPosition method would be the best approach (although it may be as slow as iterating over all elements by XPath). But I wonder whether such an approach is really necessary. The AI locator is useful in situations where Appium cannot find a good element, and in such situations no corresponding Appium element can be retrieved even with an elementForPosition method.
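As a rough illustration of that last point, one way to map an SSD region back to an Appium element, assuming the element rectangles have already been fetched, is to pick the element whose rectangle overlaps the detected region most (intersection over union). This is only a sketch: `Rect`, `iou`, `best_match`, and the 0.5 threshold are hypothetical names and values, not part of Appium or this plugin.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    """An x/y/w/h region in screenshot pixels."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two rectangles (0.0 when they do not overlap)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def best_match(detected: Rect, element_rects: Sequence[Rect],
               min_iou: float = 0.5) -> Optional[int]:
    """Return the index of the element rect that best overlaps the detected region.

    Returns None when no element overlaps by at least min_iou, which is the
    case where no corresponding Appium element exists.
    """
    best_index, best_score = None, 0.0
    for i, rect in enumerate(element_rects):
        score = iou(detected, rect)
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= min_iou else None


# Hypothetical element rectangles and SSD detections.
elements = [Rect(0, 0, 100, 50), Rect(0, 60, 100, 50)]
matched = best_match(Rect(5, 62, 95, 48), elements)
unmatched = best_match(Rect(300, 300, 40, 40), elements)
```

A `None` result corresponds to the situation described above: the detector found something that Appium has no element for, so the region itself is all there is to return.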

I think this is the next step for this locator to become a true AI locator, one that behaves exactly like a human and, just like a human tester, does not require any system information or permissions.

I'd like to know what you think about this improvement.

jlipps commented 5 years ago

Hi @NozomiIto these are really interesting thoughts, and I think your assessment of the benefits and drawbacks of the SSD approach is accurate. One possibility is not returning full-blown Appium elements, but returning ImageElements instead, and all we need for them is the screenshot bounds of the element.
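A minimal Python sketch of that idea, assuming the detector returns a label, a score, and a normalized (ymin, xmin, ymax, xmax) box per object (the output layout varies between SSD implementations, so treat it as an assumption). `Detection`, `to_screenshot_bounds`, and the 0.5 score threshold are hypothetical; the function only produces the screenshot bounds that an image-style element would need.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    """One detected object; the box layout is an assumption, not a fixed standard."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # normalized (ymin, xmin, ymax, xmax)


def to_screenshot_bounds(detections: Sequence[Detection], label: str,
                         width: int, height: int,
                         min_score: float = 0.5) -> List[Dict[str, int]]:
    """Return pixel bounds for detections of `label`, highest score first."""
    matches = [d for d in detections if d.label == label and d.score >= min_score]
    matches.sort(key=lambda d: d.score, reverse=True)
    bounds = []
    for d in matches:
        ymin, xmin, ymax, xmax = d.box
        x, y = round(xmin * width), round(ymin * height)
        bounds.append({
            "x": x,
            "y": y,
            "width": round(xmax * width) - x,
            "height": round(ymax * height) - y,
        })
    return bounds


# Hypothetical detections for a 400x800 screenshot.
detections = [
    Detection("cart", 0.9, (0.25, 0.5, 0.5, 0.75)),
    Detection("cart", 0.3, (0.0, 0.0, 0.25, 0.25)),   # dropped: below min_score
    Detection("menu", 0.95, (0.0, 0.0, 0.125, 0.25)),  # dropped: different label
]
cart_bounds = to_screenshot_bounds(detections, "cart", width=400, height=800)
```

No lookup against the Appium element tree is involved, so this would also work for screens where the tree is empty or inaccessible.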

I haven't used the SSD approach before myself so I'm not sure how to go about prototyping it, but I'd be happy to assist if you want to help with a proposal @NozomiIto!

NozomiIto commented 5 years ago

Thanks @jlipps !

> ImageElements

Oh, sounds good! I didn't know about that element! It is used for the image locator, isn't it?

> I'd be happy to assist if you want to help with a proposal @NozomiIto!

Yeah, sure. I'm familiar with SSD in Python (TensorFlow or Caffe), but I have never used TensorFlow.js or its SSD model. I will take a look at the TensorFlow.js SSD model and check how it can be integrated into Appium.

testdotai / appium-classifier-plugin
