Yet Another WinAppDriver (YWinAppDriver)

Microsoft WinAppDriver is the official application to support Selenium-like UI Test Automation on Windows Applications. This service supports testing Universal Windows Platform (UWP), Windows Forms (WinForms), Windows Presentation Foundation (WPF), and Classic Windows (Win32) apps on Windows 10 PCs.

This repo is an open source asp.net core implementation of WinAppDriver and it's compatible with Microsoft's WinAppDriver. Most of the ideas are coming from the test infrastructure of WinUI, Microsoft.Windows.Apps.Test, and WinAppDriver document. I combined them and come up with the open source implementation.

I name this project YWinAppDriver(yet another WinAppDriver).

Project Status: 0.3.x

Use .Net 6 instead of .net core 3.1

Project Status: 0.2.x

Most of the functionalities are ready and you should be able to switch from WinAppDriver to YWinAppDriver without any(or With little) change. SessionController.cs defines all the endpoints it supported.

For the XPath syntax, refer to https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ms256086(v=vs.100)

Download & Run YWinAppDriver

Download and compile

There are two ways to get the WinAppDriver.exe:

Clone this repo, then open the WinAppDriver.sln and build WinAppDriver project.
or Download it from https://github.com/licanhua/YWinAppDriver/releases

Lauch WinAppDriver.exe Generally speaking, WinAppDriver user would have two settings: http://127.0.0.1:4723 or http://127.0.0.1:4723/wd/hub. Note: dotnet-core runtime is required, please install 3.1 or late it if you have problem to run WinAppDriver.exe

By default, YWinAppDriver is http://127.0.0.1:4723. You can change the port number and basepath easily:

run WinAppDriver.exe --urls http://127.0.0.1:4723 --basepath /wd/hub

A complete command line:

WinAppDriver.exe --urls http://127.0.0.1:4723 --basepath /wd/hub --logpath logs

From Visual Studio

There are two settings are ready for you. IIS Express /wd/hub is http://127.0.0.1:4723/wd/hub

Using appsettings.json

"Urls": "http://127.0.0.1:4723",
"Basepath": "/wd/hub"

Build and run the CalcatorTest in examples Please run the test, please make sure Calculator is in Standard mode.

nodejs YWinAppDriver/WinAppDriver examples:

If you are authoring the test case with Jest, Jasmine or any other JavaScript framework, you can switch between YWinAppDriver and WinAppDriver very easily.

wdio + YWinAppDriver/WinAppDriver

selenium + YWinAppDriver/WinAppDriver

Knowledge for contributors

Asp.Net Core

This project is developed with Asp.Net Core and referred 3.1

Microsoft.Windows.Apps.Test

In Windows, UIAutomation is the technology which test driver could use to manipulate the UI.

Microsoft.Windows.Apps.Test is the core UI Automation library which is spinned from WinAppDriver, which allows user to interact with the testapp outside of WinAppDriver. WinUI is the first public user which implements its own automation without WinAppDriver.

Microsoft.Windows.Apps.Test has the public nuget and is binary open sourced. YWinAppDriver is based on Microsoft.Windows.Apps.Test too and used this library to interact with the testapp.

Microsoft.Windows.Apps.Test documentation can be found here: Microsoft.Windows.Apps.Test.chm.

Protocols There are two protocols: w3c webdriver and selenium JsonWire Protocol JsonWire is obselete, but still there a lot of client/server doen't support w3c specification. So YWinAppDriver is trying to match with both protocols
Locators and capalibilites are the same/nearly the same with WinAppDriver.

API Supported

Below is the table to match with selenium json wire protocol. Because YWinAppDriver is for desktop application other than browser, no below means it's not supported. maybe means it's possible to support it, but I didn't see any value to support it.

Dev Status	HTTP Method	Path	Summary
completed	GET	/status	Query the server's current status.
completed	POST	/session	Create a new session.
completed	GET	/sessions	Returns a list of the currently active sessions.
completed	GET	/session/:sessionId	Retrieve the capabilities of the specified session.
completed	DELETE	/session/:sessionId	Delete the session.
completed	POST	/session/:sessionId/timeouts	Configure the amount of time that a particular type of operation can execute for before they are aborted and a
no	POST	/session/:sessionId/timeouts/async_script	Set the amount of time, in milliseconds, that asynchronous scripts executed by /session/:sessionId/execute_async are permitted to run before they are aborted and a
completed	POST	/session/:sessionId/timeouts/implicit_wait	Set the amount of time the driver should wait when searching for elements.
completed	GET	/session/:sessionId/window_handle	Retrieve the current window handle.
completed	GET	/session/:sessionId/window_handles	Retrieve the list of all window handles available to the session.
no	GET	/session/:sessionId/url	Retrieve the URL of the current page.
no	POST	/session/:sessionId/url	Navigate to a new URL.
maybe	POST	/session/:sessionId/forward	Navigate forwards in the browser history, if possible.
maybe	POST	/session/:sessionId/back	Navigate backwards in the browser history, if possible.
maybe	POST	/session/:sessionId/refresh	Refresh the current page.
maybe	POST	/session/:sessionId/execute	Inject a snippet of JavaScript into the page for execution in the context of the currently selected frame.
maybe	POST	/session/:sessionId/execute_async	Inject a snippet of JavaScript into the page for execution in the context of the currently selected frame.
complete	GET	/session/:sessionId/screenshot	Take a screenshot of the current page
complete	GET	/session/:sessionId/element/:elementId/screenshot	Take a screenshot of the element
no	GET	/session/:sessionId/ime/available_engines	List all available engines on the machine.
no	GET	/session/:sessionId/ime/active_engine	Get the name of the active IME engine.
no	GET	/session/:sessionId/ime/activated	Indicates whether IME input is active at the moment (not if it's available.
no	POST	/session/:sessionId/ime/deactivate	De-activates the currently-active IME engine.
no	POST	/session/:sessionId/ime/activate	Make an engines that is available (appears on the listreturned by getAvailableEngines) active.
no	POST	/session/:sessionId/frame	Change focus to another frame on the page.
no	POST	/session/:sessionId/frame/parent	Change focus to the parent context.
completed	POST	/session/:sessionId/window	Change focus to another window.
completed	DELETE	/session/:sessionId/window	Close the current window.
completed	POST	/session/:sessionId/window/:windowHandle/size	Change the size of the specified window.
completed	GET	/session/:sessionId/window/:windowHandle/size	Get the size of the specified window.
completed	POST	/session/:sessionId/window/:windowHandle/position	Change the position of the specified window.
completed	GET	/session/:sessionId/window/:windowHandle/position	Get the position of the specified window.
completed	POST	/session/:sessionId/window/:windowHandle/maximize	Maximize the specified window if not already maximized.
no	GET	/session/:sessionId/cookie	Retrieve all cookies visible to the current page.
no	POST	/session/:sessionId/cookie	Set a cookie.
no	DELETE	/session/:sessionId/cookie	Delete all cookies visible to the current page.
no	DELETE	/session/:sessionId/cookie/:name	Delete the cookie with the given name.
completed	GET	/session/:sessionId/source	Get the current page source.
completed	GET	/session/:sessionId/title	Get the current page title.
completed	POST	/session/:sessionId/element	Search for an element on the page, starting from the document root.
completed	POST	/session/:sessionId/elements	Search for multiple elements on the page, starting from the document root.
completed	POST	/session/:sessionId/element/active	Get the element on the page that currently has focus.
completed	GET	/session/:sessionId/element/:id	Describe the identified element.
completed	POST	/session/:sessionId/element/:id/element	Search for an element on the page, starting from the identified element.
completed	POST	/session/:sessionId/element/:id/elements	Search for multiple elements on the page, starting from the identified element.
completed	POST	/session/:sessionId/element/:id/click	Click on an element.
no	POST	/session/:sessionId/element/:id/submit	Submit a FORM element.
completed	GET	/session/:sessionId/element/:id/text	Returns the visible text for the element.
completed	POST	/session/:sessionId/element/:id/value	Send a sequence of key strokes to an element.
completed	POST	/session/:sessionId/keys	Send a sequence of key strokes to the active element.
complented	GET	/session/:sessionId/element/:id/name	Query for an element's tag name.
completed	POST	/session/:sessionId/element/:id/clear	Clear a TEXTAREA or text INPUT element's value.
completed	GET	/session/:sessionId/element/:id/selected	Determine if an OPTION element, or an INPUT element of type checkbox or radiobutton is currently selected.
completed	GET	/session/:sessionId/element/:id/enabled	Determine if an element is currently enabled.
completed	GET	/session/:sessionId/element/:id/attribute/:name	Get the value of an element's attribute.
completed	GET	/session/:sessionId/element/:id/equals/:other	Test if two element IDs refer to the same DOM element.
completed	GET	/session/:sessionId/element/:id/displayed	Determine if an element is currently displayed.
completed	GET	/session/:sessionId/element/:id/location	Determine an element's location on the page.
maybe	GET	/session/:sessionId/element/:id/location_in_view	Determine an element's location on the screen once it has been scrolled into view.
completed	GET	/session/:sessionId/element/:id/size	Determine an element's size in pixels.
no	GET	/session/:sessionId/element/:id/css/:propertyName	Query the value of an element's computed CSS property.
no	GET	/session/:sessionId/orientation	Get the current browser orientation.
no	POST	/session/:sessionId/orientation	Set the browser orientation.
no	GET	/session/:sessionId/alert_text	Gets the text of the currently displayed JavaScript alert(), confirm(), or prompt() dialog.
no	POST	/session/:sessionId/alert_text	Sends keystrokes to a JavaScript prompt() dialog.
no	POST	/session/:sessionId/accept_alert	Accepts the currently displayed alert dialog.
no	POST	/session/:sessionId/dismiss_alert	Dismisses the currently displayed alert dialog.
completed	POST	/session/:sessionId/moveto	Move the mouse by an offset of the specificed element.
completed	POST	/session/:sessionId/click	Click any mouse button (at the coordinates set by the last moveto command).
completed	POST	/session/:sessionId/buttondown	Click and hold the left mouse button (at the coordinates set by the last moveto command).
completed	POST	/session/:sessionId/buttonup	Releases the mouse button previously held (where the mouse is currently at).
completed	POST	/session/:sessionId/doubleclick	Double-clicks at the current mouse coordinates (set by moveto).
completed	POST	/session/:sessionId/touch/click	Single tap on the touch enabled device.
completed	POST	/session/:sessionId/touch/down	Finger down on the screen.
completed	POST	/session/:sessionId/touch/up	Finger up on the screen.
completed	POST	session/:sessionId/touch/move	Finger move on the screen.
in progress	POST	session/:sessionId/touch/scroll	Scroll on the touch screen using finger based motion events.
in progress	POST	session/:sessionId/touch/scroll	Scroll on the touch screen using finger based motion events.
completed	POST	session/:sessionId/touch/doubleclick	Double tap on the touch screen using finger motion events.
completed	POST	session/:sessionId/touch/longclick	Long press on the touch screen using finger motion events.
in progress	POST	session/:sessionId/touch/flick	Flick on the touch screen using finger motion events.
in progress	POST	session/:sessionId/touch/flick	Flick on the touch screen using finger motion events.
no	GET	/session/:sessionId/location	Get the current geo location.
no	POST	/session/:sessionId/location	Set the current geo location.
no	GET	/session/:sessionId/local_storage	Get all keys of the storage.
no	POST	/session/:sessionId/local_storage	Set the storage item for the given key.
no	DELETE	/session/:sessionId/local_storage	Clear the storage.
no	GET	/session/:sessionId/local_storage/key/:key	Get the storage item for the given key.
no	DELETE	/session/:sessionId/local_storage/key/:key	Remove the storage item for the given key.
no	GET	/session/:sessionId/local_storage/size	Get the number of items in the storage.
no	GET	/session/:sessionId/session_storage	Get all keys of the storage.
no	POST	/session/:sessionId/session_storage	Set the storage item for the given key.
no	DELETE	/session/:sessionId/session_storage	Clear the storage.
no	GET	/session/:sessionId/session_storage/key/:key	Get the storage item for the given key.
no	DELETE	/session/:sessionId/session_storage/key/:key	Remove the storage item for the given key.
no	GET	/session/:sessionId/session_storage/size	Get the number of items in the storage.
no	POST	/session/:sessionId/log	Get the log for a given log type.
no	GET	/session/:sessionId/log/types	Get available log types.
no	GET	/session/:sessionId/application_cache/status	Get the status of the html5 application cache.

Background

One week ago, another team reached to me to ask some advice to help them choose the UI automation tool. It makes me think: Although WinAppDriver is the de fact tool recommended by Microsoft, Is WinAppDriver the right tool for everybody? I didn't see other open source option yet, so I spent one weekend to create the 0.1 release.

Two years ago, hassanuz was an active contributor on WinAppDriver and he started to promote the WinAppDriver usage in Microsoft. I was working on WinUI, and WinUI used Microsoft.Windows.Apps.Test. So we sit down together to see if I can adopt iWinAppDriver. After the conversation, I found we hit the White box and Gray box dilemma. WinUI is a gray box testing while WinAppDriver is blackbox testing.

WinUI provides a lot of amazing features in its test infrastucture which WinAppDriver doesn't support:

Wait.ForIdle is the killer feature to make the testing stable. When WinAppDriver saw the element, it doesn't mean UI is ready for interaction. Wait.ForIdle lets you know UI is ready to take user's input, so you will not run into the unstable situation that automation test case clicked the button in test pipeine, but there is no response, and I never reproduce the problem locally.
Dump the visual tree when test failed
Restart the application when launching failed, and kill the application when there is exception in test
Pan, drag, scroll, and gamepad support.
Speed. WinUI has thousand of test cases, and I want it be finished as soon as possbile.

To adopt WinAppDriver, we need to resolve these problems first. Then we introduced the plugin mode into WinAppDriver> User can build their own business logic to the plugin, the WinAppDriver would load it in start up. So for WinUI, I can move the Infra part into the plugin. For native application, ExecuteScript is not used, so we can re-use it without any impact to the existing selenium clients. So the message flow look like this:

Selenium Client ExecuteScript-> WinAppDriver -> WinUI Plugin -> UIA - TestApp

We finished the prototype. Because the legal concern and there is no business value from leader's aspect, we didn't make it into the end user.

I think YWinAppDriver is able to address above problems, and possible make every body happy.

Supported Locators to Find UI Elements

Windows Application Driver supports various locators to find UI element in the application session. The table below shows all supported locator strategies with their corresponding UI element attributes shown in inspect.exe.

Client API	Locator Strategy	Matched Attribute in inspect.exe	Example
FindElementByAccessibilityId	accessibility id	AutomationId	AppNameTitle
FindElementByClassName	class name	ClassName	TextBlock
FindElementById	id	RuntimeId (decimal)	42.333896.3.1
FindElementByName	name	Name	Calculator
FindElementByTagName	tag name	LocalizedControlType (upper camel case)	Text
FindElementByXPath	xpath	Any	//Button[0]

Supported Capabilities

Below are the capabilities that can be used to create Windows Application Driver session.

Capabilities	Descriptions	Example
app	Application identifier or executable full path	Microsoft.MicrosoftEdge_8wekyb3d8bbwe!MicrosoftEdge
appArguments	Application launch arguments	https://github.com/Microsoft/WinAppDriver
attachToTopLevelWindowClassName	app should be "Root", Existing application top level window to attach to. if you are using WinAppDriver, please use appTopLevelWindow	`0xB822E2`
appWorkingDir	Application working directory (Classic apps only)	`C:\Temp`
forceMatchAppTitle	If app is launched, but have problem to match it, YWinAppDriver do the last try to match with the application title	Calculator
forceMatchClassName	If app is launched, but have problem to match it, YWinAppDriver do the last try to match with the class name	Chrome_WidgetWin_1
clickWithInvoke	Use Inovke other than click to get better performance	true/false

YWinAppDriver addressed some WinAppDriver 1.2 issues and fixed them

WinAppDriver has problem to start c:\windows\system32\calc.exe, PostMan, or Notepad++ issue 1372

In YWinAppDriver, you can workaround the problem with the capabilites like below.

"forceMatchAppTitle": "Calculator"
"forceMatchClassName":"Notepad++"

Appium desktop can't show all controls WinAppDriver provided

Appium Desktop is a great tool to inspect the app's elements and it's very easy to learn and use it. Currently WinAppDriver can't show all elements. I don't know if it's the issue of WinAppDriver or Appium, but YWinAppDriver doesn't have this problem.

YWinAppDriver Known issue

The Click is slow than WinAppDriver

You can set capabilities {clickWithInvoke: true}, it will be very fast. The down side is it will not close the flyout like Menu Windows until you click somewhere else.

licanhua / YWinAppDriver

readme