A curated list of awesome Intelligent RPA (Robotic Process Automation) resources.
01 Understanding RPA

02 Advantages of RPA

03 How are RPA and AI related?

04 RPA and financial shared services

05 RPA selection and the ADII implementation method



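RPA is short for Robotic Process Automation, and the name itself gives away the gist: robots, processes, automation. RPA uses software robots as a virtual workforce that interacts with existing user systems according to pre-defined programs and completes the expected tasks. Judging by current practice, today's RPA is still suited only to processes that are highly repetitive, logically deterministic, and relatively undemanding in terms of stability.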









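Take, for example, the following simple work-automation code written in AppleScript on Mac OS X (it makes Google Chrome open the Baidu home page in a new window). As you can see, the syntax is very simple; it is practically plain English.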
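As a rough Python counterpart to the AppleScript task described above, the sketch below asks the system's default browser to open a page in a new window. It uses the standard-library `webbrowser` module rather than AppleScript, so it targets whichever browser is the default rather than Google Chrome specifically. The function name `open_in_new_window` and the injectable `opener` parameter are illustrative assumptions.

```python
import webbrowser

BAIDU_HOME = "https://www.baidu.com"


def open_in_new_window(url=BAIDU_HOME, opener=webbrowser.open_new):
    """Ask the default browser to show `url` in a new window.

    `opener` is injectable (a hypothetical convenience) so the function
    can be exercised without launching a real browser.
    """
    return opener(url)
```

Calling `open_in_new_window()` with no arguments opens the Baidu home page; `webbrowser.open_new` falls back to an existing window if the browser cannot open a new one.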

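That is a simple illustration of how RPA works. Of course, the RPA tools released by the major software vendors today are far richer in features and far more tailored to specific scenarios than the small utility mentioned above, but the core logic is not fundamentally different. In certain business scenarios, a skilled Excel VBA developer can even do a good job of RPA with nothing but Office tools (many RPA tools still rely on Excel VBA to work alongside them).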













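Artificial intelligence (AI) is a very broad concept; its goal is to make computers think the way people do. The widely discussed machine learning (ML) is just one branch of AI: it studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and keep improving their own performance.

Google's AlphaGo, which has beaten Go masters of every rank, is a showcase of machine learning. It uses deep learning (DL), a family of algorithms that attempt high-level abstraction of data through multiple processing layers (neural networks) that have complex structure or consist of multiple nonlinear transformations. As a result it can handle more complex models that used to be out of reach for machines (for example, high uncertainty or an enormous amount of computation).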






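Take a typical transactional financial shared service center as an example. Its common business processes generally include order to cash (OTC), procure to pay (PTP), employee travel and expense reimbursement (T&E), fixed-asset accounting (FA), general ledger and reporting (RTR), and treasury settlement (TR). Many of the steps in these processes are highly standardized and highly repetitive, which gives RPA plenty of room to shine. So what are the current best practices for applying RPA in these processes?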
































































Generic test automation framework.

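A complete and graceful API for WeChat: a personal-account interface, a WeChat bot, and a command-line WeChat client. Thirty lines are enough to build a custom personal-account bot.

An auto-reply robot for WeChat official accounts. A Node.js robot for WeChat.

Cross-platform automation in Golang: control the keyboard, mouse, and bitmaps, read the screen, and work with window handles and global event listening.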
🤖 The world's simplest framework for creating Bots

🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs

WeChat Bot SDK for Personal Account, Powered by TypeScript, Docker, and 💖

Roro is a free open-source Robotic Process Automation software.

    1. Good workflow/flowchart designer - enables better traceability of automation with process and ease of maintenance.
    2. Business objects are reusable by design.
    3. Supports distributed development. True enterprise-level centralized architecture.
    4. Built-in reporting module with the lowest level of granularity and a strong analytical engine.
    5. Detailed transaction log available.
    6. Supports role-based access control and Active Directory (AD) integration.
    7. Tool can get Control Room data through queries.
    8. Centralized, secured credential storage; HIPAA- and PCI-compliant security standard.
    9. Ability to provide BP components as web services - provides greater flexibility in designing enterprise-level solutions.
    10. Runtime execution is fast, but slower than AA.
    11. Good all-round functionality with a proven track record of deployment at many enterprises.
    12. OCR support (Tesseract).

1. "Record and Play" feature not available.
2. Windows 10 not supported.
3. Limited browser support beyond IE.
4. Version control management tool not available.
5. Product licence is costly. A minimum number of licences must be purchased.
6. Windows-based control room, as opposed to the web-based control rooms of competitors.
7. Mainly used for back office (unassisted automation). RDA / assisted automation is supported through surface automation (double check) using events like the button click event.

Important Elements of Blue Prism:
Process studio
Object studio
Control room
System manager and Release manager
Application Modeller

  1. Pega Robotics / Openspan:

1. Great RPA tool for both RDA and RPA. RDA is assisted automation where the agent and the bot work together. It requires the design of a user interface.
2. Integrated with Visual Studio, and great if you have .NET/C# experience.
3. Framework components like logging and runtime configuration (reading dynamic values from XML) can be built as C# DLLs / custom controls and imported into OpenSpan Studio.
4. Text adapter/emulator for mainframe support.
5. TFS integration.

1. Developers without Visual Studio / .NET experience find it difficult.
2. Localized credential storage in a flat file.
3. Built-in work queues not available.

Important Elements of Pega Robotics / Openspan:
1. Robotic Desktop Automation (RDA) support by adding Windows Forms
2. Adapters (Web, Desktop, Text and Citrix)
3. Global Container, Activities and Interaction Manager to pass data back and forth between projects
4. Entry and Exit points

  1. Automation Anywhere: PROs
    1. "Record and Play" feature available, making development easier and faster.
    2. Supports MODI, TOCR and external OCR engines (accuracy is very good).
    3. Built-in software version control (SVN) available.
    4. Centralized web-based dashboard for runtime bot monitoring.
    5. Good analytical plug-in (Bot Insight) for business and operational data.
    6. MetaBot feature available to integrate DLLs, create reusable components and perform offline development.
    7. Supports role-based access control.
    8. Ease of deployment. Offers flexibility and ease of doing business.
    9. Runtime execution is fast.
    10. Desktop automation as well as enterprise capabilities.
    11. Web-based control room that can be accessed from mobile devices.

1. Built-in work queues not available.
2. AA has localized credential storage in a flat file. The credential vault is not secured for users.
3. Needs minimal programming experience.
4. AA only has task-level reporting capability.
5. AA takes a procedural approach to exposing AA components.
6. AA fails to create a complete virtual workforce concept, because a bot cannot share the same workstation with an agent/user.
7. Exposing web services is not supported in AA.
8. Limited features for data-driven backend operations compared to its competitors, for example Excel automation or file automation.

Elements of AA

1. TaskBot - Basic automation license, used to automate rule-based transactional processes
2. MetaBot - Advanced features for large deployments, offline app automation and consuming objects
3. IQBot - For cognitive RPA
4. BotFarm - For cloud deployment
5. BotInsight - For business and operational analytics

  1. UiPath
    1. Has a robust workflow designer and recording feature.
    2. Efficient image recognition.
    3. Feature available to record Citrix processes.
    4. Business objects are reusable (sequences, flowcharts).
    5. Native support for OCR and various cloud options (average accuracy).
    6. Built-in software version control (TFS, SVN).
    7. Supports role-based access control.
    8. Elasticsearch / indexer server available for operational and historical data of the robots.
    9. Multi-tenancy: you can separate the control room into several areas, e.g. per department or DEV and QA.
    10. Supports both front-office and back-office automation.

1. Overall runtime performance is slow.
2. Requires 8 GB RAM.
3. Java code is not supported, whereas VBA and C# are.
4. Able to extract PDF data, but extracting tabular data from a PDF is a lengthy process.

Elements of UiPath:
1. UiPath robots - Enterprise-ready front-office and back-office robot
2. UiPath Studio -
3. UiPath Orchestrator - Enables the orchestration and management of thousands of robots from a single command centre

Exercises from the UiPath Academy Courses

wanghaisheng commented 6 years ago







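Representative RPA software vendors include Automation Anywhere from the US, Blue Prism from the UK, and UiPath from Romania,

as well as WorkFusion, Pegasystems, NICE, Redwood Software, Kofax, and others.

Automation Anywhere (US) has the largest market share. It runs on Windows; you mainly record the work you want to automate in the task editor and then turn it into a script. It provides dozens of templates for tasks such as scraping web data and transferring files on a schedule, and it also offers integration components for OCR, Java, and the like, though these cost extra.

Blue Prism (UK) is built on Microsoft's .NET Framework. It offers a fairly rich set of components, covers a fairly wide range of domains, and uses centralized management; its drawback is that it is too expensive.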











Use AI to free humans from drudgery.

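Many jobs are being taken over by machines; that is the way things are heading. A concept called robotic process automation (RPA) has recently become popular abroad. It has no settled Chinese translation yet, but in essence it means using computer technology to automate the repetitive parts of everyday work, which saves time and effort and even spares you tennis elbow.

In fact, before I had even heard of the concept, I had already done a little work of this kind myself. I do SAP implementation, and across my company's many back-to-back projects there is a huge amount of repetitive configuration to do. At one point I was copying and pasting until my hands felt like they would fall off, and this kind of tedium makes you question your life: what on earth am I here for, and why am I doing something so meaningless? Isn't this the sort of work that is supposed to be outsourced?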



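Let's start with dumb automation. It is really quite simple: you write a program that mimics mouse and keyboard actions and hand it to the computer to repeat. As for the programming language, Python is of course the way to go; the most concise and elegant language is the best.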


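For example, suppose you need to upload the data in a CSV file to a website. Because only one row can be updated at a time, the operation has to be repeated a great many times. With a simple script, Python can fill in the form by itself: one Tab and then the user name, two Tabs and then the password, N more Tabs and then the description, and so on. These steps are fixed in the program, hence the name hard-coding. For simple repetitive scenarios this is an effective way to automate, much like a macro in Excel. The snippet below is one I used to pester my younger sister on QQ by rapidly sending the same message over and over. It runs on a Mac; to run it on Windows, just change the key names. As you can see, it is very simple.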

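#!/usr/bin/env python
# -*- coding: utf-8 -*-

# PyMouse and PyKeyboard come from the PyUserInput package.
from pymouse import PyMouse
from pykeyboard import PyKeyboard

m = PyMouse()     # mouse controller (not used in this snippet)
k = PyKeyboard()  # keyboard controller

# Type the same message ten times into whichever window has focus.
for i in range(10):
    k.type_string("Xiong Hai Zi!")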
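The CSV-to-web-form routine described above can be sketched in the same hard-coded style. The example below only builds the list of keystrokes for each row and leaves the actual key presses to a separate step. The column names (`user`, `password`, `description`), the tab counts, and the helper names are all assumptions made for illustration, not details of the original script.

```python
import csv
import io

# Hard-coded form layout: (number of Tab presses before the field, CSV column).
# Both the tab counts and the column names are illustrative assumptions.
FORM_LAYOUT = [(1, "user"), (2, "password"), (3, "description")]


def keystrokes_for_row(row, layout=FORM_LAYOUT):
    """Return the fixed sequence of actions that fills the form for one row."""
    actions = []
    for tabs, column in layout:
        actions.extend([("tap", "tab")] * tabs)  # move focus to the field
        actions.append(("type", row[column]))    # enter the value
    actions.append(("tap", "return"))            # submit the form
    return actions


def plan_from_csv(text):
    """Build one keystroke plan per CSV row."""
    return [keystrokes_for_row(row) for row in csv.DictReader(io.StringIO(text))]


sample = "user,password,description\nalice,secret,first row\n"
plan = plan_from_csv(sample)
print(plan[0])
```

Each `("type", ...)` action could be replayed with `PyKeyboard.type_string`, as in the snippet above. Note that `FORM_LAYOUT` has to be rewritten whenever the page layout changes.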


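But hard-coding has its own problems. Because everything is fixed, it fits only a single GUI scenario; what if something changes? On the SAP configuration pages, for instance, the FTP, RFC, and HTTP adapters each need completely different settings. Hard-coding every one of them is possible in theory, but it is far too much trouble and adapts poorly: whenever SAP adds something new, you have to write another pile of code. Is there a smarter way? That is where smart automation comes in.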


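In fact, with fewer than 300 lines of code I covered every type of SAP PI configuration scenario. New scenarios can be handled just the same, and no matter how the computer or the screen resolution changes, everything still gets done with a single click.

The next level of automation is artificial intelligence. For now this remains at the idea stage, because strong AI has not been achieved. Neural networks and deep learning imitate the brain, and AlphaGo has thoroughly beaten humans, yet computers still cannot think like people, understand human language, or imitate human emotion. The IQ of Siri and its kind shows that today's artificial intelligence is still closer to artificial stupidity, but in many niche areas these systems have already caught up with or even surpassed humans. Autonomous driving, for example, uses the same computer-vision techniques I do, only more complex and trained and tuned differently. AI at its current level can already deliver enormous value. One can even imagine a future in which most production is handed over to machines and factories run unmanned: raw materials go in, products come out, and no human takes part in between. Wouldn't that amount to material abundance? That is why artificial intelligence is the most likely route to communism, even though changes in human social systems always come with the sharp pains of war, conflict, and crisis.

Finally, a few words on AI-driven unemployment. Rather than unemployment, I prefer to call it liberation: freeing people completely from mechanical, repetitive, boring work so that they can spend their time on interesting things and live meaningful lives instead of wasting them on toil. The history of human development is a history of rising productivity. Ever since the steam engines of the early Industrial Revolution, machines have been replacing human labor, and despite the ups and downs of economic cycles, the world's absolute wealth and people's average standard of living have kept rising. Many people today fear or even oppose artificial intelligence, but that is as futile as the Luddites smashing looms; French taxi drivers protesting against Uber are likewise trying in vain to hold back the tide of history. Any technology that benefits productivity is ultimately unstoppable. Strong AI may well trigger ethical debates, but that is another, much bigger topic.

Many of the so-called controversies are really just the difference between short term and long term, or between local and global. In the short term, advances in automation will certainly cause unemployment, because the affected workers have little job mobility and cannot easily find suitable work right away. On top of that, systems of social distribution generally lag far behind technological progress, so the poor get poorer and the rich get richer. In the long run, however, these new technologies raise productivity and the efficiency of resource use, and are bound to benefit human society. From the local versus global angle, for the industries a new technology hits directly, such as taxis, the harm certainly outweighs the benefit, because the whole industry may decline; for everyone else, the sharing economy makes travel cheaper and more convenient. So many of these disputes come down to differences in standpoint and time horizon, and only by stepping outside those frames can one see the essence of the problem.

Edited on 2017-03-12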

AutoHotkey is a powerful and easy to use scripting language for desktop automation on Windows.

How Facebook scales AI

Facebook's products and services are powered by machine learning. Powerful GPUs have been one of the key enablers, but it takes a lot more hardware and software to serve billions of users.

Most of Facebook's two billion users have little idea how much the service leans on artificial intelligence to operate at such a vast scale. Facebook products such as the News Feed, Search and Ads use machine learning, and behind the scenes it powers services such as facial recognition and tagging, language translation, speech recognition, content understanding and anomaly detection to spot fake accounts and objectionable content.

The numbers are staggering. In all, Facebook's machine learning systems handle more than 200 trillion predictions and five billion translations per day. Facebook's algorithms automatically remove millions of fake accounts every day.

In a keynote at this year's International Symposium on Computer Architecture (ISCA), Dr. Kim Hazelwood, the head of Facebook's AI Infrastructure group, explained how the service designs hardware and software to handle machine learning at this scale. And she urged hardware and software architects to look beyond the hype and develop "full-stack solutions" for machine learning. "It is really important that we are solving the right problems and not just doing what everyone else is doing," Hazelwood said.

Facebook's AI infrastructure needs to handle a diverse range of workloads. Some models can take minutes to train, while others can take days or even weeks. The News Feed and Ads, for example, use up to 100 times more compute resources than other algorithms. As a result, Facebook uses "traditional, old-school machine learning" whenever possible, and only resorts to deep learning--Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN/LSTM)--when absolutely necessary.

The company's AI ecosystem includes three major components: the infrastructure, workflow management software running on top, and the core machine learning frameworks such as PyTorch.

Facebook has been designing its own datacenters and servers since 2010. Today it operates 13 massive datacenters--10 in the U.S. and three overseas. Not all of these are the same, since they were built over time, and they do not house the same data, since "the worst thing you can do is replicate all data in every data center." Despite this, every quarter the company "unplugs an entire Facebook datacenter," Hazelwood said, to ensure continuity. The datacenters are designed to handle peak loads, which leaves about 50% of the fleet idle at certain times of the day as "free compute" that can be harnessed for machine learning.

Rather than using a single server, Facebook took hundreds of workloads in production, put them in buckets, and designed custom servers for each type. The data is stored in Bryce Canyon and Lightning storage servers, training takes place on Big Basin servers with Nvidia Tesla GPUs, and the models are run on Twin Lakes single-socket and Tioga Pass dual-socket Xeon servers. Facebook continues to evaluate specialized hardware such as Google's TPU and Microsoft's BrainWave FPGAs, but Hazelwood suggested that too much investment is focused on compute, and not enough on the storage and especially networking, which in keeping with Amdahl's Law can become a bottleneck for many workloads. She added that AI chip startups weren't putting enough focus on the software stack leaving a big opportunity in machine learning tools and compilers.


Facebook's own software stack includes FBLearner, a set of three management and deployment tools that focus on different parts of the machine learning pipeline. FBLearner Store is for data manipulation and feature extraction, FBLearner Flow is for managing the steps involved in training, and FBLearner Prediction is for deploying models in production. The goal is to free up Facebook engineers to be more productive and focus on algorithm design.

Facebook has historically used two machine learning frameworks: PyTorch for research and Caffe2 for production. The Python-based PyTorch is easier to work with, but Caffe2 delivers better performance. The problem is that moving models from PyTorch to Caffe2 for production is a time-consuming and buggy process. Last month, at its F8 developer conference, Facebook announced that it had "merged them internally so you get the look and feel of PyTorch and the performance of Caffe2" with PyTorch 1.0, Hazelwood said.

This was a logical first step for ONNX (Open Neural Network Exchange), an effort by Facebook, Amazon and Microsoft to create an open format for optimizing deep learning models built in different frameworks to run on a variety of hardware. The challenge is that there are lots of frameworks--Google TensorFlow, Microsoft's Cognitive Toolkit, and Apache MXNet (favored by Amazon)--and the models need to run on a variety of different platforms such as Apple ML, Nvidia, Intel/Nervana and Qualcomm's Snapdragon Neural Engine.

There are a lot of good reasons for running models on edge devices, but phones are especially challenging. Many parts of the world still have little or no connectivity and more than half of the world is using phones dating from 2012 or earlier, and they use a variety of hardware and software. Hazelwood said there is about a 10X performance difference between today's flagship phone and the median handset. "You can't assume that everyone you are designing your mobile neural net for is using an iPhone X," she said. "We are very anomalous here in the U.S." Facebook's Caffe2 Go framework is designed to compress models to address some of these issues.

The deep learning era has arrived, and Hazelwood said there are lots of hardware and software problems to solve. The industry is spending lots of time and money building faster silicon but, she said, we need equal investment in software, citing Proebsting's Law that compiler advances only double compute performance every 18 years. "Please keep that in mind so we don't end up with another Itanium situation," Hazelwood joked, referring to Intel's now-defunct IA-64 architecture. The real opportunity, Hazelwood said, is in solving problems that no one is working on: building end-to-end solutions with balanced hardware and better software, tools and compilers.
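To put the 18-year doubling period in perspective, the arithmetic below converts it into an equivalent yearly improvement. It assumes smooth compound growth, which is a simplification.

```python
def annual_growth(doubling_years):
    """Yearly improvement factor implied by a given doubling period."""
    return 2 ** (1 / doubling_years)


rate = annual_growth(18)
print(f"{(rate - 1) * 100:.1f}% per year")  # prints 3.9% per year
```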

Screen scraping is according to Wikipedia "a technique in which a computer program extracts data from the display output of another program. The key element that distinguishes screen scraping from regular parsing is that the output being scraped was intended for final display to a human user, rather than as input to another program, and is therefore usually neither documented nor structured for convenient parsing."

'''GUI automation''' is more or less the opposite of screen scraping. It aims to drive an application to a desired state by simulating mouse and keyboard events, as a normal user would.

===The challenge===
Wikipedia continues by shedding light '''on how difficult scraping is''' from a technological standpoint: "Screen scraping is generally considered an ad-hoc, inelegant technique, often used only as a 'last resort' when no other mechanism is available."

Prior to [[Api-documentation|UiPath]], screen scraping solutions were based on applying OCR techniques to screenshots. OCR is traditionally slow and error-prone (the best accuracy rates are 95% for typewritten documents), and it is poorly suited to application screens, where many UI elements interfere with the OCR algorithms and produce undesired results. A good OCR engine is also extremely expensive.

GUI automation is no less difficult. One might think that sending mouse and keyboard events to screen coordinates is easy, but it is not at all reliable. The GUI might be resized or moved, not to mention differences in screen resolution, color depth and font size.

Let's say you need to get the invoice number from your old CRM app and copy it into your brand-new web-based CRM. The old invoice number is displayed as a label in the Customer Invoice screen. You cannot use fixed screen coordinates to scrape it, because users might move the CRM main window; you cannot use relative client coordinates, because the label might reflow if users resize the invoicing window; and you cannot use the label handle (hwnd), because it changes between different instances of the app. The same challenge arises when you want to copy the invoice number: you need to click the input field before sending keystrokes, but where do you click?

===The solution===
The original idea behind UiPath was to intercept and analyze calls to the Windows GDI TextOut family of functions in order to detect the particular text that an application is writing in a given region of the screen. While the idea is clean and nice, the implementation is not trivial. It took us 5 years of continuous development to achieve the solid performance of today's versions. Our stress test regularly performs 1 million consecutive screen scrapings without any crash or degradation in system performance.

;What about other technologies that do not use GDI to render text?
:We support PDF, Flash, Flex, Java, Silverlight, WPF, QT, FoxPro, HTML (IE, Chrome, Firefox), MS Office, Console. For each of these technologies we had to create connectors that understand their internal document object model or use accessibility where available. This is more limited in scope because text layout information is not exposed, but it has the advantage of extracting the entire text from most UI controls, and it works even with scrolling or hidden windows.

;Does it cover all scenarios?
:As a last resort, we have leveraged OCR technology to scrape those screens that display text as images. We took Google's free Tesseract OCR engine and tweaked it to work with screen fonts. This is not a trivial task in itself, as OCR engines work best on scanned paper at higher resolutions. We were able to get 95% accuracy at the character level.
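To get a feel for what 95% character-level accuracy means in practice, the sketch below estimates how often a whole field is read without any mistake. It assumes character errors are independent, which is a simplification, and the 10-character field length is an arbitrary example.

```python
def field_accuracy(char_accuracy, length):
    """Probability that every character of a field is recognised correctly,
    assuming independent per-character errors (a simplifying assumption)."""
    return char_accuracy ** length


# A 10-character invoice number at 95% per-character accuracy.
p = field_accuracy(0.95, 10)
print(f"{p:.3f}")  # prints 0.599
```

Under these assumptions, roughly two in five such fields would contain at least one wrong character, which helps explain why OCR is kept as a last resort.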

;What about text position changing?
:Here comes [[UiNode]]. It identifies windows or controls based on a plain-text [[Selector]] that is calculated from immutable attributes of a window/control, such as title and class. You can think of it as a query used to match a running instance of a UI object or, even better, as a variable name that points to a real UI object on the screen.

What's best, UiNode provides a single interface that works identically across different UI technologies and different types of UI controls. You program the same way against a VB6 app, a web app, or a Java app. All you have to do is use ScreenScraper Studio to calculate a selector when you design your scraping process; you can then use it to initialize a UiNode that points to the real UI object on the screen.
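The selector idea can be illustrated with a small sketch: a selector is a set of stable attributes, and resolving it means finding the live UI object whose attributes match, regardless of where it sits on the screen. The dictionary-based selector format, the `UiObject` class, and `find_ui_object` are hypothetical stand-ins for illustration; they are not UiPath's actual selector syntax or API.

```python
from dataclasses import dataclass


@dataclass
class UiObject:
    """A simplified snapshot of a window or control (hypothetical model)."""
    hwnd: int    # changes between instances of the app
    title: str   # stable attribute
    cls: str     # stable attribute
    x: int       # changes when the window is moved or resized
    y: int


def find_ui_object(selector, objects):
    """Return the first object whose stable attributes match the selector."""
    for obj in objects:
        if all(getattr(obj, key) == value for key, value in selector.items()):
            return obj
    return None


selector = {"title": "Invoice number", "cls": "Label"}

# Two runs of the same app: handle and position differ, the selector still matches.
run1 = [UiObject(0x1A2B, "Customer", "Label", 10, 10),
        UiObject(0x1A2C, "Invoice number", "Label", 10, 40)]
run2 = [UiObject(0x9F00, "Invoice number", "Label", 300, 220)]

first = find_ui_object(selector, run1)
second = find_ui_object(selector, run2)
print(first.hwnd != second.hwnd)  # prints True
```

Because the lookup never touches `hwnd`, `x`, or `y`, the same selector keeps working when the window is moved, resized, or restarted.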

Does it solve automation? Well, partly. UiNode is crucial to good automation because it locates UI elements on the screen with 100% accuracy. The next part of automation is to convince the UI to behave the way we want. What comes immediately to mind is simulating mouse and keyboard input. We know where the UI object is on the screen, so it should be easy to move the mouse there and click it. In fact, that works beautifully when the app to be automated is in the foreground. When the app is not in the foreground, it is possible to send Windows messages or even to use the object model exposed by the UI framework to perform actions.

===Why more scraping methods?===
ScreenScraper offers more than one method of screen scraping. Which method works best in a particular case largely depends on the target application's technology and on your requirements. Read below the pros and cons of each method.

===Native method===
;Pros
:100% accuracy.
:Very fast.
:Preserves text layout.
:Can precisely get the position on the screen of text fragments.
:Can get text font and color.

;Cons
:On rare occasions the captured screen region flickers. We did a lot of work to ensure minimal or zero flickering, but under heavy stress you might notice slight flickering. Usually this is not noticeable to users.
:Works only with apps that use GDI to render text.

===FullText method===
;Pros
:100% accuracy.
:Very fast.
:No flickering.
:Works with most technologies used to create user interfaces, including Windows standard SDK, .Net Forms, Java, Flash, WPF.

;Cons
:Can get only the whole text of a UI control.
:Cannot get text font and color.

===OCR method===
;Pros
:Works on text that is displayed as a bitmap. The most notable types of apps where OCR is the only solution are Citrix presentation manager and Microsoft Remote Desktop.
:Can precisely get the position on the screen of text fragments.

;Cons
:95% accuracy. This can be improved with custom training for specific fonts. Contact us for more info.
:Slow compared with the other two methods, though most tasks will take less than half a second.
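The trade-offs listed above can be condensed into a simple decision rule. The function below is only one possible reading of those pros and cons; the parameter names and the order of the checks are assumptions, not ScreenScraper's own selection logic.

```python
def choose_scraping_method(renders_with_gdi, text_is_bitmap,
                           need_text_position=False):
    """Pick a scraping method from the pros and cons of each one."""
    if text_is_bitmap:
        # e.g. Citrix or Remote Desktop: only pixels are available
        return "OCR"
    if need_text_position:
        # Native and OCR can both locate text fragments; Native needs GDI
        return "Native" if renders_with_gdi else "OCR"
    # FullText is accurate, fast and flicker-free on most UI technologies
    return "FullText"


print(choose_scraping_method(renders_with_gdi=True, text_is_bitmap=False,
                             need_text_position=True))    # prints Native
print(choose_scraping_method(renders_with_gdi=False, text_is_bitmap=True))   # prints OCR
print(choose_scraping_method(renders_with_gdi=False, text_is_bitmap=False))  # prints FullText
```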

Common screen-capture techniques used in Windows desktop sharing: ways to capture the screen

Capturing the screen (or put another way, creating an in-memory copy of the image currently being displayed on the screen, for re-display, printing, or later saving) is a task with many different solutions. I'll try to enumerate the possibilities below. This isn't a perfect or complete list, but it covers the best supported ways, and lists some of their drawbacks.

1 - Use the Desktop Duplication API. This is a very powerful and full-featured API, which provides access to every frame of desktop update, BUT, it is not available prior to Windows 8.

One drawback is that most full-screen exclusive mode DirectX or OpenGL applications will not be able to be captured with Desktop Duplication. Exclusive mode really means that the ‘Windowed’ field of D3DPRESENT_PARAMETERS is set to false. Some apps are not in true full-screen mode and are windowed, but with the window set to the size of the desktop. These can still be captured.

To make matters a little more confusing, newer DirectX (11.1 and later) exclusive mode apps can be captured with the Desktop Duplication API, unless they "opt-out" and specifically disallow it (in which case they won't be capture-able by Desktop Duplication, just like a pre-11.1 DirectX app).
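The rules in the three paragraphs above can be summarised as a small decision function. This is a sketch of the stated rules only: the function and parameter names are hypothetical, it does not call any Windows API, and it treats "most" exclusive-mode apps as "all" for simplicity.

```python
def can_capture_with_desktop_duplication(windows_version,
                                         exclusive_fullscreen=False,
                                         directx_version=None,
                                         opted_out=False):
    """Apply the rules described above.

    windows_version: (major, minor) tuple; Windows 8 is (6, 2).
    directx_version: (major, minor) used by an exclusive-mode app, or None.
    """
    if windows_version < (6, 2):
        return False          # API is not available before Windows 8
    if not exclusive_fullscreen:
        return True           # windowed apps, even desktop-sized ones
    if directx_version is not None and directx_version >= (11, 1):
        return not opted_out  # newer apps are capturable unless they opt out
    return False              # older exclusive-mode DirectX/OpenGL apps
```

For instance, a DirectX 9 game in exclusive mode on Windows 10 yields `False`, while the same game running in a borderless desktop-sized window yields `True`.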

2 - The old standby, GDI, can still capture the screen. Typically, BitBlt is used. This technique is not perfect, and there are some things, such as multimedia output, that won't get captured.

BitBlt can also be slow, although you can get much better performance by making sure that the bitmap (that you are capturing into) has the same resolution as the screen. To ensure this, you could use CreateCompatibleBitmap (for a device-dependent bitmap), or CreateDIBSection (for a device-independent bitmap). If you use CreateDIBSection, just make sure the bit depth and layout match the screen's. This allows BitBlt to avoid any color conversion, which can drastically slow everything down. Here's what such code might look like from a high level...

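// Pseudo-code...

// Device context for the entire screen
HDC hDCScreen = GetDC(NULL);

// Memory device context compatible with the screen
HDC hDCMem = CreateCompatibleDC(hDCScreen);

// Bitmap in the screen's own format, so BitBlt needs no color conversion
HBITMAP hBitmap = CreateCompatibleBitmap(hDCScreen, screenwidth, screenheight);

SelectObject(hDCMem, hBitmap);

// Copy the screen contents into the bitmap (the source DC is the screen)
BitBlt(hDCMem, 0, 0, screenwidth, screenheight, hDCScreen, 0, 0, SRCCOPY);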

3 - Another option is to use Direct3D. Note that some have reported that this can actually be slower than BitBlt, but performance of any technique will depend a lot on the hardware and graphics settings (regardless of the method used). To do this with Direct3D, you could use GetRenderTargetData to copy the rendering surface (retrieved with GetRenderTarget) to an off-screen surface (which can be created with CreateOffscreenPlainSurface). After copying the surface data this way, you can save the contents of the surface to a file using D3DXSaveSurfaceToFile.

4 - Yet another possibility is to use Windows Media technologies to do the capture (specifically, Expression Encoder will do this if you just need another application). To do it programmatically yourself (using Windows Media), you can use the Windows Media Video 9 Screen codec.

5 - Finally, for pre-Windows 8 systems, you could develop a Mirror Driver to do this (Windows 8 uses another driver model, which is also described under the preceding link).

Some of the above techniques (particularly #2 and #3) can also be used to capture only a single window. Note that if you use BitBlt to do this, you could "or in" the CAPTUREBLT flag if you wanted to capture windows that are layered on top of your window. The PrintWindow API is another good option for capturing only a single window (not listed above since it can't capture the entire screen).