Running RPA-Python in Docker - some thoughts and caveats I have

ck81 commented 4 years ago

Hi @kensoh,

Was wondering if you have tried, or know of any users, running RPA-Python in Docker?

I have run RPA classes twice for our Master's students at the National University of Singapore, using TagUI as the RPA tool to illustrate the key concepts of RPA. The students used Windows 10, Mac and Linux, and in both classes some of them had problems setting up TagUI, e.g. missing DLL, missing lib, missing Java library, problem with a username containing a space, etc. Some students even had problems setting up Python!

That's why, for my 3rd run of the RPA class, I want to set up a self-contained environment using VirtualBox or Docker, with Python and RPA-Python all set up and ready to run. I have no problem setting up VirtualBox, but I'm very new to Docker. I've heard many good things about Docker and want to give it a try.

I saw that there are 5 TagUI Docker images on Docker Hub: https://hub.docker.com/search?q=tagui&type=image

Are you aware of them? Have you tried any of them?

Also, I'm new to Docker, but it seems that TagUI in Docker is mostly run in headless mode. Is that right?

Are you aware of any way to set up a Docker container with a standard GUI, i.e. running a Chrome browser inside the container with the ability to use Chrome Developer Tools interactively, just as we would in VirtualBox?

kensoh commented 4 years ago

Hi @ck81, I'm afraid I have not tried that and have no in-depth experience with Docker images.

It looks like this image has the highest download count, so you may want to start with it - https://hub.docker.com/r/hmascend/tagui

Some thoughts -

- Docker images are usually for Linux, as macOS and Windows need a valid paid license.
- Visual automation with SikuliX needs special setup on Linux to work - see this guide
- A Docker image can most likely serve web-app use cases only, because thick-client desktop apps would most likely need a paid license and won't scale this way through an image.
- In theory, if the image is a Linux distribution that already includes a GUI system, it should be possible to VNC into it and see a desktop environment in which to run Chrome and do the usual desktop stuff.
- On Linux, I think that if any browser is installed, it would be Chromium. Chrome will need to be installed separately.

Overall, I think it's a great idea to conduct your class with a standardised image, so that the environment is already there and you don't have to deal with environment setup problems (which aren't the focus of your class).

Yes, I conducted a class once last year, and I was surprised that installing Python was an uphill, almost impossible task for some attendees! Their company IT policies had firewall or app restrictions that made it really hard to install something we assume is straightforward, like Python.

ck81 commented 4 years ago

Hi @kensoh,

Thanks for the many pointers!

Yes, I intend to use Docker to run only your RPA-Python within Linux, and for web-apps only.

Will give it a try and share with you more later if I can get it to work.

dpnthanh commented 3 years ago

Hi all, I made a Docker image for this RPA project. You can try it on Docker Hub: https://hub.docker.com/r/nhth199x/rpa-python I wrote an example on the Overview page.

Inaldomarinho commented 3 years ago

Hello @dpnthanh, I need to run rpa-python on a Python version higher than 3 and wanted to know what steps you used to create the image. If you could give me a direction or help, I would be very grateful.

Thanks for listening.

Nam-T commented 3 years ago

Hi @Inaldomarinho, one of my projects needs to use RPA-Python on AWS SAM, so my leader and I built a Dockerfile and I pushed its image to Docker Hub. It uses Python 3.8 and Chrome. https://hub.docker.com/repository/docker/namthp99/python3.8-rpa-aws-sam Hope it will help you!

jamesmnixon commented 3 years ago

@kensoh or @dpnthanh

I am trying to run RPA web automation with Airflow from within a Docker container.

@dpnthanh I saw and used your image, and you were able to make RPA work within a Docker container. But I didn't see a Dockerfile that shows what's actually inside the container. For security purposes, I would like to re-create it. My end goal is to add the necessary dependencies to an existing image that holds my Airflow and other services.

@kensoh Do you have any insight into this, and could you potentially show what I would need? After installing Chrome in my container, then installing and importing RPA, it waits endlessly in r.init(). When I interrupt it with a keyboard interrupt, this is the trace (I added the "headless" and "read" print statements for testing):

jamesmnixon commented 3 years ago

@Nam-T

Is there any way to share your Dockerfile? It is not included in your image repo.

kensoh commented 3 years ago

Hi guys, nice discussion here on running with Docker and on Linux! Recently, I created a working example using Google Colab. You can check out or make a copy of the notebook below to see some of the things done to make it work there.

Google Colab - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

Namely, if you use Chromium instead of Chrome, you need to change a setting in the TagUI engine.

To run in headless mode (without a display or monitor), you can now use the v1.34 headless option.

Chromium/Chrome doesn't allow running as root for security reasons, so a change in the run flag is needed.

Other than the above, Ubuntu requires installing PHP because it does not come with PHP. Using computer vision and OCR requires installing OpenCV and Tesseract - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html

Also, this RPA for Python package is based on a forked version of the TagUI open-source RPA tool. Feel free to join the Telegram community group chat to post any questions - https://t.me/rpa_chat
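
The launcher edits described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: it assumes the launcher still contains the "google-chrome" string and the `$window_size $headless_switch` launch line quoted elsewhere in this thread, and that `--no-sandbox` is the flag you want when running as root. `patch_launcher` and `patch_installed_launcher` are made-up helper names, not part of RPA-Python; only `r.load()` and `r.dump()` come from the package.

```python
def patch_launcher(text, browser_command="chromium-browser", extra_switch="--no-sandbox"):
    """Return launcher text that starts `browser_command` with `extra_switch` appended."""
    # Point TagUI at Chromium instead of Google Chrome.
    patched = text.replace('"google-chrome"', '"' + browser_command + '"')
    # Append the extra flag on the Chrome launch line, keeping $headless_switch intact.
    # Skipped if the flag is already present, so the patch can be applied twice safely.
    if extra_switch and extra_switch not in patched:
        patched = patched.replace(
            "$window_size $headless_switch",
            "$window_size $headless_switch " + extra_switch,
        )
    return patched


def patch_installed_launcher(path="/root/.tagui/src/tagui"):
    """Apply patch_launcher to the installed file via RPA-Python's load/dump helpers."""
    import rpa as r  # imported lazily so patch_launcher can be tried without the package

    r.dump(patch_launcher(r.load(path)), path)
```

After patching, something like `r.init(headless_mode=True)` would be the next step. On Debian the browser command is `chromium` rather than `chromium-browser`, so pass that as `browser_command`.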

jamesmnixon commented 3 years ago

@kensoh

Thank you for your amazingly quick reply. I used your notebook and followed your instructions, except that I had to use 'apt install chromium' instead of 'apt install chromium-browser' as I'm using Debian.

The setup and the string replacement in the tagui file worked, but r.init() hung during initialization and needed a keyboard interrupt to show where it was stuck:

Do you know what I'm missing, or what else I can try? Here is a screenshot of both my terminal, showing success in the installs, and my notebook:

kensoh commented 3 years ago

Hi @jamesmnixon a few ideas to try -

- Try adding r.debug(True) before r.init() to see if there is any clue in the logs.
- Edit /root/.tagui/src/tagui and search for the line below:
    $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" $chrome_switches $window_size $headless_switch > /dev/null 2>&1 &
  Add the line below just before it. This will print the exact command used to run Chrome. Then try running Chrome manually with that command from the terminal, to see whether a problem there hangs the execution:
    echo $chrome_command --user-data-dir="$TAGUI_DIR/chrome/tagui_user_profile" $chrome_switches $window_size $headless_switch
- This looks like an issue that would also happen in the upstream TagUI project. There's a weekly Zoom 1-to-1 call every Thursday from 4-5pm SGT (UTC+8); see if you can join to look at it together - https://github.com/kelaberetiv/TagUI/issues/914
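
If it is easier to test from Python than from the terminal, a rough sketch like the one below can run a command with a timeout, so a hang shows up as a result instead of blocking forever. `run_with_timeout` and `demo` are hypothetical helper names. The command in `demo` is only a harmless stand-in, to be replaced with whatever the echo line above prints on your system.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds=30):
    """Run `command` (a list of arguments) and return (status, output).

    status is "hung" if the command did not finish within the timeout,
    otherwise "exit N" where N is the exit code.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error output into the same stream
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as error:
        partial = error.output or ""
        if isinstance(partial, bytes):  # partial output can arrive as bytes
            partial = partial.decode(errors="replace")
        return "hung", partial
    return "exit %d" % result.returncode, result.stdout


def demo():
    # Stand-in command; swap in the Chrome command printed by the launcher.
    return run_with_timeout([sys.executable, "-c", "print('ok')"], 20)
```

A "hung" status together with empty output would suggest the browser itself never starts, which narrows the problem down to the Chrome or Chromium install rather than the Python package.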

kensoh commented 3 years ago

Adding on, some time back a user had an unknown issue starting Chrome because his company network policy blocked Chrome from serving a local web socket connection. The TagUI engine requires that web socket connection as a backdoor to control Chrome. For him, the issue was fixed by tweaking the network policy or adding an exception.
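
A quick way to check whether a local connection of this kind is reachable is a plain TCP probe, sketched below. The port number 9222 is an assumption (Chrome's usual remote debugging port) and is not stated in this thread, so confirm it against the switches in your tagui/src/tagui file. `port_is_open` is a made-up helper name.

```python
import socket


def port_is_open(host="127.0.0.1", port=9222, timeout_seconds=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, blocked) on failure
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

Running this while Chrome is supposedly started would show whether anything is listening. False means Chrome did not start, or something on the machine or network is blocking the local connection.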

kensoh commented 3 years ago

Want to update back here that James joined the call and problem resolved -

issue with Chromium browser running on the system (some install Chromium snap error) --> switch to Google Chrome
issue with python command not found --> change ~/.tagui/src/casperjs/bin/casperjs to point to python3 on his system
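
To see up front whether a container would hit the same "python command not found" problem, a small check like this sketch can report which interpreter command is actually on the PATH. `first_available` is a hypothetical helper name and the candidate names are just the two mentioned above.

```python
import shutil


def first_available(commands=("python", "python3"), search_path=None):
    """Return (command, location) for the first command found, else None.

    search_path=None means the normal PATH environment variable is used.
    """
    for command in commands:
        location = shutil.which(command, path=search_path)
        if location:
            return command, location
    return None
```

If this returns `python3` but not `python`, that matches the situation described above, where the casperjs launcher had to be pointed at python3.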

jamesmnixon commented 3 years ago

@kensoh Thank you for your help on the call. Very insightful. One area that also pertains to running inside a container is dealing with ReCaptcha. I couldn't figure out why my RPA wasn't working in the container, even though it worked on other sites, until I added a snap after each line. This is what I found: it got stuck on the ReCaptcha page. Do you have any recommended way to get past this?

kensoh commented 3 years ago

Hi @jamesmnixon, I see, it looks like the website has anti-automation checks when running in the container. I've heard good reviews of 2captcha, a very affordable service provider that can automate solving captchas through an API. You can see if the link below is useful - https://2captcha.com/recaptchav2_eng_instruction

Alternatively, I heard that some folks set up Xvfb to create a virtual display to run Chrome or visual automation on their Linux instances. You can also try setting up Xvfb and running Chrome in the normal visible mode through Xvfb, to see whether such a setup still prompts for this captcha check. I haven't tried Xvfb myself, but the link below gives an idea of what it involves - https://gist.github.com/addyosmani/5336747
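
As a rough idea of what the virtual display (Xvfb) setup could look like when driven from Python, here is a minimal sketch. It assumes the Xvfb binary is already installed in the container. The display number `:99` and the screen size are arbitrary choices, and both function names are made up for illustration.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the Xvfb command line for one virtual screen (screen number 0)."""
    return ["Xvfb", display, "-screen", "0", "%dx%dx%d" % (width, height, depth)]


def start_virtual_display(display=":99"):
    """Start Xvfb in the background and point this process at it via DISPLAY."""
    process = subprocess.Popen(xvfb_command(display))
    # Programs started from this process afterwards will draw on the virtual display.
    # Xvfb needs a moment to come up, so a short wait before starting Chrome may help.
    os.environ["DISPLAY"] = display
    return process  # call process.terminate() when the automation is done
```

With the display running, the idea would be to call `r.init()` without headless mode so that Chrome starts in its normal visible mode inside the virtual display.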

richylyq commented 3 years ago

Hi all, I have been trying to get RPA-Python to run in the Docker container that I am building, but I am facing issues when the RPA is running. It is currently built with PyWebIO as the UI, running RPA-Python with Google Chrome and headless_mode=True. I added r.debug(True) to see what went wrong, and saw this error:

[RPA][ERROR] - following happens when starting TagUI...
Terminated
/root/.tagui/src/tagui: line 51: pwd: write error: Broken pipe

Anyway, I am pretty lost on where to find /root/.tagui/src/tagui in Docker images, so I would also appreciate it if anyone can show me the light 😅

kensoh commented 3 years ago

I haven't heard of this from users, it seems to be some Docker issue affecting different apps - broken pipe error on Docker

The line that triggered error is below in /root/.tagui/src/tagui file.

if [ "$tagui_baseline_mode" == false ]; then set -- "$(cd "$(dirname "$1")"; pwd)/$(basename "$1")" "${@:2}"

You can try changing it to the line below to see if that works. But if there is some Docker-related root cause that makes running the pwd command trigger errors, there might be many more similar errors before TagUI can run successfully.

if [ "$tagui_baseline_mode" == false ]; then set -- "/root/.tagui/src/tagui" "${@:2}"

richylyq commented 3 years ago

Thanks for the prompt update! I tried the change you mentioned above and hit a fresh new error, which I will try to solve if possible.

ERROR - for nested conditions, loops, popup, frame, set { and } explicitly
ERROR - add { before this line and add } accordingly - if [ -f "$online_flowname" ]; then if grep -iq "404\|400" "$online_flowname"; then rm "$online_flowname"
ERROR - automation aborted due to above

kensoh commented 3 years ago

Hi @richylyq, I'm sorry, I made a mistake: the replacement should be the full path and filename of the script that you are running. Can you add import os; print(os.getcwd()) to your Python program, so that you know where the generated rpa_python file is? After finding the path, you can form the full path and filename to use in the replacement -

if [ "$tagui_baseline_mode" == false ]; then set -- "/full_path/rpa_python" "${@:2}"

Hopefully that will give more clues about what is going on. I suspect the way the tool is being used isn't doable out of the box, because when TagUI runs normal shell commands using $(), like pwd, it gets a broken pipe error. It might be an OS or environment-related issue. But you can first try changing how the package and TagUI work to see if that helps.
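
Forming that full path can be sketched in Python as below. This only builds the text of the replacement line for you to paste into the launcher, and assumes the generated file is named rpa_python and sits in the current working directory, as described above. `script_path` and `replacement_line` are made-up helper names.

```python
import os


def script_path(working_dir=None, script_name="rpa_python"):
    """Full path of the generated script; defaults to the current working directory."""
    return os.path.join(working_dir or os.getcwd(), script_name)


def replacement_line(working_dir=None):
    """Text of the launcher line with the script path hard-coded instead of using pwd."""
    return (
        'if [ "$tagui_baseline_mode" == false ]; then set -- "%s" "${@:2}"'
        % script_path(working_dir)
    )
```

Calling `print(replacement_line())` from the same program that calls `r.init()` should give the line for that container, since it uses the same working directory.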

richylyq commented 3 years ago

Hi @kensoh, I made the switch, and the new broken pipe error now comes from the step that changes the current directory to the TagUI directory:

TAGUI_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"; cd "$TAGUI_DIR"

When pwd is used in this context, does it mean it is trying to get the path where the tagui file is held, and the broken pipe error occurs because permission to access the root folder somehow isn't given?

Or, if it's an environment-related issue, could this be affected by the base image I used for my Docker container? 🤔

What did the other TagUI users who successfully integrated Docker with TagUI use to build?

kensoh commented 3 years ago

A file permission issue could be possible. You can run chmod -R 777 on the current working directory where rpa_python is generated, and on the ~/.tagui folder, to see if that helps. If you run as root, make sure the package is installed as root. If you run as a normal user, try installing the package as a normal user.

The $() command is normal bash syntax, so it might be a permission or environment issue that causes errors whenever this syntax is used. https://askubuntu.com/questions/833833/what-does-command-do

There are a lot more $() uses inside the TagUI launcher script in tagui/src/tagui. If the root cause is not found, I can imagine you having to do a lot of hacking just to prevent that error.

This is an example of the package running on Ubuntu on Google Colab - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

This is a working Docker example with both TagUI and RPA for Python. It's created by @skadefro as an image to provision TagUI instances on his open-source OpenFlow app - https://hub.docker.com/r/openiap/nodered-tagui

skadefro commented 3 years ago

You should probably link to the Dockerfile too https://github.com/open-rpa/openflow/blob/master/OpenFlowNodeRED/Dockerfiletagui

richylyq commented 3 years ago

hey @kensoh @skadefro thanks for the inputs, i will take a look at the Dockerfile, and pray that it works for my build as well!

kensoh commented 3 years ago

Thanks @skadefro this is very helpful. I forgot where to find this Docker file for your image, now I know.

lanSeFangZhou commented 2 years ago

can python-rpa run on linux? Do you have a full demo?

kensoh commented 2 years ago

Yes, see this working Google Colab example running on Ubuntu Linux - https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing

nicotiendamia commented 2 years ago

Hello, I tried rpa locally on my computer and it worked correctly. Now I want to run it inside a Docker container, through Airflow's PythonOperator, and I'm having serious trouble setting it up. I've been trying things for many days now!!

The latest thing I've tried is this https://colab.research.google.com/drive/13bQO6G_hzE1teX35a3NZ4T5K-ICFFdB5?usp=sharing#scrollTo=kl58MzRLyNgb with the only difference being that I install chromium instead of chromium-browser and, when doing the dump, I replace "google-chrome" with "chromium".

This is my dump code:

current_dir = os.path.realpath(os.path.dirname(__file__))  # folder of this Python file

# Patch the TagUI launcher: use Chromium and pass --no-sandbox
self.robot.dump(
    self.robot.load(f'{current_dir}/.tagui/src/tagui')
    .replace('"google-chrome"', '"chromium"')
    .replace('$headless_switch', '--no-sandbox'),
    f'{current_dir}/.tagui/src/tagui',
)

I have error and debug set to True. This is the output i get:

[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /opt/airflow
[RPA][INFO] - done. syncing TagUI with stable cutting edge version
[RPA][INFO] - TagUI now ready for use in your Python environment
[RPA][INFO] - visual automation (optional) requires special setup on Linux,
[RPA][INFO] - see the link below to install OpenCV and Tesseract libraries
[RPA][INFO] - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html
finished dump
[RPA][ERROR] - following happens when starting TagUI...

The following command is executed to start TagUI - "/opt/airflow/.tagui/src/tagui" rpa_python chrome

It leads to following output when starting TagUI -
/opt/airflow/.tagui/src/tagui: line 304: type: google-chrome: not found
ERROR - cannot find Chrome command "google-chrome"
update chrome_command setting in tagui/src/tagui
and make sure symlink to command is created

Exception initializing RPA: [RPA][ERROR] - [RPA][ERROR] - unknown error encountered

I'm doing this because I want to replicate what I will need to do in production, where I have Airflow running on an AWS EC2 instance with Ubuntu.

kensoh commented 2 years ago

From the above log, it looks like for some reason the file you edited did not get updated to use Chromium.

You can edit the file /opt/airflow/.tagui/src/tagui manually: check where it has "google-chrome" and replace it with "chromium", or whatever command is needed to start your Chromium browser. Hopefully that helps! Let me know if it doesn't.
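
To confirm whether the edit actually landed in the file that TagUI runs, a small check like this sketch lists every line that still mentions the old command. One thing that may be worth checking, though it is only a guess from the log: the launcher being run is /opt/airflow/.tagui/src/tagui, while the dump code builds its path from the folder of the Python file. If those two folders differ, the edit went to a different location. `lines_containing` and `report` are made-up helper names.

```python
def lines_containing(text, needle):
    """Return (line_number, stripped_line) for each line of text that contains needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


def report(path, needle='"google-chrome"'):
    """Read the launcher at `path` and list the lines that still mention `needle`."""
    with open(path, encoding="utf-8") as handle:
        return lines_containing(handle.read(), needle)
```

Calling `report("/opt/airflow/.tagui/src/tagui")` after the dump should return an empty list once the replacement has taken effect.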

tebelorg / RPA-Python

Running RPA-Python in Docker - some thoughts and caveats I have #140