Get info from Tibia.com without Selenium

igorquintaes commented 4 years ago

I read some of TibiaThQueuer's code and noticed that it uses Selenium WebDriver to obtain data from tibia.com. I really like Selenium for writing acceptance tests or otherwise simulating user interactions, but it may not be a good approach when you only need to scrape some data from a website. It has several drawbacks:

Time spent launching the driver and browser (in this case, ChromeDriver and Chrome);
Time spent keeping chromedriver.exe and Chrome up to date when Google releases new versions;
Chrome must be installed on the target or dev machine;
High memory and CPU usage;
The current approach to launching chromedriver is not compatible with Linux (I don't know whether you want TibiaThQueuer to run on Linux/Mac).

A suggestion that avoids all of the points above:

Since you mainly need to access an external web page to obtain data that is included in its HTML, you just need to make an HTTP request to the desired URL, get the HTML content, and parse it to extract what you need. Selenium does this for you, but you are launching an entire browser just to make that request.

.NET Core provides all the tools and libraries needed to obtain that data without launching a browser: its encoding, parsing, and HttpClient libraries. There are also good NuGet packages that simplify consuming that data, for example HtmlAgilityPack. With that package, you can load the HTML from the target URL and extract the desired data using XPath expressions, which Selenium also supports.
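To make the idea concrete, here is a minimal sketch of the fetch-then-parse approach. It is written in Python with only the standard library to keep it short; in .NET the same shape would be HttpClient plus an HTML parsing package. The names `fetch_html`, `CellCollector`, and `extract_cells` are hypothetical, and the table-based markup it expects is an assumption for illustration, not tibia.com's real page structure.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with a plain HTTP GET; no browser is involved."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._in_cell = False
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_cells(html: str) -> list:
    """Return the stripped cell text of each non-empty table row."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows if row]
```

Calling `extract_cells(fetch_html(url))` would then give the rows of cell text to pick values from. Keeping the download and the parsing in separate functions means the parsing can be tested against saved HTML without touching the network.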

If you need help with that, or some example code, I can provide it.

kemaldev commented 4 years ago

Hello,

I have previously been able to parse the HTML from Tibia's website without Selenium, with the help of libraries like HtmlAgilityPack. Not long ago CipSoft changed the way you access their website and added Cloudflare protection, which makes it harder to parse the HTML on their site. This means that we need to run JavaScript and solve the Cloudflare "challenge" before we can get access to the HTML from the request we've made. I did not find any viable way other than Selenium to get around this.
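For context, this is roughly how a plain HTTP client can tell that it received the challenge page instead of the real HTML. It is a heuristic sketch in Python, not an official detection method: the function name is hypothetical, the status codes and the `Server` check are assumptions about typical challenge responses, and `cf-mitigated: challenge` is, as far as I know, the header Cloudflare documents for challenged requests. It only detects the challenge; it does not solve it.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristically decide whether a response is a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}

    # Explicit signal on challenged requests.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Assumed fallback: a blocking status code served by Cloudflare.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    return status in (403, 503) and served_by_cloudflare
```

A check like this at least lets the application fail with a clear message rather than trying to parse the challenge page as if it were the requested content.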

If you have an idea to fix this issue without using Selenium that'd be great, please let me know!

igorquintaes commented 4 years ago

Got it, I did not know that Tibia had added Cloudflare protection some time ago. Well, what about consuming that data from the supported fansite https://tibiadata.com? It looks easier and faster to obtain and manipulate that information through a RESTful API, and since it has a trusted relationship with CipSoft as a Supported Fansite, it may be worth it.
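As a rough sketch of what consuming such an API could look like (Python, standard library only): the endpoint URL is passed in by the caller, the helper names `fetch_json` and `dig` are hypothetical, and the `character`/`name` keys in the note below are a made-up payload shape, not TibiaData's actual schema. Its documentation has the real routes and fields.

```python
import json
from typing import Any
from urllib.request import Request, urlopen


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def dig(payload: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dictionaries, returning default if any key is missing."""
    current = payload
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

With a payload shaped like `{"character": {"name": "..."}}`, `dig(payload, "character", "name")` returns the name, and a missing key yields the default instead of raising, which helps if the API changes its response format.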

kemaldev commented 4 years ago

I do not want to use any third-party APIs since I have no control over them: they can go down at any time, leaving us without core functionality. I would love to get away from Selenium, but in a way that keeps us in control of how we fetch the data from the Tibia website.

kemaldev / TibiaThQueuer

Get info from Tibia.com without Selenium #8