igorquintaes opened 4 years ago
Hello,
I have previously been able to parse the HTML off of Tibia's website without Selenium with the help of frameworks like HTMLAgilityPack. Not long ago Cipsoft made changes to the way you access their website information and added CloudFlare protection, which makes it harder to parse the HTML on their website. This means that we need to run JavaScript and solve the CloudFlare "challenge" before we can get access to the HTML from the request we've made. I did not find any viable way other than using Selenium to fix this issue.
If you have an idea for fixing this issue without using Selenium, that'd be great. Please let me know!
Got it, I did not know that Tibia had implemented CloudFlare protection some time ago. Well, what about consuming that data from the supported fansite https://tibiadata.com? It looks easier and faster to obtain and manipulate the information by consuming a RESTful API, and considering that TibiaData has a trusted link with Cipsoft as a Supported Fansite, it may be worth it.
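To illustrate the difference, here is a minimal sketch of consuming such a RESTful API from C#. The endpoint path and response shape below are assumptions for illustration only and should be checked against the TibiaData documentation:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class TibiaDataSketch
{
    private static readonly HttpClient Client = new HttpClient();

    static async Task Main()
    {
        // Hypothetical endpoint: verify the real path and version in the
        // TibiaData API docs before relying on it.
        string json = await Client.GetStringAsync(
            "https://api.tibiadata.com/v3/character/SomeCharacter");

        // No HTML parsing needed: the payload is already structured JSON.
        using JsonDocument doc = JsonDocument.Parse(json);
        Console.WriteLine(doc.RootElement.ToString());
    }
}
```

The upside is that a JSON payload is stable and structured, so there is no fragile HTML parsing; the downside, as noted below, is depending on a third party.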
I do not want to use any third-party APIs, since I do not have control over them and they can go down at any time, leaving us without core functionality. I would love to get away from Selenium, but in a way where we still have control over how we fetch the data from the Tibia website.
I read some of TibiaThQueuer's code and noticed that the system uses Selenium WebDriver to obtain data from tibia.com. I really like Selenium for writing acceptance tests or when you need to simulate user interactions, but it may not be a good approach when you only need to scrape some data from a website. It can be bad for a few reasons:
A suggestion to avoid all of the points mentioned above:
Since you mainly need to access an external webpage to obtain data that is included in its HTML, you only need to make an HTTP request to the desired URL, fetch the HTML content, and parse it to extract what you need. Selenium does this for you, but you are launching an entire browser just to make that request.
.NET Core provides all the tools and libraries to obtain that data without launching a browser, using its encoding, parsing, and HttpClient APIs. There are also good NuGet packages that can simplify how you consume that data - for example, HtmlAgilityPack. With that package, you can download the HTML from the target URL and extract the desired data using XPath expressions, which are also supported by Selenium.
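A minimal sketch of that approach with HttpClient and HtmlAgilityPack. The URL and XPath below are placeholders, not tibia.com's real markup (which, as discussed above, now also sits behind a CloudFlare challenge):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet: HtmlAgilityPack

class ScraperSketch
{
    private static readonly HttpClient Client = new HttpClient();

    static async Task Main()
    {
        // One plain HTTP request instead of a full browser session.
        string html = await Client.GetStringAsync(
            "https://www.tibia.com/community/"); // placeholder URL

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query; the class name here is illustrative, not the
        // site's actual markup.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='CaptionContainer']");
        if (nodes != null)
            foreach (var node in nodes)
                Console.WriteLine(node.InnerText.Trim());
    }
}
```

The same XPath expressions used with Selenium's `FindElements(By.XPath(...))` can generally be reused with `SelectNodes`, which makes migrating existing selectors straightforward.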
If you need some help with that, or even a code example, I can provide it and help you.