skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License
805 stars, 59 forks

[QUESTION] Charset not applied and fetch question marks instead #186

Open edenhalfon opened 2 years ago

edenhalfon commented 2 years ago

I want to fetch an HTML page and parse the links and their titles. The titles are in Hebrew (an RTL language). Instead of the real title I get "????". What am I missing here? (I tried changing the charset, but UTF-8 is usually good enough.)

Code sample:

    response {
        htmlDocument {
            // parsed Doc is available here
            a {
                withAttributeKey = "data-item"
                findAll {
                    println(it.attribute("href"))
                }
            }
        }
    }

christian-draeger commented 2 years ago

Hey, thanks for finding this. Could you provide an HTML snippet, or even the URL you want to parse?

edenhalfon commented 2 years ago

Hey, sure. The link is: https://www.htzone.co.il/benefit/562/דלי-ריי/?sale_id=62 (the path segment in the middle is in Hebrew).
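
For context on the "????" symptom: on the JVM, literal question marks typically appear when text is encoded into a charset that cannot represent the characters, since each unmappable character is replaced with '?'. The sketch below uses only the standard library and does not touch skrape.it; the helper name roundTrip and the sample string are hypothetical, and the thread does not establish that this is the cause here. The same effect can also come from the console encoding used by println rather than from the fetched document.

```kotlin
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets

// Hypothetical helper: encode text to bytes in the given charset, then decode it back.
fun roundTrip(text: String, charset: Charset): String =
    String(text.toByteArray(charset), charset)

fun main() {
    // Sample Hebrew word written with Unicode escapes so the source file's own
    // encoding cannot affect the result. It is not taken from the page in question.
    val title = "\u05E9\u05DC\u05D5\u05DD"

    // ISO-8859-1 has no Hebrew letters, so every character becomes '?'.
    println(roundTrip(title, StandardCharsets.ISO_8859_1)) // prints "????"

    // UTF-8 can represent Hebrew, so the round trip is lossless.
    println(roundTrip(title, StandardCharsets.UTF_8) == title) // prints "true"
}
```

If the second check holds but the printed title is still "????", the loss is more likely happening at output time (for example a console or JVM default encoding that is not UTF-8) than during parsing.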