skrapeit / skrape.it

A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server & client-side rendered). It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL. It aims to be a testing lib, but can also be used to scrape websites in a convenient fashion.
https://docs.skrape.it
MIT License

Add failing test to show issue with whitespace stripping #174

Closed johanoskarsson closed 2 years ago

johanoskarsson commented 2 years ago

This PR is not intended to be merged, but to help illustrate the problem I'm having.

I've been struggling to parse a div with an attribute whose value contains whitespace. I've included a test below to show what I mean. It looks like whitespace is intentionally stripped in the code, but I'm not sure whether it was meant to be stripped inside attribute values as well. Or possibly I'm doing this wrong. Either way, I'd love a hand.
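To make the suspected behavior concrete, here is a minimal standalone sketch. It does not use skrape.it's actual code or API: the helper names `attributeSelector`, `stripAllWhitespace` and `trimOuterWhitespace`, and the attribute `data-title`, are hypothetical. It only illustrates how removing all whitespace from a selector string would also alter a quoted attribute value, whereas trimming only the outer whitespace would not.

```kotlin
// Hypothetical helpers for illustration only; this is not skrape.it's implementation.

// Builds a CSS attribute selector such as div[data-title='hello world'].
fun attributeSelector(tag: String, key: String, value: String): String =
    "${tag}[${key}='${value}']"

// Naive cleanup: removes every whitespace character,
// including the ones inside the quoted attribute value.
fun stripAllWhitespace(selector: String): String =
    selector.replace(Regex("\\s"), "")

// Gentler cleanup: removes only leading and trailing whitespace,
// so the attribute value stays intact.
fun trimOuterWhitespace(selector: String): String =
    selector.trim()

fun main() {
    val selector = attributeSelector("div", "data-title", "hello world")

    println(stripAllWhitespace(selector))  // div[data-title='helloworld']
    println(trimOuterWhitespace(selector)) // div[data-title='hello world']
}
```

With the naive variant, the resulting selector would no longer match an element whose attribute value is `hello world`, which is consistent with the symptom described above.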

Thanks!

christian-draeger commented 2 years ago

Oh, good find. I guess it is really uncommon to have spaces in attributes (the class attribute is an exception, where the space is meaningful). Here is a good article about meaningful and meaningless spaces in HTML attribute values: https://www.impressivewebs.com/leading-trailing-spaces-html-attribute-values/

Nevertheless, it is valid HTML, so we need to support it.

I will have a look as soon as I can.

christian-draeger commented 2 years ago

@johanoskarsson I just released version 1.1.7, including the patch to allow whitespace in attribute values of selectors :) Have fun, hope it helps. If you have further general questions, just let me know by opening a "question" issue. If you like the library and want to help make it known to others, I would be glad if you star the project 🌟

johanoskarsson commented 2 years ago

Perfect! Thank you so much for fixing it

On Sat, Dec 11, 2021 at 12:49 PM Christian Dräger @.***> wrote:

@johanoskarsson I just released version 1.1.7, including the patch to allow whitespace in attribute values of selectors :) Have fun, hope it helps. If you have further general questions, just let me know by opening a "question" issue. If you like the library and want to help make it known to others, I would be glad if you star the project 🌟
