Closed NusretOzates closed 3 years ago
It's currently not supported, but it should be possible to add as a feature. I will have a look at how it could be integrated into the DSL in a user-friendly way. If you have any suggestions or wishes, we can discuss them if you want :)
After thinking about it, I am inclined to decline the feature request, because this is already possible with a plain filter:
aListOfDocElement.filter { it.toCssSelector.matches("some regex".toRegex()) }
What I can imagine is providing a helper function that does the filtering.
EDIT: This can be useful to catch multiple elements at the same time, or to match by partial class, id, ... attribute values. I have added it to the DSL.
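A minimal sketch of what such a helper could look like, written against a stand-in type so it runs on its own. `Element` and `filterBySelector` are hypothetical names for illustration, not the library's API, and the sketch assumes the regex is matched against the full selector path:

```kotlin
// Hypothetical stand-in for a parsed element; not the library's DocElement.
data class Element(val toCssSelector: String)

// Hypothetical helper: keep only elements whose whole CSS selector matches the regex.
fun List<Element>.filterBySelector(regex: Regex): List<Element> =
    filter { it.toCssSelector.matches(regex) }

fun main() {
    val elements = listOf(
        Element("html > body > header > nav > ol.ordered-navigation"),
        Element("html > body > header > nav > ul.unordered-navigation"),
        Element("html > body > header > h1")
    )
    // matches() requires the entire selector to match, hence the leading ".*".
    val navigationLists = elements.filterBySelector(".*(ol|ul)\\.[a-z-]*navigation$".toRegex())
    println(navigationLists.map { it.toCssSelector })
}
```

Because it is an extension on `List`, the helper can be chained after any call that already returns a list of elements.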
example:
<body>
    i'm the body
    <header>
        <h1>i'm the headline</h1>
        <nav>
            <ol class='ordered-navigation'>
                <li>1st nav item</li>
                <li>2nd nav item</li>
                <li>3rd nav item</li>
                <li>last nav item</li>
            </ol>
            <ul class='unordered-navigation'>
                <li>1st nav item</li>
                <li>2nd nav item</li>
                <li>3rd nav item</li>
                <li>last nav item</li>
            </ul>
        </nav>
    </header>
</body>
Assuming aValidDocument parses the example HTML snippet given above:
@Test
fun `can pick element by css selector matching regex`() {
    val someRegex = "^(ol|ul).*navigation$".toRegex()
    aValidDocument {
        findBySelectorMatching(someRegex) {
            expectThat(map { it.toCssSelector }).containsExactly(
                "html > body > header > nav > ol.ordered-navigation",
                "html > body > header > nav > ul.unordered-navigation"
            )
        }
    }
}
@Test
fun `can pick element by css selector matching regex DSL invoke`() {
    val someRegex = "^(ol|ul).*navigation$".toRegex()
    aValidDocument {
        someRegex {
            expectThat(map { it.toCssSelector }).containsExactly(
                "html > body > header > nav > ol.ordered-navigation",
                "html > body > header > nav > ul.unordered-navigation"
            )
        }
    }
}
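For readers wondering how the `someRegex { ... }` form in the second test can work at all, here is a rough sketch of the Kotlin mechanism that makes it possible: an `operator fun Regex.invoke` declared as a member extension of the document scope. The `Document` and `Element` types and the `ownSelector` property are hypothetical stand-ins, and this is not claimed to be how the library implements it:

```kotlin
// Hypothetical stand-ins; none of these types are the library's own.
data class Element(val ownSelector: String)

class Document(private val elements: List<Element>) {
    // Named form: run the block with the matching elements as receiver.
    fun <T> findBySelectorMatching(regex: Regex, block: List<Element>.() -> T): T =
        elements.filter { it.ownSelector.matches(regex) }.block()

    // Invoke form: allows `someRegex { ... }` inside a Document scope.
    operator fun <T> Regex.invoke(block: List<Element>.() -> T): T =
        findBySelectorMatching(this, block)
}

fun main() {
    val document = Document(
        listOf(Element("h1"), Element("ol.ordered-navigation"), Element("ul.unordered-navigation"))
    )
    val someRegex = "^(ol|ul).*navigation$".toRegex()
    with(document) {
        val named = findBySelectorMatching(someRegex) { map { it.ownSelector } }
        val invoked = someRegex { map { it.ownSelector } }
        // Both forms select the same elements.
        check(named == invoked)
        println(invoked)
    }
}
```

The invoke form is only sugar over the named one, so it is available solely where a `Document` is in scope as a receiver.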
The filter idea looks great actually! Thanks a lot for that commit, too! So for CSS selectors I can use it as in the test examples, and for other attributes (like id) I can use filters. Very nice! Like this:
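extract {
    htmlDocument {
        html {
            findAll {
                filter {
                    // A raw string avoids the invalid \- and \s escapes. The trailing quote of the
                    // Python pattern is dropped, since only the attribute value is matched here.
                    it.attribute("id").matches("""([a-z0-9A-Z"'_\-\s]*footer[a-z0-9A-Z"'_\-\s]*)""".toRegex())
                }
            }
        }
    }
}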
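One detail worth keeping in mind with attribute filters like this: Kotlin's `matches` only succeeds when the whole string matches the pattern, while `Regex.containsMatchIn` succeeds on a partial match. A small self-contained sketch of the partial-match variant follows; `Element`, its `attribute` function, and `withIdContaining` are hypothetical stand-ins and not the library's types:

```kotlin
// Hypothetical stand-in for an element with attributes; not the library's DocElement.
data class Element(val tag: String, val attributes: Map<String, String>) {
    // Returns the attribute value, or an empty string when the attribute is absent.
    fun attribute(key: String): String = attributes[key].orEmpty()
}

// Hypothetical helper: keep elements whose id contains a match for the regex anywhere.
fun List<Element>.withIdContaining(regex: Regex): List<Element> =
    filter { regex.containsMatchIn(it.attribute("id")) }

fun main() {
    val elements = listOf(
        Element("div", mapOf("id" to "page-footer")),
        Element("div", mapOf("id" to "footer_links")),
        Element("div", mapOf("id" to "header")),
        Element("p", emptyMap())
    )
    val footerLike = elements.withIdContaining("footer".toRegex())
    println(footerLike.map { it.attribute("id") })
}
```

With a partial match, the surrounding character classes from the original pattern are no longer needed to cover the rest of the id value.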
I have one more request, too, but I am not sure whether I should open a new issue for it. Could you add an example of how to import the library when using the SNAPSHOT version?
The snapshot release to JitPack seems to be broken currently. I will have a look within the next few days. All features discussed here or mentioned in the readme have been published in version 1.0.0 of the skrapeit artifact :)
Describe what you want to achieve
Hello! I would like to find elements using regex, like in this Beautiful Soup example :)
Code Sample
self.get_element(r'\s*id=\"([a-z0-9A-Z\"\'_\-\s]*footer[a-z0-9A-Z\"\'_\-\s]*)\"', decoded_object)