jwillmer / web-scraper-chrome-extension

Web data extraction tool implemented as chrome extension
GNU Lesser General Public License v3.0
28 stars, 5 forks

Multiple Selectors can be used #3

Closed by jwillmer 7 years ago

jwillmer commented 7 years ago

I found out that you can use multiple selectors at once - sort of. I think we might be able to create a cool feature out of it.

Assumption

We want the background-image links in a list.

HTML

<div class="productShotThumbnails" id="productShotThumbnails">
    <div class="productShotThumbnail" style="width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)">1</div>
    <div class="productShotThumbnail selected" style="width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)">2</div>
    <div class="productShotThumbnail" style="width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)">3</div>
</div>

Workflow

Output

[
{"group":"1","group-style":"width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)"},
{"group":"2","group-style":"width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)"},
{"group":"3","group-style":"width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)"}
]
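Getting from this output to the plain list of links named in the Assumption still takes a post-processing step. A minimal sketch of that step follows; `extractBackgroundImageUrl` is a hypothetical helper written for this illustration and is not part of the extension.

```javascript
// Output records copied from the example above.
const records = [
  { "group": "1", "group-style": "width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)" },
  { "group": "2", "group-style": "width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)" },
  { "group": "3", "group-style": "width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)" }
];

// Hypothetical helper: pull the url(...) target out of an inline style string.
// Returns null when the style has no background-image declaration.
function extractBackgroundImageUrl(style) {
  const pattern = /background-image:\s*url\(\s*["']?([^"')]+?)["']?\s*\)/i;
  const match = pattern.exec(style || "");
  return match ? match[1] : null;
}

// Map every record to its link and drop records without one.
const links = records
  .map((record) => extractBackgroundImageUrl(record["group-style"]))
  .filter((url) => url !== null);

console.log(links);
```

For the three records above, `links` holds the same lorempixel URL three times, because all three thumbnails use the same image.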

Conclusion

Without selecting the attribute name "style" in the Element attribute type, we would not have the style content in the groups.

Problem

Since the style content is not gathered in the group selection, the string modifications do not apply to it. In fact, the group selection only sees {"group":"1"},{"group":"2"},{"group":"3"}, and the style values are merged into the output later.

Idea

If we want to promote this bug to a feature, we might be able to merge the content earlier so that string manipulations can be applied to the whole output.
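A rough sketch of the "merge earlier" idea, assuming the text and the attribute are gathered separately per element. All function names here (`extractText`, `extractAttribute`, `modifyAll`, `scrapeMergedFirst`) and the plain-object element are hypothetical stand-ins, not the extension's actual code.

```javascript
// Hypothetical stand-ins for the two pieces of data gathered per element.
function extractText(element) {
  return { group: element.text };
}

function extractAttribute(element, name) {
  return { ["group-" + name]: element.attributes[name] };
}

// Apply one string modification to every string field of a record.
function modifyAll(record, modify) {
  const result = {};
  for (const [key, value] of Object.entries(record)) {
    result[key] = typeof value === "string" ? modify(value) : value;
  }
  return result;
}

// Merge first, modify afterwards: the modification now also sees the attribute value.
function scrapeMergedFirst(element, attributeName, modify) {
  const merged = Object.assign({}, extractText(element), extractAttribute(element, attributeName));
  return modifyAll(merged, modify);
}

// A plain object standing in for one thumbnail from the HTML example.
const element = {
  text: "1",
  attributes: {
    style: "width: 20px; height: 20px; background-image: url(http://lorempixel.com/output/technics-q-c-40-40-10.jpg)"
  }
};

// Example modification: keep only the url(...) target; other strings pass through unchanged.
const keepUrl = (value) => value.replace(/^.*url\((.*)\)$/, "$1");

const row = scrapeMergedFirst(element, "style", keepUrl);
console.log(row);
```

With this ordering, `row["group-style"]` is reduced to the bare image URL while `row.group` stays "1". If the modification ran before the merge, the style string would come through untouched, which is the behaviour described under Problem.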

jwillmer commented 7 years ago

As it turns out this feature was only implemented for attributes:

Commit: "Simplify attribute extraction"

if(this.extractAttribute) {
    data[this.id+'-'+this.extractAttribute] = $(element).attr(this.extractAttribute);
}
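To see where the "group-style" key in the output comes from, here is a small self-contained sketch of the same key construction. The selector object, the plain-object element, `readAttribute` (standing in for `$(element).attr(...)`) and the line that stores the element text under the selector id are all assumptions made for this illustration.

```javascript
// Hypothetical selector; only `id` and `extractAttribute` appear in the commit snippet.
const selector = { id: "group", extractAttribute: "style" };

// Hypothetical stand-in for $(element).attr(name): a plain attribute lookup.
function readAttribute(element, name) {
  return element.attributes[name];
}

// Build one output record for one element.
function collect(selector, element) {
  const data = {};
  // Assumption: the element text is stored under the selector id.
  data[selector.id] = element.text;
  // Same key construction as in the commit: "<id>-<attribute name>".
  if (selector.extractAttribute) {
    data[selector.id + "-" + selector.extractAttribute] =
      readAttribute(element, selector.extractAttribute);
  }
  return data;
}

const thumbnail = {
  text: "2",
  attributes: { style: "width: 20px; height: 20px" }
};

console.log(collect(selector, thumbnail));
console.log(collect({ id: "group", extractAttribute: "" }, thumbnail));
```

The first call produces both a "group" and a "group-style" key. The second call, with an empty attribute name, produces only "group", which matches the observation that the style content appears only when an attribute name is set.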

jwillmer commented 7 years ago

My bad, there is an input box for style in the group selector, and because it is the same as in the attribute selection, it was prefilled.