Open eng1neer opened 7 years ago
Selector generation prefers longer selectors, but when you added a second annotation the long selector was no longer valid because the two elements are so close together in the page.
When you had just one annotation, the algorithm searched the HTML tree to a depth of ~10 elements. Adding a second annotation tells the algorithm that the annotations are in a specific part of the page, and the CSS selectors update to reflect this. There are actually 3 CSS selectors generated. If you look inside the data for the sample, you should see an annotation with `Item_container: true` and `selector: #details-dl` at the top, followed by your 2 other annotations.
There is a bug here though. That annotation shouldn't change its selector after you have changed it from automatic to CSS and it shouldn't be used for the generation of the 3rd selector.
@ruairif Thanks for the explanation. What I find problematic here is that the `#details-dl > dd:nth-child(10)` selector is much less robust than `#product-sku`. Other products may have a different number of attributes in the `dl` element, so a selector that depends on element order will fail, while `#product-sku` will work just fine. Is there a way to shift the preference towards id selectors and give them higher priority? Any pointers on where I should look are appreciated.
Also, maybe I missed the point, but even when I select an element that is not close to `#product-sku` (I select `.page > header > .inner-wrap > .hide-text` in the header), the behavior is the same as in my original post.
Maybe the generation doesn't behave how I think it does, then. I can't make it prioritise id selectors, but I will fix the bug that causes the CSS selector to change. After you mark a selector as CSS, it should stay static unless you manually change it.
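For reference, an id-first preference could look roughly like the sketch below. This is not Portia's selector generation code; `id_first_selector` is a hypothetical helper, and the sample markup is invented. It climbs from the target element towards the root, stops at the nearest id that is unique in the document, and falls back to positional steps otherwise.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def id_first_selector(root, target):
    """Build a CSS selector for `target`, anchored at the nearest unique id."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    # Only ids that occur exactly once in the document are trusted.
    id_counts = Counter(el.get("id") for el in root.iter() if el.get("id"))

    parts = []
    node = target
    while node is not None:
        node_id = node.get("id")
        if node_id and id_counts[node_id] == 1:
            # A unique id anchors the selector; no need to climb further.
            parts.append("#" + node_id)
            break
        parent = parents.get(node)
        if parent is None:
            # Reached the document root without finding an id.
            parts.append(node.tag)
        else:
            # Positional fallback: 1-based index among all siblings.
            position = list(parent).index(node) + 1
            parts.append(f"{node.tag}:nth-child({position})")
        node = parent
    return " > ".join(reversed(parts))


# Hypothetical markup, loosely modelled on the page discussed in this thread.
PAGE = """<html><body>
  <dl id="details-dl">
    <dt>Colour</dt><dd>Silver</dd>
    <dt>Code</dt><dd id="product-sku">SKU-PLACEHOLDER</dd>
    <dt>Material</dt><dd>Fabric</dd>
  </dl>
</body></html>"""

root = ET.fromstring(PAGE)
all_dd = root.findall(".//dd")
print(id_first_selector(root, all_dd[1]))  # #product-sku
print(id_first_selector(root, all_dd[2]))  # #details-dl > dd:nth-child(6)
```

An element with its own unique id gets the short selector, and only elements without one pick up a positional step relative to the closest identified ancestor.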
I've noticed a strange behavior of selectors in Portia. Here's a zipped video of my workflow:
css-selector-bug.zip
So basically what I do is create a project on scrapinghub.com, create a spider for https://www.kurtgeiger.com/women/shoes/trainers/slip-ons/logical-silver-glitter-kg-kurt-geiger and then select an element which has the id `#product-sku`. At first Portia gives it the `#details-dl > dd:nth-child(10)` selector, which is strange because a unique id exists for that element. But after I create another annotation, the first element's selector changes to the correct `#product-sku`.