Closed typhoon2099 closed 3 years ago
If it’s failing to load consistently, the problem is either with the connection or the source location. Note that you can either override the documentLoader with your own custom loader, or configure with the appropriate gem using the RDF::Util::File extensions.
The source location is a Nokogiri::XML::Element (unchanged between upgrades). I'm getting my solutions using:
PRODUCT_LINKS_QUERY = %(
PREFIX rsp: <http://rubygems.org/gems/sparql#>
PREFIX s: <http://schema.org/>
SELECT ?url ?image ?description
WHERE {
{ [] ?p s:OfferCatalog } UNION { [] ?p s:ItemList }
[] s:itemListElement ?item .
?item s:url ?url
OPTIONAL {
?item s:image/s:url* ?image
FILTER (!isBlank(?image))
}
OPTIONAL { ?item s:name ?description }
}
)
(
RDF::Graph.new << RDF::RDFa::Reader.new(nokogiri_document, base_uri: base_uri)
).query(SPARQL.parse(PRODUCT_LINKS_QUERY))
The above code is slightly convoluted as I'm merging different methods together to keep the code smaller.
RDF does not guarantee any inherent order to the data, and the default Graph/Repository uses a hash structure that is known for not preserving input order. You might add an ORDER BY clause to the query.
Although it’s not really an effective solution, there is an rdf-ordered-repo gem that will preserve insertion order.
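If changing the query isn't convenient, the same idea can be applied on the Ruby side once the solutions come back. This is only a rough sketch: the helper names are hypothetical, plain hashes stand in for the solutions (anything that responds to `[]` with a symbol key should work), and the list of URLs in document order is assumed to be collected separately from the HTML.

```ruby
# Hypothetical helpers, not part of RDF.rb or the sparql gem. Each solution
# only needs to respond to #[] with a symbol key; plain hashes are used here.

# Sort by the string form of one binding so the result no longer depends on
# the order in which the graph happens to enumerate its statements.
def sort_by_binding(solutions, key)
  solutions.sort_by { |solution| solution[key].to_s }
end

# Rank solutions by the position of their URL in a list of URLs gathered in
# document order (assumed to be extracted from the HTML separately).
# URLs missing from the list sort last; the URL string breaks ties.
def sort_by_document_order(solutions, urls_in_document_order)
  rank = {}
  urls_in_document_order.each_with_index { |url, index| rank[url] ||= index }
  solutions.sort_by do |solution|
    url = solution[:url].to_s
    [rank.fetch(url, urls_in_document_order.size), url]
  end
end

solutions = [
  { url: "https://site.com/second-product", description: "Second" },
  { url: "https://site.com/first-product", description: "First" }
]

by_url = sort_by_binding(solutions, :url)
by_document = sort_by_document_order(
  solutions,
  ["https://site.com/first-product", "https://site.com/second-product"]
)
```

Sorting by a binding gives a deterministic order but not necessarily the order in the page; the second helper is the one to reach for if the tests really depend on document order.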
Okay. I wasn't actually sure if this was a bug or not; more likely we were just lucky that the order was preserved in the first place. I'm not sure what's changed or why, as the diff looks fairly innocuous. I'll have to have a think about how to handle these unusual Graphs (if at all).
I have RSpec tests that look for Product data and return the first Product found on a page (and try to merge solutions together to get an Array of images for that Product). This was working on 3.1.8, but after updating to 3.1.9 it seems to fail, as the solutions have started coming back with no clear order.
Here's some example HTML:
One of the tests expects to find a URL of https://site.com/first-product, but now fails half the time, returning https://site.com/second-product instead.
Is this a known issue, and if so, is there a way to ensure that returned solutions come back deterministically (i.e. in the order they're found in the HTML)?