Closed tpendragon closed 5 years ago
I've done a pretty thorough investigation today. The webmock response takes effect at the Net::HTTP level by way of the RDF gem's graph loading. The Faraday stuff for FAST is not implicated here. The failures only happened on one occasion that I see, and they all appear to be from mismatched expectations vs. mocked responses, not real HTTP requests leaking out.
Specifically, multiple were from getting a Qa::ServiceError
(500) instead of Qa::ServiceUnavailable
(503), but other mocked responses hit other examples, too.
My only suspicion at this point is that there was an unusual timing/parallel issue. I've run the suite with the random seed of the failed run (44328), and it passes. There is a bit of code in the RDF gem that uses Net::HTTP::start
in an until loop, and then there is some class-level interception in webmock. This is likely a narrow isolation issue well upstream.
Since this has only failed once and we have confirmed that the mocks are at least being applied consistently, I'll close this for now, pending recurrent problems.
This is failing more often now. There must be some sticky/leaking state. Looking into it...
Exploring these failures on CircleCI, I can replicate these over SSH and use that environment for debugging. From what I’m seeing on CI, https://github.com/samvera/questioning_authority/blob/master/app/services/qa/linked_data/graph_service.rb#L93 parses the error and looks for the HTTP status code when RDF::Graph.load(url) fails to load from a URL in https://github.com/samvera/questioning_authority/blob/master/app/services/qa/linked_data/graph_service.rb#L11 https://github.com/samvera/questioning_authority/blob/master/app/services/qa/linked_data/graph_service.rb#L104 is what generates the Internal Server Error - Fetch term... which fail the expectations
Further, https://github.com/ruby-rdf/rdf/blob/master/lib/rdf/reader.rb#L199 returns nil
, and the failures seem to start on https://github.com/ruby-rdf/rdf/blob/master/lib/rdf/reader.rb#L209. I was able to trigger https://github.com/ruby-rdf/rdf/blob/master/lib/rdf/reader.rb#L234 to be raised.
elrayle further diagnosed this issue on CircleCI, finding that there was a disparity in error messages which are parsed for the HTTP status code:
• local msg=http://experimental.worldcat.org/fast/search?maximumRecords=3&query=cql.any%20all%20%22cornell%22&sortKeys=usage: (404) • CCI msg=http://experimental.worldcat.org/fast/search?maximumRecords=3&query=cql.any%20all%20%22cornell%22&sortKeys=usage: 404
Please note the missing parenthesis in the error message raised on CircleCI.
This was caused by an older version of the Coveralls gem being included in the bundle on CircleCI. How?
coveralls
gem without a version specifiedrest-client
rest-client
is loaded, RDF::Util::File
uses it, rather than Net::HTTP
IOError
raised only includes a formatted message, which is parsed to re-specify the error for downstream use within QAcoveralls
0.8.23) results in different QA errors being raised by way of a missed case conditionAs far as the underlying issues:
rdf
gem should return structured info it already has (subclass IOError to include url and status code attributes)rdf
gem uses implicit config that it does not log or reveal in a meaningful way (i.e., RDF::Graph.load
relies on a class variable of RDF::Util::File
that is lazily set based on which gems are loaded, so even the existence of the adapter, let alone its identity, is not obvious)circleci local execute
on a workstation)PR #264 is merged and works around the issue. We will follow up with ruby-rdf upstream and, pending changes, move to a more robust solution (e.g., explicit status codes rather than parsing).
The only additional info I would add to this is that the occurrence of error message with parentheses and without was random causing random failures of the tests. The patch in PR #264 uses regex to extract the error code and allows for both formats of the error message. This is only slightly less fragile than before. The recommendation to move to a structured error message for RDF::Graph.load
would allow the error code processing to be more robust.
https://circleci.com/gh/samvera/questioning_authority/766