Open steve-mavens opened 1 year ago
There's no cadence. It's easy for me to update, so I just did in 6f45fed6c56f377e8a9a77ce43c50712281940d8.
$ curl https://publicsuffix.org/list/public_suffix_list.dat > tldextract/.tld_set_snapshot
Some possible solutions.
suffix_list_urls or cache_dir kwargs; avoid diverging tests online vs. offline.
example.com or example.probablyneverasuffix.
Thanks very much!
(1) sounds like unwarranted effort for you (and might make the changelog a bit spammy?)
(2) is probably what I should do, or a short suffix list file would cover these tests.
(3) Turns out I'm not a perfect judge of what's probable! The case was chosen as a non-ASCII second-level domain listed in the PSL, and .museum seemed stable at the time. Until now that test file was unchanged since written in 2020, so it's not volatile enough to be a real problem.
(4) Would also work, but even when I have that fully isolated test I usually want the integration test as well, so it's a question of whether I can get away with that integration test being offline, or whether I need to be online in order to test that my understanding of tldextract is correct.
Anyway I think in some sense (2) amounts to saying, "tldextract can be its own fake". It's isolatable enough, and it can be configured with any invented cases needed. So if I do that then it's kind of a semantic argument whether I have a true unit test of my function against a fake I didn't write myself, or an integration test of my function + tldextract with a lower-level dependency (the PSL) stubbed. My team doesn't do enough formal test design for that distinction to matter.
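A minimal sketch of the "tldextract can be its own fake" idea, assuming the suffix_list_urls, cache_dir and fallback_to_snapshot keyword arguments of tldextract.TLDExtract. The helper names, the file name, and the invented suffix "madeup" are hypothetical and only there for illustration.

```python
import tempfile
from pathlib import Path

# Invented rules in PSL syntax; "madeup" is a hypothetical suffix for tests only.
FAKE_PSL = """\
// a deliberately tiny suffix list for offline tests
com
uk
co.uk
madeup
"""


def write_fake_psl(directory=None):
    """Write the tiny suffix list to disk and return its absolute path."""
    directory = Path(directory or tempfile.mkdtemp()).resolve()
    path = directory / "fake_suffix_list.dat"
    path.write_text(FAKE_PSL, encoding="utf-8")
    return path


def make_offline_extractor(psl_path):
    """Build an extractor that reads only the given local list."""
    import tldextract  # third-party; imported lazily so write_fake_psl stands alone

    return tldextract.TLDExtract(
        suffix_list_urls=[Path(psl_path).as_uri()],  # file:// URL is the only source
        cache_dir=None,              # assumption: None disables the on-disk cache
        fallback_to_snapshot=False,  # fail loudly rather than use the bundled snapshot
    )


# Usage (requires tldextract to be installed):
#   extract = make_offline_extractor(write_fake_psl())
#   extract("www.example.madeup")
```

With this shape the online and offline runs see the same rules, so a change like the one under .museum can't make them diverge. The cost is that the test no longer says anything about the real PSL.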
Btw, before I used tldextract I had a checked-in copy of the PSL and my own parser. My commit "Function to identify public suffix, from Mozilla's list of rules" was on 2011-02-11. So if I'd worked on other features for another 17 days I guess I could have saved that effort and used tldextract from the start!
Oh, and I think another possible solution is to run a line of code to let tldextract get a fresh PSL in between installing the test environment (which obviously is an online operation) and running the offline tests (with pytest-socket to enforce offline-ness). I suppose arguably this is just (2) again, with the cache_dir arg rather than the suffix_list_urls arg. Or I could make the PSL's URL an exception to pytest-socket.
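That "line of code" could look roughly like the sketch below, assuming TLDExtract accepts cache_dir and that its update(fetch_now=True) method forces a download. The function name warm_psl_cache and the make_extractor parameter are hypothetical; the latter exists only so the helper can be exercised without tldextract or a network.

```python
from pathlib import Path


def warm_psl_cache(cache_dir, make_extractor=None):
    """Fetch a fresh PSL into `cache_dir` while the network is still available."""
    if make_extractor is None:
        import tldextract  # third-party; imported lazily

        make_extractor = tldextract.TLDExtract
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    extractor = make_extractor(cache_dir=str(cache_dir))
    # Asks the extractor to drop any cached list and fetch the current one now.
    extractor.update(fetch_now=True)
    return extractor


# Usage, run once after installing the environment and before the offline tests:
#   warm_psl_cache(".tld_cache")
```

The offline tests would then build their extractor with the same cache_dir. As far as I understand, the cache entry also depends on the list of URLs, so the other constructor arguments should probably match too.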
"So if I'd worked on other features for another 17 days I guess I could have saved that effort and used tldextract from the start!" 😊
I like your breakdown. Yeah, there are tradeoffs in all directions, depending on how robust and formal you want your test suite to be. Your last suggestion with the test suite continually updating the PSL reminds me of this article on verified fakes.
Yes, sounds good. I've also seen (but IIRC never implemented) a variant on that where you put an interception layer in, and generate stub responses by capturing the responses from the run of the live version of the test. So instead of the verified fake you have an "updateable stub". I say never implemented: many times I've set a breakpoint and dumped some http response to disk for use as a test case, never have I properly automated that. There's probably a framework for it.
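For what it's worth, the "updateable stub" idea can be sketched in a few lines of stdlib Python. Everything here is hypothetical (the name updateable_stub, the JSON store, the fetch(url) -> str shape); it is an illustration of record-and-replay, not anything tldextract provides.

```python
import json
from pathlib import Path


def updateable_stub(fetch, store_path, record):
    """Wrap `fetch(url) -> str` so responses are captured to, or replayed from, disk."""
    store_path = Path(store_path)

    def wrapper(url):
        if store_path.exists():
            saved = json.loads(store_path.read_text(encoding="utf-8"))
        else:
            saved = {}
        if record:
            saved[url] = fetch(url)  # live call; refresh the stored response
            store_path.write_text(json.dumps(saved, indent=2), encoding="utf-8")
        # In replay mode a missing URL raises KeyError: it was never recorded.
        return saved[url]

    return wrapper
```

Run the suite once online with record=True to refresh the stored responses, then offline with record=False. Libraries such as VCR.py do the same thing at the HTTP layer, so that is probably the framework to reach for in practice.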
Apologies if I've failed to find this in the docs, but is there any official cadence for how often the PSL snapshot is updated and a new release made?
We tripped over this because of a material change under .museum: so now our online and offline tests get different results for one of our test cases, that happens to be in there.
Obviously our test case is our problem (and maybe offline tests of code that uses tldextract are not a great idea in the first place). But it would be useful to know if it's our problem for a while, or if you were due to update the snapshot fairly soon anyway.