jamesturk closed this 2 years ago
Immediately realized one caveat: the end dates are often in the past, which would lead to nothing being run. If there are no 'active' sessions, the latest one in the file could be used, essentially falling back to the current behavior.
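A rough sketch of that date check with the fallback, in case it helps the discussion. The metadata shape here (a list of dicts with `identifier`, `start_date`, and `end_date` as ISO date strings) and the function name are assumptions for illustration, not the actual os-update code, and the sample dates are made up.

```python
from datetime import date


def sessions_to_scrape(sessions, today):
    """Return identifiers of sessions whose date range contains `today`.

    Falls back to the last session in the list when no session is active,
    which matches the existing "last session in the metadata" behavior.
    """
    active = []
    for s in sessions:
        start = date.fromisoformat(s["start_date"])
        end = date.fromisoformat(s["end_date"])
        if start <= today <= end:
            active.append(s["identifier"])
    if active:
        return active
    # No active sessions (e.g. every end date is in the past): use the latest.
    return [sessions[-1]["identifier"]] if sessions else []


# Hypothetical metadata; identifiers and dates are illustrative only.
SAMPLE = [
    {"identifier": "2020", "start_date": "2020-01-08", "end_date": "2020-03-12"},
    {"identifier": "2021", "start_date": "2021-01-13", "end_date": "2021-03-01"},
    {"identifier": "2021B", "start_date": "2021-02-01", "end_date": "2021-02-20"},
]
```

With this sample, a run on 2021-02-10 would pick up both "2021" and "2021B", while a run on 2021-06-01 would find nothing active and fall back to "2021B", the last entry.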
cc @showerst @jessemortenson
Dates tend to be so wonky with legislatures, and since we're manually adding new sessions anyway, I'd be strongly in favor of the "current" flag in the metadata.
Some sample cases I'm thinking of where dates are a pain:
I think that the inevitable "Oops, I forgot to go flip the boolean on the old session when I added the new one" bugs are easier to fix and debug than the "Why did I commit but my session didn't scrape?" or "Why is this scrape taking four hours and the 'HB 123' it emitted isn't the one I want?" issues.
I'll admit that this approach does raise the barrier to entry for new contributors, but I think I'd be fine w/ someone submitting a patch that doesn't disable scraping older sessions and we have to add that at a later date.
Overall it's a good idea though, it will be nice to avoid configuring duplicate tasks for scraping old and new, and make everybody's knowledge of the state of what should be getting collected more explicit from this repo.
A good test case right now is VA, at least as of a few days ago --
So here we want to flag [1,3] to be scraped, but only the third case is "in" if you're asking what the current session is, or want to highlight a map or whatever.
Thanks for pointing out these edge cases, this makes a lot of sense & I'm won over that the current flag is better than trying to automate guessing after all. (I'd really mainly been considering the case where the only thing being scraped unnecessarily was a special session, but the examples above make sense & I'll borrow them for the OSEP)
just opened up #30 with a version of this that relies upon an `active` flag, figure any future discussion can happen there
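For anyone skimming the thread, the flag-based approach boils down to something like the sketch below. This is only an illustration: it assumes each session dict carries an optional boolean `active` key, and it is not the code from #30.

```python
def active_sessions(sessions):
    """Return identifiers of sessions explicitly flagged as active.

    Sessions without the flag are treated as inactive, so older entries
    in the metadata do not need to be touched.
    """
    return [s["identifier"] for s in sessions if s.get("active", False)]


# Hypothetical metadata: two sessions flagged for scraping, one not.
SAMPLE_FLAGGED = [
    {"identifier": "2020", "active": True},
    {"identifier": "2020S1"},
    {"identifier": "2021", "active": True},
]
```

Here the scrape list is whatever a maintainer has flagged, independent of the dates, which is the trade-off discussed above: more manual upkeep, but no guessing.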
planning a small OSEP for this, but wanted to kick around ideas first
Right now when os-update runs, it only runs on the last session in the metadata.
I'd like to change that to have it run for all 'active' sessions by default. The logic would look something like:
If an argument like `session=2021B` was passed, that would supersede this behavior (as it does now).
This would allow us to avoid situations where a state is simultaneously updating their regular and one or more special sessions.
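To make the precedence concrete, here is a minimal sketch of an explicit session argument winning over the default list of active sessions. The `key=value` parsing and both function names are hypothetical and only stand in for whatever os-update actually does with its arguments.

```python
def requested_session(args):
    """Return the value of a `session=...` argument, or None if absent."""
    for arg in args:
        key, sep, value = arg.partition("=")
        if key == "session" and sep and value:
            return value
    return None


def resolve_sessions(args, active):
    """An explicit session argument supersedes the active-session default."""
    explicit = requested_session(args)
    if explicit is not None:
        return [explicit]
    return list(active)
```

So `resolve_sessions(["session=2021B"], ["2021", "2021S1"])` would scrape only "2021B", while a run with no session argument would scrape every active session.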
I think it'd have minimal impact on runtime since most specials take <30 mins.
Any concerns? Things I'm possibly forgetting?
The other idea would be to add a `current` flag to the metadata, but that'd require more active management & I figure begin & end dates are a decent proxy for this.