propershark / shark

An event publisher for realtime transit information.

Record and play back Source data for testing #22

Open elliottwilliams opened 8 years ago

elliottwilliams commented 8 years ago

After thinking about this last night, I (re-)discovered VCR. It can be configured to record all HTTP requests made in a given session to a cassette, and then can be told to play back responses from that cassette in sequence.

This is obviously useful for unit testing, since you can call actual source endpoints, store their responses for later use, and easily refresh test cassettes over time, but I think it could also be useful for functional testing of shark's data quality. We can examine and re-examine problem timeframes where Shark is behaving weirdly. We can test sporadic events like route updates and (de)activations. And we can code late at night when no vehicles are out. Here's how it would work:

Shark's sources would have three modes of operation:

Potential complications include the need to adjust Timetable's clock to be in sync with a time-travelling Shark, and operating on and storing the enormous cassettes that, say, 12 hours of transit data would produce. Regardless, I think this would be worth a shot!

faultyserver commented 8 years ago

That's fantastic. I've never seen that library before. I'll definitely be taking a look into this as I start trying to write tests.

I'm not sure that Sources themselves need to have these three modes of operation, though. It seems like VCR is only meant to be run during tests (e.g., it hooks into WebMock, not Net::HTTP directly), which would make it rather difficult to configure for realtime usage. I do think caching the past 3-4 hours of source traffic would be useful in capturing those problem timeframes, but I don't think this is the right solution for that.

Maybe we could set up a separate instance for functional testing of everything after sourcing, with a single Source that just loops a given set of data, rather than making requests.
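A minimal sketch of such a looping source, assuming a Source only needs to hand back one snapshot per refresh. The class name `LoopingSource` and the `refresh` method are hypothetical, not Shark's actual Source interface.

```ruby
# Hypothetical stand-in for a Shark Source: instead of making HTTP requests,
# it cycles through a fixed list of pre-recorded snapshots.
class LoopingSource
  def initialize(snapshots)
    raise ArgumentError, 'need at least one snapshot' if snapshots.empty?

    @snapshots = snapshots
    @index = 0
  end

  # Return the next snapshot, wrapping around to the first after the last.
  def refresh
    snapshot = @snapshots[@index]
    @index = (@index + 1) % @snapshots.length
    snapshot
  end
end

# Made-up sample data, two snapshots of one vehicle.
source = LoopingSource.new([
  { 'bus-1' => { 'lat' => 40.42 } },
  { 'bus-1' => { 'lat' => 40.43 } }
])
seen = Array.new(3) { source.refresh }
```

The third refresh wraps back to the first snapshot, so everything downstream of sourcing sees a repeatable stream.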

In terms of storage, as long as the tests are well-structured, make repeated use of the same data, and only make necessary requests, I'm not too concerned about storage size. Additionally, a lot of the unit tests in particular will probably end up using factories to generate data, just because that gives absolute control over the data, rather than having to find a specific case somewhere in a cassette.

With regard to Timetable, we might as well implement it with this in mind, allowing an optional timestamp parameter that will be used as the current time, rather than always assuming the current time.
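As a rough illustration of the optional timestamp idea, here is a sketch in which the lookup takes a `now:` keyword that defaults to the wall clock. `TimetableSketch`, `next_departures`, and the sample times are all made up for this example; Timetable's real interface is not shown in this thread.

```ruby
require 'time'

# Hypothetical stand-in for Timetable, holding a sorted list of departure times.
class TimetableSketch
  def initialize(departures)
    @departures = departures.sort
  end

  # `now` defaults to the current time, but a replaying Shark can pass the
  # timestamp of the recorded data so both stay in sync.
  def next_departures(count = 1, now: Time.now)
    @departures.select { |time| time >= now }.first(count)
  end
end

# Made-up departure times.
departures = %w[
  2016-05-01T08:00:00Z
  2016-05-01T08:15:00Z
  2016-05-01T08:30:00Z
].map { |stamp| Time.iso8601(stamp) }

timetable = TimetableSketch.new(departures)
recorded_now = Time.iso8601('2016-05-01T08:10:00Z')
upcoming = timetable.next_departures(2, now: recorded_now)
```

Callers that omit `now:` keep the realtime behavior, so only the playback path needs to know about the parameter.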

elliottwilliams commented 8 years ago

Sure. Maybe recording/playback for downstream functional testing could be done with (1) a middleware that can save timestamped objects to disk as they are updated, and (2) a source that can read those saved objects over time.
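One possible shape for that pairing, sketched with only the standard library. Everything here is an assumption for illustration: the class names, the `call` and `each_update` signatures, the JSON Lines file format, and the `vehicles.update` event name are not taken from Shark.

```ruby
require 'json'
require 'time'
require 'tmpdir'

# Hypothetical middleware: appends each updated object to a JSON Lines file
# along with the time the update was seen, then passes the object through.
class RecordingMiddleware
  def initialize(path)
    @path = path
  end

  def call(event, object, at: Time.now)
    record = { 'at' => at.utc.iso8601(3), 'event' => event, 'object' => object }
    File.open(@path, 'a') { |file| file.puts(JSON.generate(record)) }
    object
  end
end

# Hypothetical source: reads the recording back in order. With `speed` set,
# it sleeps between records to reproduce the original spacing (2.0 = twice
# as fast); with `speed` nil it replays as fast as possible.
class PlaybackSource
  def initialize(path, speed: nil)
    @path = path
    @speed = speed
  end

  def each_update
    previous = nil
    File.foreach(@path) do |line|
      record = JSON.parse(line)
      at = Time.iso8601(record['at'])
      sleep([(at - previous) / @speed, 0].max) if @speed && previous
      previous = at
      yield at, record['event'], record['object']
    end
  end
end

# Record two made-up updates, then replay them without delays.
path = File.join(Dir.mktmpdir, 'session.jsonl')
recorder = RecordingMiddleware.new(path)
start = Time.utc(2016, 5, 1, 8, 0, 0)
recorder.call('vehicles.update', { 'id' => 'bus-1', 'lat' => 40.42 }, at: start)
recorder.call('vehicles.update', { 'id' => 'bus-1', 'lat' => 40.43 }, at: start + 30)

replayed = []
PlaybackSource.new(path).each_update { |at, event, object| replayed << [at, event, object] }
```

Because each record carries its own timestamp, the playback side can also hand that time to anything that needs a matching clock.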

Though (correct me if I'm wrong) I was under the impression that WebMock can be enabled outside of a testing environment, and can be set to forward all requests through VCR by not stubbing any responses. So it could be used to record sessions of data, although perhaps not very elegantly.

faultyserver commented 8 years ago

You're right. I've never needed it elsewhere, so I was assuming it was built for RSpec, but it looks like WebMock can work anywhere. Reading the docs, though, it seems like getting a good proxy through to VCR would be a bit awkward. But, I can't say anything definitively, since I haven't tried it.

I'm more keen on the middleware/source idea, though, since it can implement a rolling window, which I couldn't figure out how to do with VCR in my hour or so of looking around.
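A rolling window of this kind could look roughly like the sketch below, which keeps entries in memory and drops anything older than the window each time a new one arrives. `RollingWindow` is a hypothetical name, and the sketch assumes entries are pushed in timestamp order; a real middleware would presumably also persist the window to disk.

```ruby
# Hypothetical in-memory rolling window: keeps only the entries whose
# timestamp falls within the last `seconds` of the newest entry.
class RollingWindow
  def initialize(seconds)
    @seconds = seconds
    @entries = []
  end

  # Add an entry, then evict everything older than the window.
  # Assumes `at` never goes backwards between calls.
  def push(object, at: Time.now)
    @entries << [at, object]
    cutoff = at - @seconds
    @entries.shift while @entries.first[0] < cutoff
    self
  end

  def to_a
    @entries.map { |_, object| object }
  end
end

# Keep three hours of made-up updates.
window = RollingWindow.new(3 * 60 * 60)
start = Time.utc(2016, 5, 1, 8, 0, 0)
window.push('a', at: start)
window.push('b', at: start + 2 * 3600)
window.push('c', at: start + 4 * 3600)
kept = window.to_a
```

After the third push the cutoff is one hour past `start`, so the first entry is evicted and the two newer ones remain.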

faultyserver commented 8 years ago

I'm starting work on this now, and my plan of action at this point is to create a data set from the events that get fired through Shark::Agency (i.e., all of the default events), which should be sufficient for testing middlewares.

Sources will probably end up getting their own repos at some point, so they'll have their own test suite, and pretty much every part of the main system (Shark::ObjectManager and Shark::Agency) can be tested with factories, since they're primarily concerned with collections, rather than the values inside of each object.