metz-sh / simulacrum

Code-playground to visualise complex engineering flows.
https://metz.sh
Apache License 2.0

Record and Replay functionality #26

Open microsoftly opened 1 month ago

microsoftly commented 1 month ago

One use case I ran into: keep the flow metadata collection active in a deployed non-prod environment, then send a test event, which would generate some type of file. That file could then be provided to metz.sh along with the code to allow "replaying" of the events.

For my particular case, this would be very helpful for visualizing the process flow for canary events before allowing broader releases. I know my example mentioned non-prod environments, but imagine hitting an endpoint such as [POST] /your/endpoint/here, getting back some ID, and then navigating in a browser to /metz.sh/ui/, which would visualize the replayable events based on the ID returned in the previous step.

How achievable does this seem? Is this in line with your intended functionality?

iostreamer-X commented 1 month ago

I actually didn't imagine this to be used as a spectroscope!

A few questions though:

Once again, interesting idea and use case!

microsoftly commented 1 month ago

Currently metz only accepts code which follows its rules. Meaning the deployed non-prod code has to follow them as well if we want it to be processed by metz. But I imagine when you say record, your code simply emits events that can be compiled and run on metz. Your actual code doesn't get processed by metz. Is this understanding correct?

Yep. I implemented similar functionality when I built mocha-tape-deck. It would record the real network calls during a test and then encode the results to a file for playback.

I would imagine playback would be possible if you had a "runtime events" playback file (e.g. a scenario cassette that could be loaded into the VCR that is metz), and visualization could then happen against a copy of the code. That is where my other open issue about possibly having a UI in a VS Code extension makes sense: the code context would be local to the UI in that case.
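To make the cassette idea a bit more concrete, here is a minimal sketch of what recording and playback could look like. Everything in it is an assumption for illustration: the `FlowEvent` and `Cassette` shapes, the `Recorder` class and the `replay` function are hypothetical names, not part of metz or mocha-tape-deck.

```typescript
// Hypothetical cassette format; none of these names come from metz or mocha-tape-deck.
interface FlowEvent {
  seq: number;      // order in which the event was recorded
  source: string;   // component that emitted the event
  target: string;   // component that received it
  payload: unknown; // data that moved between them
}

interface Cassette {
  id: string; // ID that could be handed back to the caller, e.g. by the test endpoint
  events: FlowEvent[];
}

class Recorder {
  private readonly id: string;
  private readonly events: FlowEvent[] = [];

  constructor(id: string) {
    this.id = id;
  }

  // Called from the instrumented code each time data crosses a boundary.
  record(source: string, target: string, payload: unknown): void {
    this.events.push({ seq: this.events.length, source, target, payload });
  }

  // Serialise the recording so it can be written to a file or uploaded.
  eject(): string {
    const cassette: Cassette = { id: this.id, events: this.events };
    return JSON.stringify(cassette);
  }
}

// Replays a serialised cassette by handing each event, in recorded order, to a callback.
function replay(serialised: string, onEvent: (event: FlowEvent) => void): void {
  const cassette = JSON.parse(serialised) as Cassette;
  [...cassette.events].sort((a, b) => a.seq - b.seq).forEach(onEvent);
}

// Usage: record two hops in a deployed environment, then replay them elsewhere.
const recorder = new Recorder("canary-1");
recorder.record("api", "queue", { orderId: 42 });
recorder.record("queue", "worker", { orderId: 42 });

const tape = recorder.eject();
const seen: string[] = [];
replay(tape, (event) => seen.push(`${event.source}->${event.target}`));
```

The point of the sketch is that the deployed code only emits events; the visualizer needs just the serialised cassette plus a copy of the code to map events onto.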

To expand on this further: I could see companies recording a playback of some bug and storing it either with a commit hash reference or directly in VCS, so that they could replay historic and patched bugs to help identify other affected data down the line. For example, a math error might screw up the books and only become clear 2 months after the bug is patched; being able to recreate the original bug from a replayable reference would be massively helpful in resolving the affected data.

I do need to apologize: I'm making these asks without having looked deeply under the hood at the implementation details! If it helps, I can try to find some time to give more concrete implementation suggestions.
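As a rough illustration of storing recordings against commits, a small index could map each cassette to the bug it captures and the commit it was recorded on. This is only a sketch; `CassetteRef`, `CassetteIndex`, the bug ID, the commit hashes and the file paths are all made up for the example.

```typescript
// Hypothetical index of stored recordings; field names are illustrative only.
interface CassetteRef {
  bugId: string;  // tracker ID of the bug the recording captures
  commit: string; // commit hash the recording was made against
  path: string;   // where the cassette file lives, e.g. inside the repository
}

class CassetteIndex {
  private readonly refs: CassetteRef[] = [];

  add(ref: CassetteRef): void {
    this.refs.push(ref);
  }

  // All recordings made against a given commit.
  forCommit(commit: string): CassetteRef[] {
    return this.refs.filter((ref) => ref.commit === commit);
  }

  // All recordings of a given bug, across the buggy and patched commits.
  forBug(bugId: string): CassetteRef[] {
    return this.refs.filter((ref) => ref.bugId === bugId);
  }
}

// Usage: one recording of the original bug, one of the patched behaviour.
const index = new CassetteIndex();
index.add({ bugId: "BUG-1", commit: "abc1234", path: "cassettes/bug-1-original.json" });
index.add({ bugId: "BUG-1", commit: "def5678", path: "cassettes/bug-1-patched.json" });
```

Looking up by bug would return both the original and the patched recording, which is what would let someone compare them months later.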

iostreamer-X commented 1 month ago

Hey, thanks for getting back!

The tape-deck approach makes sense. Tangential, but if support is added then going back a step in the playground also becomes possible.
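A rough sketch of why a recording makes stepping back cheap: if the playground state is derived from the events applied so far, going back one step is just moving a cursor and re-deriving the state from the shorter prefix. `PlaybackCursor` is a hypothetical name and says nothing about how the playground is actually built.

```typescript
// Hypothetical cursor over a recorded event list; not taken from the metz codebase.
class PlaybackCursor<E> {
  private readonly events: readonly E[];
  private position = 0; // number of events applied so far

  constructor(events: readonly E[]) {
    this.events = events;
  }

  // Applies the next event, or returns undefined at the end of the recording.
  stepForward(): E | undefined {
    if (this.position >= this.events.length) return undefined;
    const next = this.events[this.position];
    this.position += 1;
    return next;
  }

  // Moves back one event; returns false if already at the start.
  stepBack(): boolean {
    if (this.position === 0) return false;
    this.position -= 1;
    return true;
  }

  // Events that make up the current state; the view can be rebuilt from these.
  applied(): readonly E[] {
    return this.events.slice(0, this.position);
  }
}

// Usage: go forward twice, then back once.
const cursor = new PlaybackCursor(["request received", "job queued", "job processed"]);
cursor.stepForward();
cursor.stepForward();
cursor.stepBack();
const current = cursor.applied();
```

Rebuilding from the prefix is the simplest option; for long recordings, periodic snapshots would avoid replaying from the start each time.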

The bottleneck I see is the code context. I can look into a VS Code extension to do the same, but that would require me to change the roadmap, which I have already started building. Maybe a self-hosted version might suffice for now?

As for the asks, no worries, keep them coming! For contribution, feel free to join the slack, and I would be happy to answer any questions you might have about the code.

taras commented 1 week ago

@microsoftly, what you're describing sounds a lot like OpenTelemetry tracing. We're using that now to see what happens with a request. You get a trace that looks like this:

image

Source: https://grafana.com/blog/2021/04/13/how-to-send-traces-to-grafana-clouds-tempo-service-with-opentelemetry-collector/

microsoftly commented 1 week ago

We're already using tracing libraries.

I think having visualizations like metz.sh for code-level data propagation would be useful for debugging.

Logically, I see it as potentially complementary to tracing, e.g. certain traces may "record" metz-compatible (or equivalent) data for visualization and include that as part of the UI.