vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

vector tap without following continuously #14726

Open mjperrone opened 1 year ago

mjperrone commented 1 year ago


Use Cases

I'm trying to make a script that will grab a single event from each of the components in our configuration. I can use vector tap to do this, but because the command keeps pulling events every --interval, scripting it means I have to do some messy timing stuff:

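vector tap --limit 1 --outputs-of parse --interval 1001 > parse.out.json &
tap_pid=$!  # PID of the background tap; more reliable in a script than a job number like %2
sleep 2     # give tap time to emit a sample
kill "$tap_pid"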
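
A possibly simpler way to express the same workaround, assuming GNU coreutils timeout is available (on macOS, Homebrew's coreutils package provides it as gtimeout), is to let timeout stop the tap instead of managing a background job by hand. This is only a sketch: tap_for is a hypothetical helper name, not part of Vector.

```shell
# tap_for SECONDS [vector tap options...]
# Hypothetical helper: run `vector tap` with the given options and stop it
# after SECONDS using GNU coreutils `timeout` (gtimeout on macOS/Homebrew).
tap_for() {
  seconds=$1
  shift
  timeout "$seconds" vector tap "$@"
  rc=$?
  # timeout exits with 124 when it had to stop the command; that is the
  # expected outcome here, so report success.
  if [ "$rc" -eq 124 ]; then
    return 0
  fi
  return "$rc"
}

# Example (same request as the snippet above):
# tap_for 2 --limit 1 --outputs-of parse --interval 1001 > parse.out.json
```

The helper still waits the full SECONDS even if an event arrives earlier; it only removes the job bookkeeping.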

Attempted Solutions

No response

Proposal

I'd like the default behavior of vector tap to mirror that of tail: without an --interval, it just prints --limit events and then exits. If you do specify an --interval, it runs until a signal kills it, much like tail -f.

Alternatively, if you want to keep the default behavior of periodically emitting new events, I'd propose a new flag, --once or --no-follow, which would grab data only once and then exit.
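
Until a flag like that exists, one way to approximate --once from the outside is to stop tap as soon as it has written its first line, rather than sleeping for a fixed time. This is a sketch under assumptions: first_line is a hypothetical helper, it needs mkfifo and GNU coreutils timeout, and it assumes each event arrives as one line on stdout. If your Vector version prints tap notifications on stdout, the first line may be a notification rather than an event.

```shell
# first_line MAX_SECONDS COMMAND [ARGS...]
# Hypothetical helper: start COMMAND, print the first line it writes to
# stdout, then stop it. Gives up after MAX_SECONDS if nothing arrives.
first_line() {
  max_seconds=$1
  shift
  fifo=$(mktemp -u) || return 1
  mkfifo "$fifo" || return 1
  # `timeout` caps the wait in case the command never writes a line.
  timeout "$max_seconds" "$@" > "$fifo" &
  pid=$!
  # Blocks until the first complete line arrives or the writer exits.
  IFS= read -r line < "$fifo"
  rc=$?
  # Stop the command (timeout forwards the signal to it) and reap it.
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  rm -f "$fifo"
  # Print the line only if one was actually read; otherwise return non-zero.
  [ "$rc" -eq 0 ] && printf '%s\n' "$line"
}

# Example:
# first_line 5 vector tap --limit 1 --outputs-of parse > parse.out.json
```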

References

No response

Version

0.24.0

mjperrone commented 1 year ago

Here's the script I ended up with:

# Components that produce output: every [sources.*] and [transforms.*] table.
grep -h '\[sources\.\|\[transforms\.' ../../vector-"${node_type}"*/config/*.toml | awk -F. '{print $2}' | sed 's/]//g' | sort -u > "${node_type}"_outputting_components.txt
sleep 4 # wait for port forwarding to start
while IFS= read -r component; do
  # Tap one event from this component's outputs in the background.
  vector tap --limit 1 --outputs-of "$component" --interval 2000 > "${node_type}"."$component".out.json & job=$!
  sleep 1.5 # wait for it to poll once
  kill "$job"
done < "${node_type}_outputting_components.txt"

# Components that consume input: every [transforms.*] and [sinks.*] table.
grep -h '\[transforms\.\|\[sinks\.' ../../vector-"${node_type}"*/config/*.toml | awk -F. '{print $2}' | sed 's/]//g' | sort -u > "${node_type}"_inputting_components.txt
while IFS= read -r component; do
  # Tap one event from this component's inputs in the background.
  vector tap --limit 1 --inputs-of "$component" --interval 2000 > "${node_type}"."$component".in.json & job=$!
  sleep 1.5 # wait for it to poll once
  kill "$job"
done < "${node_type}_inputting_components.txt"
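
The name-extraction step in that script can also be done with a single sed expression. This sketch (list_components is a hypothetical helper) assumes the same TOML layout the script relies on, where components are declared as [sources.<id>], [transforms.<id>] or [sinks.<id>] tables. Like the original awk step, it also picks up the id from subtables such as [sinks.<id>.encoding]; it does not handle quoted ids containing dots, or YAML/JSON configs.

```shell
# list_components KINDS FILE...
# Hypothetical helper: print the unique component ids declared under the given
# top-level kinds, passed as an ERE alternation such as 'sources|transforms'.
# Both "[sinks.out]" and "[sinks.out.encoding]" yield "out".
list_components() {
  kinds=$1
  shift
  sed -nE "s/^[[:space:]]*\[($kinds)\.([^].]+)[].].*/\2/p" "$@" | sort -u
}

# Example, producing the two lists used by the script:
# list_components 'sources|transforms' ../../vector-"${node_type}"*/config/*.toml > "${node_type}"_outputting_components.txt
# list_components 'transforms|sinks' ../../vector-"${node_type}"*/config/*.toml > "${node_type}"_inputting_components.txt
```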

spencergilbert commented 1 year ago

😆 tail feels like a better name for the command now that you mention it. I'm curious whether you're using this for documentation or checking, similar to https://github.com/vectordotdev/vector/issues/14702?

mjperrone commented 1 year ago

I'm trying to make it easier for devs on my team who may be editing the vector config to understand what's happening in each component.

For graph, I have our CI check for changes in the topology and then it yells at devs to commit those changes:

... please run:
vector --quiet graph --config "config/*.toml" > topology.dot
# brew install graphviz
dot -Tsvg topology.dot > topology.svg

so we can see it in our GitHub README.
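
The CI check described above could look roughly like the following. This is a sketch, not the CI job from this thread: check_topology is a hypothetical function, and it assumes it runs from the directory that contains config/ and the committed topology.dot. It regenerates the graph into a temporary file and compares it with the committed copy, so the check itself never modifies the working tree.

```shell
# check_topology: hypothetical CI step. Regenerates the topology graph and
# fails if it differs from the committed topology.dot.
check_topology() {
  fresh=$(mktemp) || return 1
  if ! vector --quiet graph --config "config/*.toml" > "$fresh"; then
    rm -f "$fresh"
    return 1
  fi
  # cmp -s is silent and exits 0 only when both files are byte-identical.
  if cmp -s "$fresh" topology.dot; then
    rm -f "$fresh"
    return 0
  fi
  rm -f "$fresh"
  echo 'Topology changed; please run:' >&2
  echo '  vector --quiet graph --config "config/*.toml" > topology.dot' >&2
  echo '  dot -Tsvg topology.dot > topology.svg' >&2
  return 1
}
```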

For tap, right now I have the script above, which (tries to) dump the IO from production to local files.

I can imagine an advanced version that uses the same origin event and shows the IO for that event throughout the topology instead of a random event like I'm doing here.

spencergilbert commented 1 year ago

Gotcha. Graphing is a nice idea; I really like the concept 👍

The second one is definitely a hard problem to solve. I remember that when tap was first being designed (back when I was just a user), I was hoping for tooling to copy events out of a live event stream so I could test local configs against them, or play with the data in the VRL REPL.

It's definitely an interesting use-case and something we've discussed.