Allow sniffing to only return nodes with specific ES node attributes

ppf2 commented 8 years ago

Hot/cold architectures are common in ES/LS deployments, where hot nodes are tagged with a specific node attribute. It would be helpful to let the end user specify a node attribute to match as part of sniffing, e.g. enable sniffing but tell it to only use nodes with node_type:ssd.
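To make the request concrete, here is a rough sketch of the filtering being asked for. It is not the plugin's code. It assumes the sniffing response looks like the output of the Elasticsearch nodes info API, i.e. a "nodes" map whose entries may carry "attributes" and an "http" section with a "publish_address". The function name filter_sniffed_hosts and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def filter_sniffed_hosts(nodes_info: Dict[str, Any], attr: str, value: str) -> List[str]:
    """Return the HTTP addresses of nodes whose attribute `attr` equals `value`.

    `nodes_info` is assumed to have the shape of a nodes info response:
    {"nodes": {node_id: {"attributes": {...}, "http": {"publish_address": ...}}}}
    """
    hosts = []
    for info in nodes_info.get("nodes", {}).values():
        http = info.get("http")
        if not http or "publish_address" not in http:
            continue  # no HTTP endpoint, so the output could not talk to this node
        if info.get("attributes", {}).get(attr) == value:
            hosts.append(http["publish_address"])
    return sorted(hosts)


# Hypothetical sample data, not taken from a real cluster.
sample = {
    "nodes": {
        "a1": {"attributes": {"node_type": "ssd"}, "http": {"publish_address": "10.0.0.1:9200"}},
        "b2": {"attributes": {"node_type": "hdd"}, "http": {"publish_address": "10.0.0.2:9200"}},
        "c3": {"attributes": {"node_type": "ssd"}},  # HTTP disabled on this node
    }
}

print(filter_sniffed_hosts(sample, "node_type", "ssd"))  # ['10.0.0.1:9200']
```

The result would then replace the hosts list that sniffing currently fills with every HTTP-enabled node.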

gmoskovicz commented 8 years ago

@acchen97 I noticed the P3 tag. I guess the output plugin will need to ask for the cluster state to sniff specific nodes. Is this expected to be done anytime soon?

jordansissel commented 8 years ago

What is the expected result of this feature? Do we have evidence the feature will have the desired result?

jordansissel commented 8 years ago

Specifically, a tag isn't the same as a routing id, so the node receiving the request may not even have the index or the right shard(s).

I am not against this, but it's abnormal (Elasticsearch already supports this feature, so why would a client need it?), so I want to make sure we know what result is expected and if we even think the proposed feature can achieve the desired result.

ppf2 commented 8 years ago

This is more of a request on the config management side of things: users could simply specify the same tags they set up on the ES side, and the hosts array would automatically be set to those nodes, instead of having to manage a list of host names (which can change over time) directly in the LS config. The current workaround is probably to handle this outside of the Elastic Stack via some external config management tool :)

jordansissel commented 8 years ago

> This is more of a request on the config management side of things

I agree on the alignment, but I am missing what the proposed benefit is. Let's get closer to the problem. Why would a user want to send _bulk requests only to a certain set of data nodes? What is the expected benefit?

allenmchan commented 8 years ago

Yes, a tag is not a routing ID. So in theory the request will need to be re-routed to another node that has the active shard.

In a hot/warm/cold architecture, you do want all indexing requests to hit the hot tier (SSDs or some other form of faster storage). So this enhancement asks that sniffing return only the nodes that are in the hot tier (the Elasticsearch documentation says we can set up tiers using tags). Without this enhancement, I would not be able to leverage the wonderful feature of sniffing. Sniffing would return all nodes by default, so bulk requests would hit all nodes and therefore violate the teaching that I have to not send bulk indexing to warm or cold nodes.

Hope that clarifies my ask at least.

jordansissel commented 8 years ago

> violate the teaching that I have to not send bulk indexing to warm or cold nodes.

What teaching is this?

A _bulk is not written to disk by the node that receives the request. It is processed in memory and forwarded (like any index request) to the nodes bearing those primary and replica shards. A "cold"-tagged node receiving a _bulk would forward the data to the correct node(s) just like any other node would.
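As a toy illustration of that point (a simplified model, not Elasticsearch's implementation): the node that receives a _bulk only groups the documents by shard and forwards them, so where the data is written depends on shard ownership, not on the receiving node. The CRC32 hash below is a stand-in for Elasticsearch's own routing hash, and the node names and the primary_owner table are invented.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stand-in hash for illustration only; Elasticsearch uses its own routing hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def coordinate_bulk(receiving_node: str, doc_ids: List[str],
                    primary_owner: Dict[int, str]) -> Dict[str, List[str]]:
    """Group documents by the node that holds each document's primary shard.

    `receiving_node` is deliberately unused: in this model it only does the
    grouping and forwarding, and never decides where data is stored.
    """
    writes = defaultdict(list)
    for doc_id in doc_ids:
        shard = shard_for(doc_id, len(primary_owner))
        writes[primary_owner[shard]].append(doc_id)
    return dict(writes)


# Hypothetical cluster: both primary shards of the index live on hot nodes.
primary_owner = {0: "hot-1", 1: "hot-2"}
docs = ["a", "b", "c", "d"]

via_cold = coordinate_bulk("cold-1", docs, primary_owner)
via_hot = coordinate_bulk("hot-1", docs, primary_owner)
print(via_cold == via_hot)  # True
```

In this model, sending the request to a cold node costs an extra network hop and some memory and CPU on that node for coordination, but nothing is written to its disks.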

Without more information, it seems like you (or maybe me) have some misinformation about what Elasticsearch does with http requests?

jordansissel commented 8 years ago

Also, just to offer a workaround, you can use config management tooling or DNS entries to maintain a list of these nodes for use with the Elasticsearch output.

ppf2 commented 8 years ago

There is another workaround, which takes advantage of how sniffing currently returns all nodes with http enabled (regardless of node type). The same result can be achieved by disabling http on all master and warm nodes (if having the hot nodes act as coordinating nodes is desired), or by disabling http on all master and data nodes (if http-enabled client nodes are added to the picture to perform the coordinating task).

logstash-plugins / logstash-output-elasticsearch

Allow sniffing to only return nodes with specific ES node attributes #438