logstash-plugins / logstash-input-tcp

Apache License 2.0

tcp input extend hosts #118

Open upuv opened 6 years ago

upuv commented 6 years ago

Feature request

The TCP input plugin should extend the hosts variable to accept domain names, both short and fully qualified.

Currently the string only accepts an IP address. This should be extended to a hostname or fully qualified domain name as well.

In DHCP-type environments there is no way to predetermine the IP.

This will also be helpful in environments with multiple interfaces.

Also note that current documentation does not explicitly state that the address is restricted to an IP address.

yaauie commented 6 years ago

@upuv are you looking to bind the input to a single interface by means of a hostname when in the default server mode? Or are you looking to connect to another host's TCP port in client mode?

upuv commented 6 years ago

server mode primarily.

But there really should be no restriction on which mode. Note that I don't like the term hostname, as that implies the name of the host it's running on and only that. I prefer simply "name": something that resolves to an interface on the host, following the system's configured method of name resolution.

For me it's very important that it is a resolvable name. This can be critical in an automated setup, especially if you use DHCP to provide IP addresses.

Why is it important? Well, think of this scenario. I have a logstash node. I give it the name "logging". It's obviously a short name in DNS speak. But I also have multiple environments: development, staging, production. In each environment I have a DNS and a DHCP server which provide environment-level resolving of names. This allows me to use the "logging" name in each environment. Now all of a sudden I can test my configurations unchanged between environments. This means the chance of error when modifying configurations between environments is reduced dramatically.

That was just one potential use case for the requirement of using names for interface binding.

yaauie commented 6 years ago

Is there a specific reason why you need to only bind to a single interface, overriding the default (which binds to 0.0.0.0, or all interfaces)?

It seems weird to me that a feature would require querying an external system to resolve a hostname (DNS), which would only work in specific simple implementations where a local interface had the same public IP as that listed in the DNS record.
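The constraint described here (the name must resolve to an address that a local interface actually owns) can be checked outside Logstash. Below is a minimal Python sketch, assuming IPv4 only. The helper name `can_bind_locally` is hypothetical and is not part of Logstash or the plugin. It resolves a name using the system resolver, then tries to bind a throwaway socket to each resulting address.

```python
import socket


def can_bind_locally(name, port=0):
    """Return True if `name` resolves to an IPv4 address this host can listen on.

    Hypothetical helper for illustration; not part of Logstash.
    """
    try:
        infos = socket.getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name does not resolve at all
    for family, socktype, proto, _canonname, sockaddr in infos:
        with socket.socket(family, socktype, proto) as probe:
            try:
                probe.bind(sockaddr)  # port 0 lets the OS pick any free port
                return True
            except OSError:
                # Resolved, but no local interface owns this address
                # (or the requested port is unavailable).
                continue
    return False
```

On a node named "logging", `can_bind_locally("logging")` would only be true if that node's resolver maps the name to one of its own addresses, which is the situation the comment above is concerned with.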

I could see value in binding to an interface by name, if indeed it was necessary to limit to a single interface.

upuv commented 6 years ago

I'm fairly certain we see this from opposite sides. I need a very specific reason to bind to all interfaces. Binding to all interfaces is asking for a security issue.

Now I could probably rattle off 100 different scenarios for binding to a named interface. I've already given a very valid one above. Which is used very often in industry. Especially in scale and secure environments.

But I'll give you a very simple reason why you DON'T want to bind to all: bastion hosts. One interface faces the public net, for example, and the other the internal LAN. There is no way in hell I am going to allow any service to bind to the public interface by default. You could probably measure the life span of that host and logstash in seconds before it's hammered out of existence by a script kiddie.

Here is another very simple one. Let's take the old model of web systems called the 3-tier model. The layers are 1. interface, 2. computer/service, 3. database. Each of these tiers should have two interfaces, one facing forward and one facing back to the next layer. The comms between the layers are strictly controlled. As a matter of fact, the safest model is to have comms like the one for logstash go through a third interface for management tasks. So in this scenario I only want it to bind on the management interface.

To be perfectly frank the default for all of the elastic tools should be to bind on the loopback only. Anything else as default is very insecure and not at all best practice. But I would also like logstash to bind to a list of named interfaces not just singular. It should also be able to bind to physical interfaces on the host. This solves a raft of other scenarios.
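To make the "list of names" idea concrete, here is a rough Python sketch of what binding to several names could look like at the socket level. It illustrates the request only and does not show how Logstash works. The function name `listen_on` is hypothetical, and the sketch assumes IPv4 and that every name resolves to an address owned by the local host.

```python
import socket


def listen_on(names, port):
    """Open one listening TCP socket per name; fail if any name cannot be bound.

    Hypothetical helper for illustration; not part of Logstash.
    """
    listeners = []
    for name in names:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((name, port))  # the name is resolved by the system resolver
            sock.listen()
        except OSError:
            # Clean up everything opened so far before reporting the failure.
            sock.close()
            for opened in listeners:
                opened.close()
            raise
        listeners.append(sock)
    return listeners
```

Nothing here binds to 0.0.0.0: each socket accepts connections only on the address its name resolved to, so an interface that is not listed stays closed.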

yaauie commented 6 years ago

I see your points about binding to all interfaces on Bastion Hosts, but I also wouldn't typically run Logstash with open ports on a host that isn't otherwise isolated from the network at large :)

My commentary was more about your original request (emphasis mine):

TCP input plugin extend the hosts variable to include domain names both short and fully qualified.

Currently the string only accepts an IP address. This should be extended to a hostname or fully qualified domain name as well.

FQDNs require an external source-of-truth system such as DNS, and it would be much simpler to allow binding to the address(es) provided by one or more interfaces by name (e.g., lo, eth0) than to rely on an external system to look up an address.
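As a rough illustration of the interface-name alternative, the Python sketch below looks up the IPv4 address assigned to a named interface without consulting DNS. It assumes Linux (it relies on the `SIOCGIFADDR` ioctl) and an interface with a single IPv4 address. The function name `interface_ipv4` is hypothetical and not part of Logstash.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl: get the IPv4 address of an interface


def interface_ipv4(ifname):
    """Return the IPv4 address assigned to `ifname` (Linux only).

    Hypothetical helper for illustration; not part of Logstash.
    """
    # The kernel expects an ifreq struct whose first 16 bytes hold the name.
    request = struct.pack("256s", ifname.encode()[:15])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # Bytes 20..24 of the reply are the address field of the returned sockaddr_in.
    return socket.inet_ntoa(reply[20:24])
```

The returned address could then be passed to a listener in place of an IP literal, so the configuration names `lo` or `eth0` while the bind itself still uses an address.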

upuv commented 6 years ago

Yes, of course an FQDN or short name requires DNS. That is the whole point.

I have to assume you work in environments where updating DNS is difficult or impossible. However, in most enterprise environments you have full control over DNS. DNS configuration is just one of your CM records, usually controlled via automation.

And yes, I also want the ability to bind to interfaces like lo and eth0.

I want more options for configuration. I do not understand the insistence on NOT using names from DNS, something almost every other application on the planet is able to do. Why is logstash so special that it can't do something so basic and standard?

yaauie commented 6 years ago

@upuv revisiting this, and I'm not seeing a restriction where we limit to only IP addresses.

Currently the string only accepts an IP address.

Under the hood we use either the Java-provided InetSocketAddress via Netty's AbstractBootstrap#bind(String,int) for non-SSL or the Ruby-provided TCPServer when SSL is enabled, both of which support name resolution.
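The same behavior, where a bind call accepts a name and resolves it before listening, can be seen with a plain socket outside Logstash. The Python sketch below is an analogy only and does not reproduce the plugin's Netty or Ruby code path. The function name `round_trip` is hypothetical, and `localhost` stands in for any name that resolves to a local address.

```python
import socket


def round_trip(name, payload=b"hello world\n"):
    """Bind a listener by name, send one line to it, and return what arrived.

    Hypothetical helper for illustration; not part of Logstash.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((name, 0))  # name is resolved here; port 0 picks a free port
        server.listen(1)
        bound_ip, bound_port = server.getsockname()

        # Connect to the resolved address, send one line, and hang up.
        with socket.create_connection((bound_ip, bound_port)) as client:
            client.sendall(payload)

        # The connection and its data are queued by the kernel until accepted.
        conn, _peer = server.accept()
        with conn, conn.makefile("rb") as reader:
            received = reader.readline()
    return bound_ip, received
```

Calling `round_trip("localhost")` returns the address the name resolved to together with the line the listener received, which mirrors the name-in, address-out behavior reported in the log and event output of this comment.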


For example, if I alias the IP for example.org (currently 93.184.216.34) to my en0 interface:

╭─{ yaauie@castrovel:~/src/elastic/pr-scratch/logstash-input-tcp/118-hostname-support }
╰─○ sudo ifconfig en0 alias 93.184.216.34 255.255.255.0
[success]

And start up Logstash to listen with the TCP Input Plugin on example.org with port 3333:

╭─{ yaauie@castrovel:~/src/elastic/pr-scratch/logstash-input-tcp/118-hostname-support }
╰─○ $LOGSTASH_HOME/bin/logstash -e "input { tcp { host => 'example.org' port => 3333 } }"
Sending Logstash's logs to /Users/yaauie/src/elastic/releases/6.3.2/logstash-oss-6.3.2/logs which is now configured via log4j2.properties
[2018-07-31T18:38:12,911][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/Users/yaauie/src/elastic/releases/6.3.2/logstash-oss-6.3.2/data/queue"}
[2018-07-31T18:38:12,923][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/Users/yaauie/src/elastic/releases/6.3.2/logstash-oss-6.3.2/data/dead_letter_queue"}
[2018-07-31T18:38:13,045][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-07-31T18:38:13,091][INFO ][logstash.agent           ] No persistent UUID file found. Generating new UUID {:uuid=>"d3d94752-b8d7-42b6-89e0-3233bd92d71c", :path=>"/Users/yaauie/src/elastic/releases/6.3.2/logstash-oss-6.3.2/data/uuid"}
[2018-07-31T18:38:13,606][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.3.2"}
[2018-07-31T18:38:15,981][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-07-31T18:38:16,055][INFO ][logstash.inputs.tcp      ] Starting tcp input listener {:address=>"example.org:3333", :ssl_enable=>"false"}
[2018-07-31T18:38:16,475][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x55810039 sleep>"}
[2018-07-31T18:38:16,541][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-07-31T18:38:16,814][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Then send data to example.org:3333 in a separate shell:

╭─{ yaauie@castrovel:~/src/elastic/pr-scratch/logstash-input-tcp/118-hostname-support }
╰─○ echo 'hello world' | nc example.org 3333

Logstash receives the input:

{
       "message" => "hello world",
          "host" => "93.184.216.34",
          "port" => 52654,
    "@timestamp" => 2018-07-31T18:39:20.726Z,
      "@version" => "1"
}

Additionally, if I try to send the same to localhost:3333, as expected nc fails to connect:

╭─{ yaauie@castrovel:~/src/elastic/pr-scratch/logstash-input-tcp/118-hostname-support }
╰─○ echo 'hello world' | nc -v localhost 3333
localhost [127.0.0.1] 3333 (dec-notes): Connection refused
[error: 1]

What have you tried, and how did it behave differently than you expected?