telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

Cygnus error with hdfsSink #1643

Open amfgomez opened 5 years ago

amfgomez commented 5 years ago

Hello everyone:

I am trying to save sensor data in HDFS using Cygnus.

The Cygnus configuration is below:

cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = hdfs-sink
cygnus-ngsi.channels = hdfs-channel

cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.channels = hdfs-channel
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = default
cygnus-ngsi.sources.http-source.handler.default_service_path = /
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /opt/apache-flume/conf/grouping_rules.conf
cygnus-ngsi.sources.http-source.interceptors.nmi.type = com.telefonica.iot.cygnus.interceptors.NGSINameMappingsInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.nmi.name_mappings_conf_file = /opt/apache-flume/conf/name_mappings.conf

cygnus-ngsi.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.NGSIHDFSSink
cygnus-ngsi.sinks.hdfs-sink.channel = hdfs-channel

cygnus-ngsi.sinks.hdfs-sink.enable_encoding = false

cygnus-ngsi.sinks.hdfs-sink.enable_grouping = false

cygnus-ngsi.sinks.hdfs-sink.enable_lowercase = false

cygnus-ngsi.sinks.hdfs-sink.enable_name_mappings = false

cygnus-ngsi.sinks.hdfs-sink.data_model = dm-by-entity

cygnus-ngsi.sinks.hdfs-sink.file_format = json-column

cygnus-ngsi.sinks.hdfs-sink.backend.impl = rest

cygnus-ngsi.sinks.hdfs-sink.backend.max_conns = 500

cygnus-ngsi.sinks.hdfs-sink.backend.max_conns_per_route = 100

cygnus-ngsi.sinks.hdfs-sink.hdfs_host = 10.9.8.29
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = stack
cygnus-ngsi.sinks.hdfs-sink.hdfs_password = stack

cygnus-ngsi.sinks.hdfs-sink.oauth2_token =

cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false

cygnus-ngsi.sinks.hdfs-sink.batch_size = 100

cygnus-ngsi.sinks.hdfs-sink.batch_timeout = 30

cygnus-ngsi.sinks.hdfs-sink.batch_ttl = 10

cygnus-ngsi.sinks.hdfs-sink.batch_retry_intervals = 5000

cygnus-ngsi.sinks.hdfs-sink.hive = false

cygnus-ngsi.sinks.hdfs-sink.krb5_auth = false

cygnus-ngsi.channels.hdfs-channel.type = com.telefonica.iot.cygnus.channels.CygnusMemoryChannel
cygnus-ngsi.channels.hdfs-channel.capacity = 100000
cygnus-ngsi.channels.hdfs-channel.transactionCapacity = 10000

I have created Hadoop clusters with the following versions: 2.6.0, 2.7.7 and 3.2.0, and in each case the same error occurs:

Cygnus logs:

time=2019-05-12T07:39:15.133Z | lvl=INFO | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=persistAggregation | msg=com.telefonica.iot.cygnus.sinks.NGSIHDFSSink[1067] : [hdfs-sink] There was some problem with the current endpoint, trying other one. Details: CygnusPersistenceError (IOException). Request error (hdfsServer: Name or service not known).
time=2019-05-12T07:39:15.133Z | lvl=ERROR | corr=N/A | trans=N/A | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=processRollbackedBatches | msg=com.telefonica.iot.cygnus.sinks.NGSISink[399] : CygnusPersistenceError. No endpoint was available. Stack trace: [com.telefonica.iot.cygnus.sinks.NGSIHDFSSink.persistAggregation(NGSIHDFSSink.java:1077), com.telefonica.iot.cygnus.sinks.NGSIHDFSSink.persistBatch(NGSIHDFSSink.java:495), com.telefonica.iot.cygnus.sinks.NGSISink.processRollbackedBatches(NGSISink.java:391), com.telefonica.iot.cygnus.sinks.NGSISink.process(NGSISink.java:373), org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67), org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145), java.lang.Thread.run(Thread.java:748)]

Cygnus can create the path inside Hadoop, but it cannot create the txt file with the measurements.
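If it helps, writing the file should correspond to the WebHDFS CREATE operation, assuming the rest backend with WebHDFS on port 50070 as in the configuration above. A manual equivalent of that call (the target path and the local file are only placeholders for this example) would be something like:

curl -i -X PUT "http://10.9.8.29:50070/webhdfs/v1/user/stack/cygnus_test.txt?op=CREATE&user.name=stack"
# the namenode answers with a 307 redirect to a datanode; the data is then uploaded to that Location URL
curl -i -X PUT -T some_local_file.txt "<Location URL returned by the redirect>"

Running this by hand from the machine where Cygnus runs reproduces the same write outside Cygnus.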

Best regards, Antonio

fgalan commented 5 years ago

Judging by the error message you get (hdfsServer: Name or service not known), it seems to be some kind of problem with your cluster setup and/or the Cygnus-to-cluster connection.

Maybe some of the following parameters are involved:

cygnus-ngsi.sinks.hdfs-sink.hdfs_host = 10.9.8.29
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = stack
cygnus-ngsi.sinks.hdfs-sink.hdfs_password = stack

Looking at https://fiware-cygnus.readthedocs.io/en/master/cygnus-ngsi/flume_extensions_catalogue/ngsi_hdfs_sink/index.html, you are not using the cygnus-ngsi.sinks.hdfs-sink.backend.impl parameter, which defaults to rest. But reading the documentation about hdfs_port:

14000 if using HttpFS (rest), 50070 if using WebHDFS (rest), 8020 if using the Hadoop API (binary).

So maybe you are using a wrong port.

In addition, it would be a good sanity check to call the WebHDFS/HttpFS API (some basic GET method) from the system running Cygnus, in order to verify that your cluster is OK and reachable.
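For instance, a minimal check from the Cygnus host (host and user taken from your configuration; pick the port matching the backend you end up using, and note the path is only an example) could be:

# WebHDFS (rest backend, port 50070)
curl "http://10.9.8.29:50070/webhdfs/v1/user/stack?op=LISTSTATUS&user.name=stack"

# HttpFS (rest backend, port 14000)
curl "http://10.9.8.29:14000/webhdfs/v1/user/stack?op=LISTSTATUS&user.name=stack"

Either call should return a JSON FileStatuses document if the cluster is reachable and the stack user exists; an error or a timeout here would point to the cluster or the connection rather than to Cygnus.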