keedio / flume-ng-sql-source

Flume Source to import data from SQL Databases

Support for loading records from oracle11g databases #16

Closed NitinKumar94 closed 9 years ago

NitinKumar94 commented 9 years ago

I was unable to connect to my Oracle 11g database using your latest release, even though the Hibernate engine supports 11g. I'm using HDP 2.3 on a 3-node cluster installed with Ambari 2.1.1. Are there compatibility issues with the HDP release specifically?

Below is my flume.conf file

# Flume agent config
a1.channels.ch1.type = memory
a1.sources.sql-source.channels = ch1
a1.channels = ch1
a1.sinks = HDFS

a1.sources = sql-source
a1.sources.sql-source.type = org.keedio.flume.source.SQLSource

# URL to connect to database
a1.sources.sql-source.connection.url = jdbc:oracle:thin:prasads@10.113.59.5:1521/ora11g

# Database connection properties
a1.sources.sql-source.user = prasads
a1.sources.sql-source.password = prasads
a1.sources.sql-source.table = roc_ai_cdr_base
a1.sources.sql-source.database = ora11g

a1.sources.sql-source.columns.to.select = *

# Increment column properties
a1.sources.sql-source.incremental.column.name = day_of_year

# Increment value from which you want to start taking data from the table (0 will import the entire table)
a1.sources.sql-source.incremental.value = 349

# Query delay: the query will be sent every configured number of milliseconds
a1.sources.sql-source.run.query.delay=30000

# Status file is used to save the last read row
a1.sources.sql-source.status.file.path = /var/lib/flume
a1.sources.sql-source.status.file.name = sql-source.status

a1.sinks.HDFS.channel = ch1
a1.sinks.HDFS.type = hdfs
a1.sinks.HDFS.hdfs.path = hdfs://node1.analytics.subex:8020/user/ambari-qa/flume/oracle/database/roc_ai_cdr_base
a1.sinks.HDFS.hdfs.file.Type = DataStream

And I am getting the following error:

2015-09-01 17:48:34,280 (conf-file-poller-0) [WARN - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.configure(DriverManagerConnectionProviderImpl.java:93)] HHH000402: Using Hibernate built-in connection pool (not for production use!)
2015-09-01 17:48:34,281 (conf-file-poller-0) [INFO - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.buildCreator(DriverManagerConnectionProviderImpl.java:166)] HHH000401: using driver [null] at URL [jdbc:oracle:thin:prasads@10.113.59.5:1521/ora11g]
2015-09-01 17:48:34,281 (conf-file-poller-0) [INFO - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.buildCreator(DriverManagerConnectionProviderImpl.java:175)] HHH000046: Connection properties: {user=prasads, password=****}
2015-09-01 17:48:34,281 (conf-file-poller-0) [INFO - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.buildCreator(DriverManagerConnectionProviderImpl.java:180)] HHH000006: Autocommit mode: false
2015-09-01 17:48:34,282 (conf-file-poller-0) [INFO - org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.configure(DriverManagerConnectionProviderImpl.java:102)] HHH000115: Hibernate connection pool size: 20 (min=1)
2015-09-01 17:48:34,376 (conf-file-poller-0) [ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:366)] Source sql-source has been removed due to an error during configuration
org.hibernate.exception.JDBCConnectionException: Error calling DriverManager#getConnection
    at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:132)
    at org.hibernate.engine.jdbc.connections.internal.BasicConnectionCreator$1$1.convert(BasicConnectionCreator.java:118)
    at org.hibernate.engine.jdbc.connections.internal.BasicConnectionCreator.convertSqlException(BasicConnectionCreator.java:140)
    at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionCreator.makeConnection(DriverManagerConnectionCreator.java:54)
    at org.hibernate.engine.jdbc.connections.internal.BasicConnectionCreator.createConnection(BasicConnectionCreator.java:75)
    at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionProviderImpl.configure(DriverManagerConnectionProviderImpl.java:106)
    at org.hibernate.boot.registry.internal.StandardServiceRegistryImpl.configureService(StandardServiceRegistryImpl.java:111)
    at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:234)
    at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:206)
    at org.hibernate.engine.jdbc.internal.JdbcServicesImpl.buildJdbcConnectionAccess(JdbcServicesImpl.java:260)
    at org.hibernate.engine.jdbc.internal.JdbcServicesImpl.configure(JdbcServicesImpl.java:94)
    at org.hibernate.boot.registry.internal.StandardServiceRegistryImpl.configureService(StandardServiceRegistryImpl.java:111)
    at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:234)
    at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:206)
    at org.hibernate.cfg.Configuration.buildTypeRegistrations(Configuration.java:1887)
    at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1845)
    at org.keedio.flume.source.HibernateHelper.establishSession(HibernateHelper.java:59)
    at org.keedio.flume.source.SQLSource.configure(SQLSource.java:71)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:331)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:prasads@10.113.59.5:1521/ora11g
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at org.hibernate.engine.jdbc.connections.internal.DriverManagerConnectionCreator.makeConnection(DriverManagerConnectionCreator.java:51)
    ... 25 more

mvalleavila commented 9 years ago

Hello Nitin,

I can see the line:

Caused by: java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:prasads@10.113.59.5:1521/ora11g

Try downloading the JDBC driver for Oracle 11g and putting it into the Flume plugin's libext directory. I think that should resolve the problem.
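
For reference, the expected plugins.d layout would look something like this (a sketch; the jar names are the ones mentioned in this thread and may differ in your install):

    $FLUME_HOME/plugins.d/sql-source/lib/flume-ng-sql-source-1.3-SNAPSHOT.jar    # the source plugin itself
    $FLUME_HOME/plugins.d/sql-source/libext/ojdbc6.jar                           # the Oracle JDBC driver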

Regards,
Marcelo

NitinKumar94 commented 9 years ago

Hi Marcelo,

I have the required JDBC driver in the /usr/hdp/current/flume-server/plugins.d/lib/source-sql/libext folder and your latest jar, "flume-ng-sql-source-1.3-SNAPSHOT.jar", in /usr/hdp/current/flume-server/plugins.d/sql-source/lib.

I am still getting the same error. Permissions for plugins.d and all subdirectories are 777.

I notice that in the README.md file you have listed the following instructions under deployment:

mkdir -p $FLUME_HOME/plugins.d/sql-source/lib $FLUME_HOME/plugins.d/sql-source/lib

Both paths are the same. Perhaps I am making a mistake in creating the right directories and putting the right jars in them. Could you clear that up?
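
(I assume the intended command was something like the following, with lib for the plugin jar and libext for dependency jars, which is the standard Flume plugins.d layout:

    mkdir -p $FLUME_HOME/plugins.d/sql-source/lib $FLUME_HOME/plugins.d/sql-source/libext

)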

Thanks!
Nitin

mvalleavila commented 9 years ago

Nitin,

Can you check using release 1.3.4 and set the parameter

hibernate.connection.driver_class = <oracle-driver-class>

to force Hibernate to use the Oracle driver?
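
For the Oracle thin driver that would be, for example (a sketch assuming the parameter takes the same source prefix as the rest of the config; oracle.jdbc.driver.OracleDriver is the standard class name shipped in the ojdbc jars):

    a1.sources.sql-source.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver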

Regards

NitinKumar94 commented 9 years ago

Hi Marcelo,

I got it running with the 1.3.4 release. I was using ojdbc14.jar previously and switched to ojdbc6.jar. (ojdbc14.jar targets JDK 1.4 and predates JDBC 4's driver auto-registration, which would explain the "No suitable driver found" error.)

Thanks a ton!

mvalleavila commented 9 years ago

Fantastic to hear it works!

NitinKumar94 commented 9 years ago

Hi Marcelo,

I have run into another problem. It seems that when data is being dumped into HDFS, all values are wrapped in quotes. This causes problems later when I want to interpret those values as data types other than string: I get NULL values in my Hive tables instead. I believe it is because of the CSVWriter writeAll method you are using (opencsv version 2.3). A newer version (3.0+) is available, with an overload that has the following signature:

public void writeAll(List<String[]> allLines, boolean applyQuotesToAll)

This way we can control which values we want to place under quotes. I tried to build the project with a different dependency version in the pom.xml file, but I always end up with a compilation error. I'm kind of a newbie at all this. Could you look into it? Thanks!
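
For illustration, the 3.x overload would be used roughly like this (a standalone sketch against opencsv 3.x, where CSVWriter lives in the com.opencsv package; this is not the source's actual code):

    import com.opencsv.CSVWriter;

    import java.io.StringWriter;
    import java.util.Arrays;
    import java.util.List;

    public class QuotesDemo {
        public static void main(String[] args) throws Exception {
            List<String[]> rows = Arrays.asList(
                new String[] { "x.x.x.x", "80", "y.y.y.y", "8080", "critical" });

            StringWriter out = new StringWriter();
            CSVWriter writer = new CSVWriter(out);

            // applyQuotesToAll = false: only fields that actually need quoting
            // (embedded separator, quote char, or newline) are quoted.
            writer.writeAll(rows, false);
            writer.close();

            System.out.print(out); // x.x.x.x,80,y.y.y.y,8080,critical
        }
    }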

Regards,
Nitin

prashanttct07 commented 9 years ago

Hi Nitin/Marcelo,

I meant to raise this query earlier as well, since I am hitting the same issue.

What I am doing is passing the events/data I receive from Flume to Spark, and when I collect the data in Spark, all fields arrive as strings.

So, say I pass the data "x.x.x.x", 80, "y.y.y.y", 8080, "critical"; in Spark I receive it as ""x.x.x.x"", "80", ""y.y.y.y"", "8080", ""critical"" (note the quotes appear twice for the string fields).

Hence, in my Spark application I have to perform string operations to remove the extra quotes from the start and end of each field.
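
(What I do today is roughly the following per field, which is exactly the overhead I want to avoid; a sketch, not my actual job code:

    // strip one leading and one trailing double quote, if present
    String cleaned = field.replaceAll("^\"|\"$", "");

)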

Please let me know what a possible solution could be, as performing those string operations at the next level is proving very costly.

Looking forward to your response.

~Prashant

mvalleavila commented 9 years ago

Nitin/Prashant, I opened new issue #17 for this bug.