keedio / flume-ng-sql-source

Flume Source to import data from SQL Databases
Apache License 2.0

String values of database are being placed with quotes twice #17

Closed mvalleavila closed 7 years ago

mvalleavila commented 9 years ago

From issue #16

NitinKumar94 says:

Hi Marcelo,

I have run into another problem. It seems that when data is dumped into HDFS, all values are placed in quotes. This causes problems later when I want to interpret these values as data types other than string: I get NULL values in my Hive tables instead. I believe it is because of the CSVWriter writeAll method you are using (version 2.3). A newer version (3.0+) is available, which has a method with the following signature:

public void writeAll(List<String[]> allLines, boolean applyQuotesToAll)

This way we can control the values which we want to place under quotes. I tried to build the project with a different dependency in the pom.xml file, but I always end up with a compilation error. I'm kind of a newbie to all this. Could you look into it? Thanks!

Regards, Nitin
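For context, the opencsv 3.x call Nitin refers to would be `writer.writeAll(rows, false)` to suppress unconditional quoting. The sketch below is a minimal, self-contained illustration of the quoting behaviour (it is NOT opencsv's code; the class and method names are made up for this example): with `applyQuotesToAll=true` every field is wrapped in quotes, while with `false` only fields that actually need quoting are wrapped.

```java
import java.util.Arrays;
import java.util.List;

public class QuoteDemo {
    // Toy re-implementation of CSV field quoting, for illustration only.
    // applyQuotesToAll=true wraps every field; false wraps only fields
    // containing a separator, a quote, or a newline.
    static String joinLine(List<String> fields, boolean applyQuotesToAll) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            String f = fields.get(i);
            boolean needsQuotes = applyQuotesToAll
                    || f.contains(",") || f.contains("\"") || f.contains("\n");
            if (needsQuotes) {
                sb.append('"').append(f).append('"');
            } else {
                sb.append(f);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("x.x.x.x", "80", "critical");
        System.out.println(joinLine(row, true));   // "x.x.x.x","80","critical"
        System.out.println(joinLine(row, false));  // x.x.x.x,80,critical
    }
}
```

This also shows why values that already contain quotes come out quoted twice when everything is quoted unconditionally.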

Prashant-Pal says:

Hi Nitin/Marcelo,

I meant to raise this query earlier as well, since I am running into the same issue.

What I am doing is passing the events/data I receive from Flume to Spark, and when I collect the data in Spark, everything arrives as a string.

For example, if I pass the data "x.x.x.x", 80, "y.y.y.y", 8080, "critical", then in Spark I receive ""x.x.x.x"", "80", ""y.y.y.y"", "8080", ""critical"" (note that the quotes appear twice for the string fields).

So in my Spark application I have to perform string operations to remove the extra quotes from the start and end of each value.

Please let me know what a possible solution would be, as performing those string operations at the next level is very costly.

Hoping for a response soon.

~Prashant
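The downstream workaround Prashant describes (stripping one layer of wrapping quotes before interpreting the value) can be sketched as follows. The helper name is made up for this example; it is not part of Flume or Spark:

```java
public class QuoteStripper {
    // Illustrative helper: removes one layer of wrapping double quotes,
    // e.g. the 9-char string "x.x.x.x" (quotes included) becomes x.x.x.x.
    // Values without wrapping quotes are returned unchanged.
    static String stripOuterQuotes(String s) {
        if (s != null && s.length() >= 2
                && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("\"x.x.x.x\""));  // x.x.x.x
        System.out.println(stripOuterQuotes("80"));           // 80
    }
}
```

As Prashant notes, doing this per field on every event is wasteful; fixing the quoting at the source is the better solution.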

mvalleavila commented 9 years ago

CSVWriter version 3.0+ will be tested to fix this behaviour.

rishit2016 commented 7 years ago

Hi, is this issue resolved? I am also getting all data under double quotes. What should be done? Please suggest. I am using flume-ng-source-1.3.7.jar.

thanks, Rishit Shah

rishit2016 commented 7 years ago

agent1.sources = sql-source
agent1.sources.sql-source.type = org.keedio.flume.source.SQLSource
agent1.sources.sql-source.channels = ch1

# URL to connect to database (currently only mysql is supported)
agent1.sources.sql-source.connection.url = jdbc:oracle:thin:@10.9.64.12:1521:ODSPROD

# Database connection properties
agent1.sources.sql-source.user = mis_user
agent1.sources.sql-source.password = abcd_1234
agent1.sources.sql-source.table = ods_user.fi_ratelist_table
agent1.sources.sql-source.database = ods_user
agent1.sources.sql-source.columns.to.select = *

# Increment column properties
agent1.sources.sql-source.incremental.column.name = id

# Increment value is where you want to start taking data from the table (0 will import the entire table)
agent1.sources.sql-source.incremental.value = 0

# Query delay: the query is sent every configured number of milliseconds
agent1.sources.sql-source.run.query.delay = 10000

# Status file is used to save the last read row
agent1.sources.sql-source.status.file.path = /var/lib/flume
agent1.sources.sql-source.status.file.name = sql-source.status

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000000

agent1.sinks.HDFS.channel = ch1
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://10.0.66.240:8020/user/dev_Rht/flume/ods
agent1.sinks.HDFS.hdfs.file.Type = DataStream

rishit2016 commented 7 years ago

kindly reply.

mvalleavila commented 7 years ago

Added property enclose.by.quotes in version 1.4.3. Setting this property to false will solve the problem.
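Applied to the agent configuration posted earlier in this thread, the fix would be a single additional line (assuming flume-ng-sql-source 1.4.3 or later):

```
# Disable wrapping every value in quotes (requires flume-ng-sql-source >= 1.4.3)
agent1.sources.sql-source.enclose.by.quotes = false
```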