keedio / flume-ftp-source

FTP network server as a source of events for Apache Flume

SFTP source file name filter issue #19

Closed: chandsks closed this issue 8 years ago.

chandsks commented 8 years ago

Hi,

Our client's FTP server has several sets of files, and I am trying to select a particular set of source files. The source transfers all the files, regardless of the file name or regular expression I provide. Is there a property I can set to get a particular set of files? I am also wondering whether there is a way to get each file along with its file name.

I am using the following Flume configuration property to filter by file name:

Agent.sources.sftp1.file.name = WeatherForecast10092015.csv

or

Agent.sources.sftp1.file.name = WeatherForecast*

Any help is appreciated.

Thanks, Chandra

lazaromedina commented 8 years ago

Hi, there is no such "config file property to filter file name". As the README explains (https://github.com/keedio/flume-ftp-source#files-name-that-keeps-track-of-files-and-sizes-processed), agent.sources.ftp1.file.name names a file that keeps track of the files already processed. keedio-flume-ftp does not currently implement the feature you describe, i.e. filtering which files to process: all files located in the FTP directory and its subdirectories will be processed. Regards.
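Because the source does not filter, anyone modifying the code would have to add that check themselves. Below is a minimal, hypothetical sketch of such a check (the class FileNameFilter is illustrative and not part of keedio-flume-ftp): it keeps only names that fully match a Java regular expression. Note that a shell-style pattern such as WeatherForecast* is not a regex; read as a regex it means "WeatherForecas" followed by zero or more "t" characters, so the regex equivalent of "WeatherForecast followed by anything" is WeatherForecast.*

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of keedio-flume-ftp: keeps only the file
// names that fully match a Java regular expression.
final class FileNameFilter {
    private final Pattern pattern;

    FileNameFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // matches() requires the entire name to match, not just a substring.
    boolean accept(String fileName) {
        return pattern.matcher(fileName).matches();
    }

    // Returns the accepted names, preserving their original order.
    List<String> filter(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Regex equivalent of the glob "WeatherForecast*.csv".
        FileNameFilter filter = new FileNameFilter("WeatherForecast.*\\.csv");
        List<String> names = Arrays.asList(
                "WeatherForecast10092015.csv", "Sales.csv", "WeatherForecast.txt");
        System.out.println(filter.filter(names)); // prints [WeatherForecast10092015.csv]
    }
}
```

Using Matcher.matches() rather than find() means the whole file name has to match, which avoids accidental partial matches such as a name that merely contains "WeatherForecast".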

chandsks commented 8 years ago

Thank you. Is there any property I can set to change the source directory (the directory on the FTP server) from which the files are retrieved? Or is it strictly the FTP user's home directory?

Thanks, Chandra

lazaromedina commented 8 years ago

Hi, that's right: strictly the FTP user's home directory. There is no such property. You are welcome to modify the source code to suit your needs. Regards.

chandsks commented 8 years ago

Thank you

(Sent by email in reply to Luis Lázaro's comment of Mon, Jan 18, 2016 at 11:10 PM.)

Prem7721 commented 7 years ago

Hi, I am getting an error like this:

17/06/15 06:07:09 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows. org.apache.flume.FlumeException: Unable to load source type: org.keedio.flume.source.ftp.source.Source, class: org.keedio.flume.source.ftp.source.Source

But the jar flume-ftp-source-2.0.8, built from this Git project, contains the Source class. Am I missing something?

lazaromedina commented 7 years ago

Hi,

Best, Luis

Prem7721 commented 7 years ago

Hi,

I can see that the compiled classes include the Source.class file, at this location: "C:\Users\pkd8548\git\flume-ftp-source\target\classes\org\keedio\flume\source\ftp\source\Source.class"

Also, I get test exceptions while building the project, so I skipped the tests to create this jar.

Flume command I am running :

/usr/bin/flume-ng agent --conf conf --conf-file flume-ng-ftp-source-FTP.conf --classpath /home/pkd8548/flume-ftp-source-2.0.8.jar:/home/pkd8548/flume-ng-sql-source-1.3.7.jar:/home/pkd8548/jsch-0.1.52.jar:/home/pkd8548/commons-net-3.3.jar --name agent -Dflume.root.logger=INFO,console -Xms512m -Xmx1024m
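One possible cause of a ClassNotFoundException when the jar does contain the class is that a path given to --classpath does not point at that jar on the machine running Flume. A small diagnostic sketch (the class ClasspathJarCheck is hypothetical) checks that each ':'-separated entry of that value exists and reports which jar or directory contains org/keedio/flume/source/ftp/source/Source.class:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Diagnostic sketch (hypothetical helper): for each entry of a --classpath value,
// report whether it exists and whether it contains a given class file.
final class ClasspathJarCheck {

    // "org.keedio.flume.source.ftp.source.Source" -> "org/keedio/flume/source/ftp/source/Source.class"
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Entries are separated by ':' as on Linux; returns the entries containing the class.
    static List<String> entriesContaining(String classpath, String className) {
        String entryName = toEntryName(className);
        List<String> found = new ArrayList<>();
        for (String path : classpath.split(":")) {
            File file = new File(path);
            if (file.isDirectory()) {
                // Exploded class directory such as target/classes.
                if (new File(file, entryName).isFile()) {
                    found.add(path);
                }
            } else if (file.isFile()) {
                try (JarFile jar = new JarFile(file)) {
                    if (jar.getJarEntry(entryName) != null) {
                        found.add(path);
                    }
                } catch (IOException e) {
                    System.err.println("Not a readable jar: " + path + " (" + e.getMessage() + ")");
                }
            } else {
                System.err.println("Does not exist on this machine: " + path);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Usage: java ClasspathJarCheck "<the exact value passed to --classpath>"
        List<String> hits = entriesContaining(args[0], "org.keedio.flume.source.ftp.source.Source");
        System.out.println(hits.isEmpty() ? "Class not found in any entry" : "Found in: " + hits);
    }
}
```

Running it with exactly the value used in the flume-ng command, on the same host, shows whether the problem is a missing file or a jar that lacks the class.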

Error I am getting :

17/06/21 03:33:12 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: org.keedio.flume.source.ftp.source.Source, class: org.keedio.flume.source.ftp.source.Source
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:69)
    at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:42)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.keedio.flume.source.ftp.source.Source
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:67)
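The trace shows DefaultSourceFactory.getClass resolving the source type with Class.forName. The sketch below (hypothetical class ClassLoadProbe) performs a similar lookup through a URLClassLoader built from a list of jars, and distinguishes a class that is missing from one that is found but whose superclass or another dependency cannot be linked. It is a diagnostic under those assumptions, not how Flume itself builds its class loader; for the "dependency" case to be meaningful, run it with the same Flume jars on the JVM classpath that the agent sees.

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Diagnostic sketch (hypothetical helper): resolve a class by name with
// Class.forName, through a loader built from the given jar paths.
final class ClassLoadProbe {

    static String probe(String className, String... jarPaths) {
        URL[] urls = new URL[jarPaths.length];
        for (int i = 0; i < jarPaths.length; i++) {
            try {
                urls[i] = new File(jarPaths[i]).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad path: " + jarPaths[i], e);
            }
        }
        // The parent is this class's own loader, so JDK classes and anything
        // already on the JVM classpath are visible too.
        URLClassLoader loader = new URLClassLoader(urls, ClassLoadProbe.class.getClassLoader());
        try {
            // initialize = false: load the class without running static initializers.
            Class.forName(className, false, loader);
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (LinkageError e) {
            // Found, but e.g. its superclass (a Flume class) is not visible.
            return "found, but a dependency is missing: " + e;
        } finally {
            try {
                loader.close();
            } catch (IOException ignored) {
                // Closing only releases open jar files; nothing to report.
            }
        }
    }

    public static void main(String[] args) {
        // Usage: java ClassLoadProbe <jar> [<jar> ...]
        System.out.println(probe("org.keedio.flume.source.ftp.source.Source", args));
    }
}
```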

My configuration file:

# Sources Definition for agent "agent"

# ACTIVE LIST
agent.sources = ftp1
agent.sinks = k1
agent.channels = ch1

# SOURCE IS ftp server
# Type of source for ftp sources
agent.sources.ftp1.type = org.keedio.flume.source.ftp.source.Source
agent.sources.ftp1.client.source = ftp

# Connection properties for ftp server
agent.sources.ftp1.name.server = isbibnf03
agent.sources.ftp1.port = 22
agent.sources.ftp1.user = user
agent.sources.ftp1.password = password
agent.sources.ftp1.folder = /path/to/file
agent.sources.ftp1.file.name = filename.TXT

# Discover delay: the directory is explored every configured number of milliseconds
agent.sources.ftp1.run.discover.delay=5000

# Process by lines
agent.sources.ftp1.flushlines = true

agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /destination/path/to/File
agent.sinks.k1.sink.rollInterval = 7200

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000
agent.channels.ch1.transactionCapacity = 1000

agent.sources.ftp1.channels = ch1
agent.sinks.k1.channel = ch1
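Separately from the class-loading error, this configuration pairs client.source = ftp with port 22, which is the default SSH (and therefore SFTP) port; plain FTP listens on port 21 by default. A hypothetical heuristic check (the class PortCheck is illustrative; the key names are taken from the configuration above) flags such a mismatch as a warning only, since a server may legitimately use a non-default port:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical heuristic: warn when a source's port differs from the default
// port of the protocol named in its client.source property.
final class PortCheck {
    // Well-known defaults: FTP control connection 21, SSH/SFTP 22.
    private static final Map<String, Integer> DEFAULT_PORTS = new HashMap<>();
    static {
        DEFAULT_PORTS.put("ftp", 21);
        DEFAULT_PORTS.put("sftp", 22);
    }

    // Returns a warning, or null when nothing looks wrong or cannot be checked.
    static String check(Properties props, String sourcePrefix) {
        String client = props.getProperty(sourcePrefix + ".client.source");
        String port = props.getProperty(sourcePrefix + ".port");
        if (client == null || port == null) {
            return null;
        }
        Integer expected = DEFAULT_PORTS.get(client.trim());
        if (expected == null) {
            return null; // unknown protocol name: no opinion
        }
        int actual;
        try {
            actual = Integer.parseInt(port.trim());
        } catch (NumberFormatException e) {
            return "port is not a number: " + port;
        }
        if (actual == expected) {
            return null;
        }
        return "client.source=" + client.trim() + " but port=" + actual
                + " (default for " + client.trim() + " is " + expected + ")";
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "agent.sources.ftp1.client.source = ftp\n"
                + "agent.sources.ftp1.port = 22\n"));
        // prints: client.source=ftp but port=22 (default for ftp is 21)
        System.out.println(check(props, "agent.sources.ftp1"));
    }
}
```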

lazaromedina commented 7 years ago

Hi, please answer these questions:

--classpath /home/pkd8548/flume-ftp-source-2.0.8.jar:/home/pkd8548/flume-ng-sql-source-1.3.7.jar:/home/pkd8548/jsch-0.1.52.jar:/home/pkd8548/commons-net-3.3.jar

Bendoha commented 5 years ago

Hi, I am using this configuration:

agent.sources = sftp1
agent.sinks = logger1
agent.channels = mem1

# Source
# Type - SFTP
agent.sources.sftp1.type = org.keedio.flume.source.ftp.source.Source
agent.sources.sftp.client.source = sftp

# Source connection properties
agent.sources.sftp1.name.server =
agent.sources.sftp1.port = 22
agent.sources.sftp1.user =
agent.sources.sftp1.password =

# Source transfer properties
agent.sources.sftp1.working.directory = C:/SFTP_Root
agent.sources.sftp1.filter.pattern = .+\.txt
agent.sources.sftp1.run.discover.delay = 5000
agent.sources.sftp1.file.name = log.txt

# Sink
agent.sinks.logger1.type = logger

# Channel
agent.channels.mem1.type = memory
agent.channels.mem1.capacity = 1000
agent.channels.mem1.transactionCapacity = 100

# Bind source and sink to channel
agent.sources.sftp1.channels = mem1
agent.sinks.logger1.channel = mem1
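Two details in this configuration can be checked mechanically. First, the key agent.sources.sftp.client.source refers to a source named sftp, while only sftp1 is declared in agent.sources, so sftp1 itself never receives a client.source value. Second, assuming the file is parsed with java.util.Properties (as Flume's properties-file configuration provider does), a backslash before a character that is not a recognized escape is silently dropped, so .+\.txt is read as .+.txt; writing .+\\.txt keeps the backslash. A minimal sketch (the class SourceKeyCheck is hypothetical, not part of Flume or keedio-flume-ftp):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical config checker: finds keys "<agent>.sources.<name>.<property>"
// whose <name> is not listed in "<agent>.sources = name1 name2 ...".
final class SourceKeyCheck {

    static List<String> undeclaredSourceKeys(Properties props, String agent) {
        Set<String> declared = new HashSet<>(Arrays.asList(
                props.getProperty(agent + ".sources", "").trim().split("\\s+")));
        String prefix = agent + ".sources.";
        List<String> problems = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(prefix)) {
                continue; // also skips the "<agent>.sources" list itself
            }
            String rest = key.substring(prefix.length());
            int dot = rest.indexOf('.');
            String name = dot < 0 ? rest : rest.substring(0, dot);
            if (!declared.contains(name)) {
                problems.add(key);
            }
        }
        Collections.sort(problems);
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "agent.sources = sftp1\n"
                + "agent.sources.sftp1.type = org.keedio.flume.source.ftp.source.Source\n"
                + "agent.sources.sftp.client.source = sftp\n"
                + "agent.sources.sftp1.filter.pattern = .+\\.txt\n"));
        // prints [agent.sources.sftp.client.source]
        System.out.println(undeclaredSourceKeys(props, "agent"));
        // The file text ".+\.txt" loads as ".+.txt" because the backslash is dropped.
        System.out.println(props.getProperty("agent.sources.sftp1.filter.pattern"));
    }
}
```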

I get this error. Could you help me, please?

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/04/18 01:07:05 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
19/04/18 01:07:05 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/cloudera/flume_workspace/flume-ng-ftp-source-SFTP.conf
19/04/18 01:07:05 INFO conf.FlumeConfiguration: Added sinks: logger1 Agent: agent
19/04/18 01:07:05 INFO conf.FlumeConfiguration: Processing:logger1
19/04/18 01:07:05 INFO conf.FlumeConfiguration: Processing:logger1
19/04/18 01:07:05 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent]
19/04/18 01:07:05 INFO node.AbstractConfigurationProvider: Creating channels
19/04/18 01:07:05 INFO channel.DefaultChannelFactory: Creating instance of channel mem1 type memory
19/04/18 01:07:05 INFO node.AbstractConfigurationProvider: Created channel mem1
19/04/18 01:07:05 INFO source.DefaultSourceFactory: Creating instance of source sftp1, type org.keedio.flume.source.ftp.source.Source
19/04/18 01:07:05 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to load configuration data. Exception follows.
org.apache.flume.FlumeException: Unable to load source type: org.keedio.flume.source.ftp.source.Source, class: org.keedio.flume.source.ftp.source.Source
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:68)
    at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:42)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:101)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:141)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.keedio.flume.source.ftp.source.Source
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:190)
    at org.apache.flume.source.DefaultSourceFactory.getClass(DefaultSourceFactory.java:66)
    ... 11 more