Open manucarrace opened 4 years ago
To enable Apache Flume's file channel, set the channel's `type` property to `file` in the agent configuration:
cygnusagent.channels.hdfs.type = file
In addition, the `checkpointDir` property usually needs to be set explicitly for the channel to work. By default the checkpoint is stored in ~/.flume/file-channel/checkpoint, but the user running the agent may not have permission to write to that path.
cygnusagent.channels.hdfs.checkpointDir = cygnusPath/something
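As a minimal sketch, the two settings above could be combined as follows. The `/var/lib/cygnus/...` paths are hypothetical placeholders, not Cygnus defaults; any directory the agent user can write to should do. Since `dataDirs` also defaults to a location under ~/.flume, it probably needs the same treatment, although that is an assumption and not something verified in this issue.

```properties
# Sketch only: the paths below are hypothetical and must be writable by the user running the agent
cygnusagent.channels.hdfs.type = file
cygnusagent.channels.hdfs.checkpointDir = /var/lib/cygnus/file-channel/checkpoint
# Assumption: dataDirs is overridden too, because its default also lives under ~/.flume
cygnusagent.channels.hdfs.dataDirs = /var/lib/cygnus/file-channel/data
```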
Many other properties can be set to customize the file channel.
According to the Apache Flume docs, these are:
Property Name | Default | Description |
---|---|---|
type | – | The component type name, needs to be file. |
checkpointDir | ~/.flume/file-channel/checkpoint | The directory where checkpoint file will be stored |
useDualCheckpoints | false | Backup the checkpoint. If this is set to true, backupCheckpointDir must be set |
backupCheckpointDir | – | The directory where the checkpoint is backed up to. This directory must not be the same as the data directories or the checkpoint directory |
dataDirs | ~/.flume/file-channel/data | Comma separated list of directories for storing log files. Using multiple directories on separate disks can improve file channel performance |
transactionCapacity | 10000 | The maximum size of transaction supported by the channel |
checkpointInterval | 30000 | Amount of time (in millis) between checkpoints |
maxFileSize | 2146435071 | Max size (in bytes) of a single log file |
minimumRequiredSpace | 524288000 | Minimum Required free space (in bytes). To avoid data corruption, File Channel stops accepting take/put requests when free space drops below this value |
capacity | 1000000 | Maximum capacity of the channel |
keep-alive | 3 | Amount of time (in sec) to wait for a put operation |
use-log-replay-v1 | false | Expert: Use old replay logic |
use-fast-replay | false | Expert: Replay without using queue |
checkpointOnClose | true | Controls if a checkpoint is created when the channel is closed. Creating a checkpoint on close speeds up subsequent startup of the file channel by avoiding replay. |
encryption.activeKey | – | Key name used to encrypt new data |
encryption.cipherProvider | – | Cipher provider type, supported types: AESCTRNOPADDING |
encryption.keyProvider | – | Key provider type, supported types: JCEKSFILE |
encryption.keyProvider.keyStoreFile | – | Path to the keystore file |
encryption.keyProvider.keyStorePasswordFile | – | Path to the keystore password file |
encryption.keyProvider.keys | – | List of all keys (e.g. history of the activeKey setting) |
encryption.keyProvider.keys.*.passwordFile | – | Path to the optional key password file |
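To illustrate how several of these properties fit together, here is a hedged sketch of a file channel that backs up its checkpoint and spreads its data over two disks. All paths are hypothetical, and the capacity values simply restate the defaults from the table; they are not tuned recommendations.

```properties
# Sketch only: every path below is a hypothetical placeholder
cygnusagent.channels.hdfs.type = file
cygnusagent.channels.hdfs.checkpointDir = /var/lib/cygnus/file-channel/checkpoint
# useDualCheckpoints requires backupCheckpointDir, which must differ from
# the checkpoint directory and from every data directory
cygnusagent.channels.hdfs.useDualCheckpoints = true
cygnusagent.channels.hdfs.backupCheckpointDir = /var/lib/cygnus/file-channel/checkpoint-backup
# Directories on separate disks can improve file channel performance
cygnusagent.channels.hdfs.dataDirs = /disk1/cygnus/file-channel/data,/disk2/cygnus/file-channel/data
# Defaults restated explicitly so they are easy to adjust later
cygnusagent.channels.hdfs.capacity = 1000000
cygnusagent.channels.hdfs.transactionCapacity = 10000
```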
Since many sinks have no specific performance requirements (many of them write in batches), it is worth investigating the use of file channels and their implications.