telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0

File channels #1807

Open manucarrace opened 4 years ago

manucarrace commented 4 years ago

Since many sinks have no strict performance requirements (many of them already work in batches), it is worth investigating the use of file channels and their implications.

IvanHdzC commented 4 years ago

To enable Apache Flume's file channel, set the channel's type property to file in the agent configuration:

cygnusagent.channels.hdfs.type = file

In practice it is also necessary to set the checkpointDir property. By default, checkpoints are stored in ~/.flume/file-channel/checkpoint, and the user running the agent may not have permission to write to that path.

cygnusagent.channels.hdfs.checkpointDir = cygnusPath/something
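As a rough sketch of how these settings might sit together in an agent configuration, the fragment below also sets dataDirs, since its default lives under the same ~/.flume directory and may hit the same permission problem. The /var/lib/cygnus paths are hypothetical placeholders, not Cygnus defaults; the agent and channel names are the ones used above.

```properties
# Use Flume's file channel for the hdfs channel
cygnusagent.channels.hdfs.type = file

# Hypothetical paths: any directories the executing user can write to.
# checkpointDir holds the checkpoint file, dataDirs holds the log files.
cygnusagent.channels.hdfs.checkpointDir = /var/lib/cygnus/file-channel/checkpoint
cygnusagent.channels.hdfs.dataDirs = /var/lib/cygnus/file-channel/data
```

Keeping both directories under a single location owned by the executing user is only one possible layout; what matters is that the user can write to every configured path.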

Many other properties can be added to customize the file channel.

According to the Apache Flume docs, these are:

| Property Name | Default | Description |
| --- | --- | --- |
| type | (none) | The component type name, needs to be file. |
| checkpointDir | ~/.flume/file-channel/checkpoint | The directory where the checkpoint file will be stored |
| useDualCheckpoints | false | Back up the checkpoint. If this is set to true, backupCheckpointDir must be set |
| backupCheckpointDir | (none) | The directory where the checkpoint is backed up to. This directory must not be the same as the data directories or the checkpoint directory |
| dataDirs | ~/.flume/file-channel/data | Comma-separated list of directories for storing log files. Using multiple directories on separate disks can improve file channel performance |
| transactionCapacity | 10000 | The maximum size of transaction supported by the channel |
| checkpointInterval | 30000 | Amount of time (in millis) between checkpoints |
| maxFileSize | 2146435071 | Max size (in bytes) of a single log file |
| minimumRequiredSpace | 524288000 | Minimum required free space (in bytes). To avoid data corruption, File Channel stops accepting take/put requests when free space drops below this value |
| capacity | 1000000 | Maximum capacity of the channel |
| keep-alive | 3 | Amount of time (in sec) to wait for a put operation |
| use-log-replay-v1 | false | Expert: Use old replay logic |
| use-fast-replay | false | Expert: Replay without using queue |
| checkpointOnClose | true | Controls if a checkpoint is created when the channel is closed. Creating a checkpoint on close speeds up subsequent startup of the file channel by avoiding replay. |
| encryption.activeKey | (none) | Key name used to encrypt new data |
| encryption.cipherProvider | (none) | Cipher provider type, supported types: AESCTRNOPADDING |
| encryption.keyProvider | (none) | Key provider type, supported types: JCEKSFILE |
| encryption.keyProvider.keyStoreFile | (none) | Path to the keystore file |
| encryption.keyProvider.keyStorePasswordFile | (none) | Path to the keystore password file |
| encryption.keyProvider.keys | (none) | List of all keys (e.g. history of the activeKey setting) |
| encryption.keyProvider.keys.*.passwordFile | (none) | Path to the optional key password file |
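
To illustrate how a few of the properties listed above could be combined, here is a hedged sketch of a tuned file channel for the same hdfs channel. All paths are hypothetical and the numeric values are purely illustrative, not recommendations; suitable values would have to come out of the investigation this issue proposes.

```properties
cygnusagent.channels.hdfs.type = file
cygnusagent.channels.hdfs.checkpointDir = /var/lib/cygnus/file-channel/checkpoint

# Back up the checkpoint; backupCheckpointDir is then mandatory and must
# differ from both the checkpoint directory and the data directories.
cygnusagent.channels.hdfs.useDualCheckpoints = true
cygnusagent.channels.hdfs.backupCheckpointDir = /var/lib/cygnus/file-channel/checkpoint-backup

# Two data directories, assumed here to be on separate disks.
cygnusagent.channels.hdfs.dataDirs = /mnt/disk1/cygnus/data,/mnt/disk2/cygnus/data

# Illustrative sizing (defaults are 1000000 and 10000 respectively).
cygnusagent.channels.hdfs.capacity = 500000
cygnusagent.channels.hdfs.transactionCapacity = 1000

# Checkpoint every 60 s instead of the default 30 s.
cygnusagent.channels.hdfs.checkpointInterval = 60000
```

Anything left out of the fragment keeps the default shown in the table.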