WIP PR: This is functionally complete and all tests pass. However, it is mostly a quick-and-dirty 1:1 extraction from Nim, so, time permitting, the next focus is making things as simple as they can reasonably be now that the code lives inside this gem.
This PR extracts additional WebHDFS functionality from Nim so that other projects can benefit from it. In particular, it adds retry logic that aims to gracefully handle certain errors and conditions, such as high-availability (HA) namenode failover, and it adds or augments several HDFS operations.
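To make the failover idea concrete, here is a minimal sketch of retrying an operation by rotating to the next namenode on a retriable error, with exponential backoff between attempts. It is not the retry code in this PR: the FailoverRunner class, the set of errors treated as retriable, and the attempt and delay defaults are all hypothetical.

```ruby
# Hypothetical sketch of HA failover with retries; not the code from this PR.
class FailoverRunner
  # Raised by the caller's block when a namenode is in standby or otherwise
  # cannot serve the request (an assumed convention for this sketch).
  class RetriableError < StandardError; end

  # Errors assumed to justify trying the next namenode.
  RETRIABLE = [RetriableError, Errno::ECONNREFUSED, Errno::ETIMEDOUT].freeze

  def initialize(namenodes, max_attempts: 4, base_delay: 0.5, sleeper: ->(secs) { sleep(secs) })
    raise ArgumentError, "at least one namenode is required" if namenodes.empty?

    @namenodes = namenodes.dup
    @max_attempts = max_attempts
    @base_delay = base_delay
    @sleeper = sleeper # injectable so tests do not actually sleep
    @active = 0        # index of the namenode currently believed to be active
  end

  def active_namenode
    @namenodes[@active]
  end

  # Yields the active namenode. On a retriable error, rotates to the next
  # namenode, sleeps base_delay * 2**(attempt - 1), and tries again,
  # re-raising once max_attempts attempts have failed in total.
  def run
    attempts = 0
    begin
      attempts += 1
      yield active_namenode
    rescue *RETRIABLE
      raise if attempts >= @max_attempts

      @active = (@active + 1) % @namenodes.size
      @sleeper.call(@base_delay * 2**(attempts - 1))
      retry
    end
  end
end
```

In a real client the block would issue the WebHDFS request against the yielded host. Remembering the last working namenode in @active means later calls go straight to it instead of hitting the standby first every time.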
Tests
The tests were migrated as well, and all of them pass. They are written in RSpec, whereas this project's original tests use the now-deprecated test_unit. For now we use both to avoid rewriting anything, but the original tests look trivial, so they could be ported to RSpec to keep things consistent and simpler.
Tests expect these environment variables:
API_HOST
DEFAULT_NAMENODE
TEST_DIR
KERBEROS
KEYTAB_PATH
For example:
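A minimal, hypothetical sketch of how a spec helper might read these variables and fail fast when one is missing. The TestEnv module and its Config struct are illustrative names, and treating KERBEROS as a true/false flag with KEYTAB_PATH required only when it is enabled is an assumption about how the suite uses them.

```ruby
# Hypothetical spec helper; names and the KERBEROS/KEYTAB_PATH semantics are assumptions.
module TestEnv
  REQUIRED = %w[API_HOST DEFAULT_NAMENODE TEST_DIR].freeze

  Config = Struct.new(:api_host, :default_namenode, :test_dir, :kerberos, :keytab_path,
                      keyword_init: true)

  # Accepts ENV or any Hash, which keeps the helper easy to test.
  def self.load(env = ENV)
    missing = REQUIRED.reject { |key| env[key] && !env[key].empty? }
    raise KeyError, "missing environment variables: #{missing.join(', ')}" unless missing.empty?

    # Assumed: KERBEROS is a flag such as "true"/"1"; unset means disabled.
    kerberos = %w[1 true yes].include?(env.fetch("KERBEROS", "").downcase)
    keytab = env["KEYTAB_PATH"]
    if kerberos && (keytab.nil? || keytab.empty?)
      raise KeyError, "KEYTAB_PATH is required when KERBEROS is enabled"
    end

    Config.new(api_host: env["API_HOST"], default_namenode: env["DEFAULT_NAMENODE"],
               test_dir: env["TEST_DIR"], kerberos: kerberos, keytab_path: keytab)
  end
end
```

Calling TestEnv.load once from spec_helper.rb surfaces a missing variable as a single clear error instead of a confusing connection failure deep inside a test.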
Tasks
[ ] Refactor.
[ ] Audit and remove FIXME and TODO comments.
[ ] Previously the code would raise NeutronicHelper::FileNotFoundError. There's already a WebHDFS::FileNotFoundError, so those instances were changed to it. Care needs to be taken here: parts of Nim rescue NeutronicHelper::FileNotFoundError to provide certain functionality, such as creating things if they don't exist (IIUC), so WebHDFS::FileNotFoundError needs to be rescued there too. Maybe there's a better way to do this?
[ ] (Optional) Port existing tests to rspec.
[ ] Maybe we should read the namenodes from hdfs-site.xml (see https://github.com/Factual/helpdesk/issues/3543#issuecomment-453630284). We would need to ensure that the configuration files are available. They can be obtained with:
The upshot would be that explicit configuration of the JMX API endpoint and the default namenode would become optional.
[ ] lib/webhdfs/factual/api_connection.rb appears to be duplicated across many projects. It could be factored out into a separate gem, but that gem would have to be open source; otherwise, as long as this project is open source, we would have to duplicate the file here anyway.
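On the open question in the FileNotFoundError task, one option is to keep the old name as an alias for the new class, so that Nim's existing rescue clauses match without being edited one by one. This is a minimal sketch, assuming Nim's own definition of NeutronicHelper::FileNotFoundError can be removed. The WebHDFS module here is only a stand-in for the gem's real class, and ensure_entry is a hypothetical caller.

```ruby
# Stand-in for the gem's existing exception class; in real code it comes from
# the webhdfs gem and is not redefined.
module WebHDFS
  class FileNotFoundError < StandardError; end
end

# Hypothetical shim on the Nim side, assuming Nim's own class definition is
# deleted: the old constant now names the very same class object, so existing
# `rescue NeutronicHelper::FileNotFoundError` clauses still match.
module NeutronicHelper
  FileNotFoundError = WebHDFS::FileNotFoundError
end

# Hypothetical Nim-style caller that creates an entry when it is missing.
# The lookup raises the new error class; the old rescue clause still catches it.
def ensure_entry(path, store)
  store.fetch(path) { raise WebHDFS::FileNotFoundError, path }
rescue NeutronicHelper::FileNotFoundError
  store[path] = :created
end
```

Rescuing both classes at every call site would also work, but the alias keeps the change in one place. Keeping a distinct Nim class as a subclass of WebHDFS::FileNotFoundError would not help, because a raised WebHDFS::FileNotFoundError is not an instance of that subclass and would slip past the old rescue clauses.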