This page is a summary that keeps track of Hadoop-related projects and other relevant projects in the Big Data scene, with a focus on the open source, free software environment.
Apache Crail is a fast multi-tiered distributed storage system designed from the ground up for high-performance network and storage hardware.
It forms the backbone of the Crail I/O architecture, which is described in more detail on crail.incubator.apache.org.
The unique features of Crail include:
Zero-copy network access from userspace
Integration of multiple storage tiers such as DRAM, flash and disaggregated shared storage
Ultra-low latencies for both metadata and data operations. For instance, opening, reading and closing a small file residing in the distributed DRAM tier takes less than 10 microseconds, which is in the same ballpark as some of the fastest RDMA-based key/value stores
High-performance sequential read/write operations. For instance, read operations on large files residing in the distributed DRAM tier are typically limited only by the performance of the network
Very low CPU consumption: a single core shared by the application and the file system client can drive sequential read/write operations at speeds of 100 Gbps and more
Asynchronous API leveraging the asynchronous nature of RDMA-based networking hardware
Extensible plugin architecture: new storage tiers tailored to specific hardware can be added easily
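To make the asynchronous API item in the list above more concrete, here is a minimal sketch of the general pattern: an operation returns a handle immediately, and the caller collects the result later. This is not Crail's actual API. The class `ToyAsyncStore` and its methods are hypothetical names invented for illustration, and a thread pool stands in for the RDMA hardware that would complete operations in a real deployment.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ToyAsyncStore:
    """Hypothetical in-memory store. Illustrative only, NOT Crail's real API."""

    def __init__(self) -> None:
        self._files = {}
        # Worker threads stand in for hardware that completes I/O in the background.
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _put(self, path: str, data: bytes) -> int:
        self._files[path] = bytes(data)
        return len(data)

    def _get(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> Future:
        # Returns at once; the future resolves to the number of bytes written.
        return self._pool.submit(self._put, path, data)

    def read(self, path: str) -> Future:
        # Returns at once; the future resolves to the file contents.
        return self._pool.submit(self._get, path)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


store = ToyAsyncStore()
pending_write = store.write("/demo/file", b"hello")
# The caller is free to do unrelated work here while the write is in flight.
written = pending_write.result()   # block only when the result is needed
data = store.read("/demo/file").result()
store.close()
```

The point of the pattern is that issuing an operation and waiting for it are separate steps, so a single thread can keep several operations in flight instead of blocking on each one.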
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
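As a rough illustration of what a columnar memory format means, the sketch below stores a small table column by column using only the Python standard library. It is a simplified illustration, not the Arrow library and not the full Arrow specification: padding, alignment and schema metadata are omitted, and the helper names (`int_column`, `string_column`, `get_string`) are hypothetical. It does follow the general idea of a validity bitmap, an offsets buffer and a contiguous data buffer for variable-length strings.

```python
from array import array

# Row-oriented input: one dict per record.
rows = [
    {"id": 1, "name": "crail"},
    {"id": 2, "name": None},
    {"id": 3, "name": "arrow"},
]


def int_column(values):
    """Pack integers into one contiguous buffer ('q' = signed 64-bit)."""
    return array("q", values)


def string_column(values):
    """Encode strings as (validity bitmap, offsets buffer, data buffer)."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per value
    offsets = array("i", [0])                     # start of each value in data
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)      # least-significant bit first
            data += value.encode("utf-8")
        offsets.append(len(data))                 # nulls get a zero-length slot
    return validity, offsets, data


def get_string(column, i):
    """Read value i back out of the three buffers, or None if it is null."""
    validity, offsets, data = column
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return bytes(data[offsets[i]:offsets[i + 1]]).decode("utf-8")


ids = int_column([r["id"] for r in rows])
names = string_column([r["name"] for r in rows])

# An analytic operation touches only the one column it needs.
total = sum(ids)
```

Because each column lives in its own contiguous buffer, a scan over `ids` never reads the string data, which is the property that makes this kind of layout attractive for analytic workloads.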
These two projects are quite active right now.