intel / isa-l

Intelligent Storage Acceleration Library

Does the ISA-L library have an API compatible with RocksDB's CRC32C? #292

Open damoncui1993 opened 1 month ago

damoncui1993 commented 1 month ago

Our business previously used CRC32C from the RocksDB library for data validation, but I found that ISA-L performs better, so I am considering switching to it for the speedup. However, the value computed by ISA-L's crc32_gzip_refl function does not match the original CRC32C result, which makes it unusable for us. Does ISA-L have an API that is compatible with RocksDB's CRC32C function?

damoncui1993 commented 1 month ago
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
#include "isa-l.h"
#include "base/crc32c.h"

// crc32_gzip_refl() comes from isa-l.h and is called directly in main();
// no wrapper is needed.

int main() {
    std::srand(std::time(nullptr));
    std::vector<uint8_t> data(100000000); 
    for (auto& byte : data) {
        byte = std::rand() % 256; 
    }

    // RocksDB CRC32C
    auto start_custom = std::chrono::high_resolution_clock::now();
    uint32_t crc_custom = base::crc32c::Extend(0, reinterpret_cast<const char*>(data.data()), data.size());
    auto end_custom = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed_custom = end_custom - start_custom;
    std::cout << "Rocksdb CRC32: " << crc_custom << ", Time: " << elapsed_custom.count() << " ms" << std::endl;

    // ISA-L crc32_gzip
    auto start_isal_gzip = std::chrono::high_resolution_clock::now();
    uint32_t crc_isal_gzip = crc32_gzip_refl(0, data.data(), data.size());
    auto end_isal_gzip = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed_isal_gzip = end_isal_gzip - start_isal_gzip;
    std::cout << "ISA-L CRC32_GZIP: " << crc_isal_gzip << ", Time: " << elapsed_isal_gzip.count() << " ms" << std::endl;

    return 0;
}

Result:
Rocksdb CRC32: 500236219, Time: 0.125811 ms
ISA-L CRC32_GZIP: 517025312, Time: 0.019541 ms
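
A mismatch like this is expected if the two functions use different generator polynomials: the gzip CRC-32 uses the reflected polynomial 0xEDB88320, while CRC32C (Castagnoli) uses 0x82F63B78. The sketch below is a minimal bitwise illustration of that difference, not ISA-L or RocksDB code. The name crc32_reflected and the two constants are hypothetical, and the function is constexpr (C++14) only so the standard check values for the string "123456789" can be verified at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bitwise reflected CRC-32 with the usual
// 0xFFFFFFFF initial value and final inversion, for a given polynomial.
constexpr std::uint32_t crc32_reflected(std::uint32_t poly, const char* buf, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<unsigned char>(buf[i]);
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

constexpr std::uint32_t kGzipPoly = 0xEDB88320u;        // CRC-32 (gzip, IEEE 802.3)
constexpr std::uint32_t kCastagnoliPoly = 0x82F63B78u;  // CRC32C (iSCSI)

// Same input, different polynomial, different checksum.
static_assert(crc32_reflected(kGzipPoly, "123456789", 9) == 0xCBF43926u, "CRC-32 check value");
static_assert(crc32_reflected(kCastagnoliPoly, "123456789", 9) == 0xE3069283u, "CRC32C check value");
```

No choice of seed makes the two checksums agree in general, so compatibility with stored CRC32C values needs a function built on the Castagnoli polynomial.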

rhpvorderman commented 1 month ago

Did you also check crc32_ieee?

rhpvorderman commented 1 month ago

And does it have to be backwards-compatible? Otherwise I recommend using XXHash for data validation. (Note: I am not an ISA-L maintainer, just a compression enthusiast.)

damoncui1993 commented 1 month ago

> And does it have to be backwards-compatible? Otherwise I recommend using XXHash for data validation. (Note: I am not an ISA-L maintainer, just a compression enthusiast.)

Thank you very much for your suggestions. Unfortunately, our business has already recorded CRC values for existing data, so the new CRC implementation must be fully compatible with the old one; otherwise the stored values will no longer match.

pablodelara commented 1 month ago

Hi @damoncui1993. I think you need to use the crc32_iscsi() function, but you'll need to invert the initial value (if it is 0, pass 0xFFFFFFFF) and the output value: uint32_t res_crc = ~crc32_iscsi(buf, len, 0xFFFFFFFF);
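
The inversion described above can be sketched without ISA-L installed. In the code below, raw_crc32c is a hypothetical bitwise stand-in for crc32_iscsi(), assuming (as this reply states) that the ISA-L function inverts neither the seed nor the result. crc32c_extend is a hypothetical wrapper with Extend()-style semantics. In real code the wrapper body would call crc32_iscsi() with ~init_crc instead; check ISA-L's crc.h for the exact parameter types. The functions are constexpr (C++14) so the known CRC32C check value can be verified at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ISA-L's crc32_iscsi(): a raw CRC32C update
// (reflected Castagnoli polynomial 0x82F63B78) with no inversion of the
// seed or of the result.
constexpr std::uint32_t raw_crc32c(const char* buf, std::size_t len, std::uint32_t crc) {
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<unsigned char>(buf[i]);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

// Hypothetical wrapper: invert the incoming CRC, run the raw update,
// then invert the result. With init_crc == 0 the seed is 0xFFFFFFFF.
constexpr std::uint32_t crc32c_extend(std::uint32_t init_crc, const char* buf, std::size_t len) {
    return ~raw_crc32c(buf, len, ~init_crc);
}

// Standard CRC32C check value for the ASCII string "123456789".
static_assert(crc32c_extend(0, "123456789", 9) == 0xE3069283u, "CRC32C check value");

// Feeding the data in two chunks gives the same result as one call.
static_assert(crc32c_extend(crc32c_extend(0, "1234", 4), "56789", 5) == 0xE3069283u,
              "chunked update matches");
```

Inverting on the way in and on the way out is what lets a previously returned CRC be passed back as the seed for the next chunk, as the second assertion shows.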

damoncui1993 commented 1 month ago

> Hi @damoncui1993. I think you need to use the crc32_iscsi() function, but you'll need to invert the initial value (if it is 0, pass 0xFFFFFFFF) and the output value: uint32_t res_crc = ~crc32_iscsi(buf, len, 0xFFFFFFFF);

Thank you very much for your suggestion. This function works perfectly!! :)