Ben-PH commented 1 year ago

Is your feature request related to a problem? Please describe. There are some parts of the code that do seperate read operations on the ledger DB where they could possibly batch the reads.

There could also be a benefit from pinning the data, reducing memory allocations.

Without having benchmarks, it's tough to say how valuable time spent doing this is.

Describe the solution you'd like One example is this code snippet:

        // check for address existence
        if !self.entry_exists(&creator_address) {
            return Err(ExecutionError::RuntimeError(format!(
                "could not create SC address {}: creator address {} does not exist",
                addr, creator_address
            )));
        }

        // check that we don't collide with existing address
        if self.entry_exists(&addr) {
            return Err(ExecutionError::RuntimeError(format!(
                "could not create SC address {}: target address already exists",
                addr
            )));
        }

could instead be written something like this:

let mut things = db.batched_multi_get_cf(..., (&creator_address, &addr), true).expect(...).iter();
// ...check the two results

Additional context Add any other context or screenshots about the feature request here.

Ben-PH commented 1 year ago

Here is an e.g. prototype I started

// TODO: Make this more in line with the rocks-db interface
//       Eg, have a HashMap<Address, 3-tuple>, options, etc...
pub struct BatchGetter<'a> {
    address: &'a Address,
    balance: bool,
    bytecode: bool,
    datastore: Option<Vec<u8>>,
}

impl<'a> BatchGetter<'a> {
    pub fn new(address: &'a Address, ty: LedgerSubEntry) -> Self {
        let res = Self {
            address,
            balance: Default::default(),
            bytecode: Default::default(),
            datastore: Default::default(),
        };
        res.with_sub_entry(ty)
    }

    pub fn with_sub_entry(mut self, ty: LedgerSubEntry) -> Self {
        match ty {
            LedgerSubEntry::Balance => self.balance = true,
            LedgerSubEntry::Bytecode => self.bytecode = true,
            LedgerSubEntry::Datastore(hash) => self.datastore = Some(hash),
        };
        self
    }
    pub fn with_balance(mut self) -> Self {
        self.balance = true;
        self
    }
    pub fn with_bytecode(mut self) -> Self {
        self.bytecode = true;
        self
    }
    pub fn with_datastore(mut self, hash: Vec<u8>) -> Self {
        self.datastore = Some(hash);
        self
    }
}

Ben-PH commented 1 year ago

@Eitu33 What are your thoughts?

Eitu33 commented 1 year ago

@Ben-PH Batching is nice but using RocksDB iterators would be better here. If we take get_datastore_keys for example:

/// Get every key of the datastore for a given address.
///
/// # Returns
/// A `BTreeSet` of the datastore keys
pub fn get_datastore_keys(&self, addr: &Address) -> BTreeSet<Vec<u8>> {
    let handle = self.db.cf_handle(LEDGER_CF).expect(CF_ERROR);

    let mut opt = ReadOptions::default();
    opt.set_iterate_upper_bound(end_prefix(data_prefix!(addr)).unwrap());

    self.db
        .iterator_cf_opt(
            handle,
            opt,
            IteratorMode::From(data_prefix!(addr), Direction::Forward),
        )
        .flatten()
        .map(|(key, _)| key.split_at(ADDRESS_SIZE_BYTES + 1).1.to_vec())
        .collect()
}

This function creates an iterator over all the keys that start with the datastore prefix of the given address. A datastore key being addr + datastore_identifier_byte + key the datastore prefix is addr + datastore_identifier_byte. Balance and bytecode follow the same convention in a simpler way since they represent only one disk ledger entry, which results in their key being addr + balance_identifier_byte and addr + bytecode_identifier_byte + key.

What I would imagine is a function that retrieves all the sub entries of an address by creating an iterator with the range [addr..end_prefix(addr)].

Also something I noticed a while back is that RocksDB rust crate has the possibility to specify the range directly with set_iterate_range, which is better than what was made at the time (see code snippet above). After using this option we would just have to set the IteratorMode to Start since the instantiated iterator will only cover the address entry.

Ben-PH commented 1 year ago

massalabs / massa

Use the `get_multi` with rocksdb and other optimisations. #3437

3445