optimizing memory usage

mariocynicys commented 1 year ago

Just by loading the minimal necessary data plus dropping the last_known_blocks we get a huge reduction (2GB -> 1.2GB for 2.2M appointments*). We can even do better and further reduce the data stored in memory (dropping user_ids and potentially locators).

^{* This is on MacOS. On linux we get to as low as 1.3G but only 1.0G is actually being used, we know that because if we call malloc_trim we get to 1.0G}

Fixes #224

sr-gi commented 1 year ago

We can even do better and further reduce the data stored in memory (dropping user_ids and potentially locators).

I thought you were going to pack all mem optimizations in the same PR. Are you planning on a follow-up for that?

mariocynicys commented 1 year ago

I thought you were going to pack all mem optimizations in the same PR. Are you planning on a follow-up for that?

Nah, will follow-up in this PR.

Polling the least amount of data already solved the iter/non-iter issue since it only appears when a hashmap has a dynamic sized field (the encrypted_blob in our case). And not really in favor of playing with trimming and stuff since it's not a general solution. What's left is:

streaming reads from the DB (shouldn't introduce any issues, but haven't looked into it yet).

Not sure if there are any trivial/non-structural opts left after that.

The next set of optimizations for watcher I think are:

removing user_id from appointment summary (64 byte reduction) -> will need to call the DB for user_id when needed.
squeezing the locator_uuid_map to only a locator set -> will need to call the DB for uuids if a locator is matched on a block.
removing the locator set all together and relying on db calls in sql transactions for each block as discussed in discord + stream the db reads with the breach handling.

I will need to review the watcher and see what roles does these two hashmaps play so to understand how would reducing/eliminating them would affect the tower CPU-wise in a normal case (not so many breaches).

Will also need to check similar memory optimization options for the responder and gatekeeper.

mariocynicys commented 1 year ago

After a very very long fight with lifetimes, boxes, dyns and everything I was able to assemble this f12cb90.

It's working as intended as a way of steaming data whilst hiding the DBM specific things, but the design isn't good at all:

QueryIterator isn't actually an iterator, you need to call .iter() to get the real iterator: if you try to make it an iterator, you will end up reimplementing rusqlite all over, because you need to replace the Rows & Statement structs they offer.
You need to annotate QueryIterator's params generic type which shouldn't be relevant at all to the caller: because stmt.query_map(params, ... needs params to be of known size at compile time, thus can't be placed in a Box<dyn Params>.

The issue lies in DBM methods constructing a Statement which dies after the method ends, but since we need an iterator and don't want to collect just yet, we want to return Rows which is bound to the life time of that Statement so then we will have to return the statement as well, which is what the above mentioned commit does. https://github.com/rusqlite/rusqlite/issues/1265 should solve this if it were to be implemented.

That said, I might be complicating the issue and could be solved in another simpler way, so posting here to have a review on this.

Meanwhile I am moving on to other possible memory optimization options.

mariocynicys commented 1 year ago

Happy you find it not that terrible :joy: Regrading Params, I wanted this to be general over any DB query. I think an iterator gonna also be useful for processing a newly connected blocks breaches without collecting all the appointments in memory at one time.

sr-gi commented 1 year ago

Happy you find it not that terrible 😂

It was you calling it not good at all, I'm just smoothing it out a bit lol

mariocynicys commented 1 year ago

DB Iterator stuff are backed up in https://github.com/mariocynicys/rust-teos/tree/dbm-iterator

mariocynicys commented 1 year ago

After this change, reached 631MiB after trimming using the 1G DB.

Some tests needed to adapt because they were breaking the recipe for UUID = UserId + Locator, for example Watcher::store_appointment was sometimes passed some uuids that didn't link to the appointment being stored (was generated using generate_uuid). This was problematic because if we store locators in the gatekeepers UserInfo and want to get the uuid back for some locator it wont match. And I think causes a similar problem after the absence of a uuid -> locator map (ie. Watcher::appointments) but don't recall the hows. But these issues where in tests only anyways.

Also some methods might need to have its signatures changed a little bit to be less confusing and less error prone. For example, GateKeeper::add_update_appointment, it accepts (user id, locator, extended appointment), one can get the user id and locator from the extended appointment already, so there is actually a replication of passed data here, and this avoids the us supplying wrong user id and locator and avoid confusion of what these parameter actually should be and whether they should match the fields in the extended appointment or not.

Another thing that I think we should change is the excessive use of hashsets and hashmaps in functions. Hash maps and sets are more expensive to insert in and use more memory than a vector. In many cases we pass a hashmaps to a function just to iterate on it (never using the mapping functionality it offers). Hashmaps could be replaced with a Vec<(key, val)>.

I pushed that commit 72b384d to get a concept ack, but I think will need to do some refactoring & renaming. @sr-gi

mariocynicys commented 1 year ago

Let me also dump some notes I had written xD.

Trying to get rid of Watcher::locator_uuid_map and Watcher::appointments.

1- For locator_uuid_map: locator_uuid_map is used to keep uuids for each locator. On a breach for locator X, you will need to load all the uuids associated with it and load appointments for these uuids (load_appointment(uuid)) and try to decrypt them and broadcast.

This should be replaced with:

Get all the locators found in a newly mined block.
Do a DB tx looking for all the UUIDs for these locators.
Do all of this in an iteratable way to avoid storing any intermediaries in memory for no reason.

2- For appointments: appointments hashmap maps from a uuid to its 2 parents (user_id & locator).

We extract locator from a uuid when a client calls get_subscription_info. We map from UUIDs from UserInfo from the gatekeeper to Locators using the appointments hashmap, this can be avoided if we store locators instead of uuids in UserInfo. We can always recover the uuids from the locators since we know the user_id for that UserInfo, and we also save a 20% in size (locators are 16 bytes while uuids are 20 bytes).

We extract user_id from a uuid when a new block is connected (in filtered_block_connected), user_ids are used while trackers for breaches (why store user_id for a tracker?), user_ids are also used to instruct the gatekeeper which user to update after broadcasting an appointment with uuid X like returning the users' slots and so on. As an alternative, we can get the user_id from the database, when a new block is connected and we pull relevant penalty TXs for broadcasting, we can pull the user_signature as well which we can recover the user_id from. And everything should be done in an iteratable fashion.

That latest commit works on point 2, I will try to work on point 1 without the iteratable fashion mentioned and then adapt it once we can iterate over DB queries in a nice manner.

sr-gi commented 1 year ago

Alright, I think you are on the right path. I'm commenting on your notes, but haven't checked 72b384d (lmk if you'd like me to at this stage).

After this change, reached 631MiB after trimming using the 1G DB.

Some tests needed to adapt because they were breaking the recipe for UUID = UserId + Locator, for example Watcher::store_appointment was sometimes passed some uuids that didn't link to the appointment being stored (was generated using generate_uuid). This was problematic because if we store locators in the gatekeepers UserInfo and want to get the uuid back for some locator it wont match. And I think causes a similar problem after the absence of a uuid -> locator map (ie. Watcher::appointments) but don't recall the hows. But these issues where in tests only anyways.

Yeah, that should only affect tests. In the test suite the uuid recipe was not being followed because data was being generated sort of randomly, but that should be relatively easy to patch.

Also some methods might need to have its signatures changed a little bit to be less confusing and less error prone. For example, GateKeeper::add_update_appointment, it accepts (user id, locator, extended appointment), one can get the user id and locator from the extended appointment already, so there is actually a replication of passed data here, and this avoids the us supplying wrong user id and locator and avoid confusion of what these parameter actually should be and whether they should match the fields in the extended appointment or not.

I partially agree with this. With respect to the user_id, I think we can simplify it given, as you mentioned, one is literally part of the other (and this indeed applies to multiple methods). With respect to the UUID, I'm not that sure given this will trigger a cascade of recomputing this same data. Take Watcher::add_appointment for instance. Here, after computing the UUID, we call:

self.responder.has_tracker
log (using UUID)
Gatekeeper::add_update_appointment
Watcher::store_triggered_appointment
Watcher::store_appointment

The two former need UUID but don't get ExtendedAppointment. The three latter need UUID and also receive ExtendedAppointment. Just for this method will be creating the UUID four times (and I'm not counting the calls that happen inside those methods, the store_X methods also call the DBM, which also receives UUID and ExtendedAppointment).

Another thing that I think we should change is the excessive use of hashsets and hashmaps in functions. Hash maps and sets are more expensive to insert in and use more memory than a vector. In many cases we pass a hashmaps to a function just to iterate on it (never using the mapping functionality it offers). Hashmaps could be replaced with a Vec<(key, val)>.

I agree as long as collecting the map doesn't involve ending up having worse performance. Same applies to sets. The reasoning behind sets instead of vectors is that the former allow item deletion by identifier, while the latter doesn't. In a nutshell, if we're only iterating it may be worth, but if we need addition/deletion it may be trickier. Actually, if we only want to iterate, wouldn't an iterator be an option? Also, are we passing copies of the maps/sets or references?

We extract locator from a uuid when a client calls get_subscription_info

The other way around, or am I missing something? The user requests a given locator and we compute the UUID based on his UserId and the requested Locator.

We map from UUIDs from UserInfo from the gatekeeper to Locators using the appointments hashmap, this can be avoided if we store locators instead of uuids in UserInfo. We can always recover the uuids from the locators since we know the user_id for that UserInfo, and we also save a 20% in size (locators are 16 bytes while uuids are 20 bytes).

I barely remember this now, but I think the reason why UUIDs were stored was so, on deletion, Gatekeeper data could be mapped to Watcher and Responder data. We can indeed re-compute the UUID based on the locator and the user_id (that's why the UUID is created that way, so we could serve queries without having to do a reverse lookup), so I think you may be right here.

We extract user_id from a uuid when a new block is connected (in filtered_block_connected), user_ids are used while trackers for breaches (why store user_id for a tracker?)

Same reasoning I think, so we can delete the corresponding data from Gatekeeper and Watcher/Responder when needed. But again, I'm talking from the top of my mind, would need to review a change of this to see if it makes sense.

mariocynicys commented 1 year ago

With respect to the UUID, I'm not that sure given this will trigger a cascade of recomputing this same data.

That's true, we can keep it taking UUID but be cautious later in the tests not to provide a random non-related UUID. Or we can embed the UUID inside the extended appointment.

Actually, if we only want to iterate, wouldn't an iterator be an option? Also, are we passing copies of the maps/sets or references?

Yup iterator would be the best we can do if we nail having converting the calls reacting to block_connected being stream-able. We pass mostly/everywhere? references, but still some of these sets and maps not being used anytime for mapping and we can simplify them to vecs instead (& refs to vecs).

The other way around, or am I missing something? The user requests a given locator and we compute the UUID based on his UserId and the requested Locator.

We extract locator from a uuid when a client calls get_subscription_info

When a client asks for their subscription info, they wanna know the locators they have given out and not the UUIDs (UUIDs is tower-implementation specific after all). The thing is we store a user's appointments in terms of UUIDs inside UserInfo and we want to convert them to locators for get_subscription_info response. There is not going back from UUID -> Locator (without a map) so we could have stored Locators in UserInfo instead.

get_subscription_info is the message that returns subscription info to the user including all the locators they have sent out to us.

common_msgs::GetSubscriptionInfoResponse {
  available_slots: subscription_info.available_slots,
  subscription_expiry: subscription_info.subscription_expiry,
  locators: locators.iter().map(|x| x.to_vec()).collect(),
}

I think you confused it with get_appointment.

but haven't checked 72b384d (lmk if you'd like me to at this stage).

Nah, ACKing these comments were enough. Will clean this commit, do some renamings and some refactoring for the tests.

sr-gi commented 1 year ago

The other way around, or am I missing something? The user requests a given locator and we compute the UUID based on his UserId and the requested Locator.

We extract locator from a uuid when a client calls get_subscription_info

When a client asks for their subscription info, they wanna know the locators they have given out and not the UUIDs (UUIDs is tower-implementation specific after all). The thing is we store a user's appointments in terms of UUIDs inside UserInfo and we want to convert them to locators for get_subscription_info response. There is not going back from UUID -> Locator (without a map) so we could have stored Locators in UserInfo instead.

get_subscription_info is the message that returns subscription info to the user including all the locators they have sent out to us.
common_msgs::GetSubscriptionInfoResponse {
  available_slots: subscription_info.available_slots,
  subscription_expiry: subscription_info.subscription_expiry,
  locators: locators.iter().map(|x| x.to_vec()).collect(),
}
I think you confused it with get_appointment.

I was indeed thinking about single appointment requests 😅

mariocynicys commented 1 year ago

Current mem usage: 69M Which is basically the locator_cache & tx_index & carrier data & in-memory user info in the gatekeeper.

The last 4 commits should have their tests adapted and squashed into one commit. Leaving them for now for an easy linear review.

Does they make sense in terms of how much they query the DB (bottle necks)? @sr-gi

mariocynicys commented 1 year ago

Current mem usage: 69M Which is basically the locator_cache & tx_index & carrier data & in-memory user info in the gatekeeper.

Also need to mention the IO implications of this refactor. Most of the IO are reads triggered each block by batch_check_locators_exist, but other methods' IO will arise when breaches are found (like pulling encrypted penalties from the DB) but those aren't new.

111 calls to batch_check_locators_exist made around ~27.6GiB of IO reads (~255MiB per block).
No extra writes.

Other affected reads:

get_subscription_info: Pulls all the user's locators from the DB, reads depend on how big the user is.

The 111 calls to batch_check_locators_exist I based those measurements on, all had no breaches found. Thus the 255MiB is solely the search operation and not that some data is read and returned. I think the search will be more efficient if we have an index over the locators in the appointments table.

mariocynicys commented 1 year ago

The 111 calls to batch_check_locators_exist I based those measurements on, all had no breaches found. Thus the 255MiB is solely the search operation and not that some data is read and returned. I think the search will be more efficient if we have an index over the locators in the appointments table.

After applying CREATE INDEX IF NOT EXISTS locators_index ON appointments (locator). 266 calls to batch_check_locators_exist made 786MiB (~3MiB per block).

sr-gi commented 1 year ago

The 111 calls to batch_check_locators_exist I based those measurements on, all had no breaches found. Thus the 255MiB is solely the search operation and not that some data is read and returned. I think the search will be more efficient if we have an index over the locators in the appointments table.

After applying CREATE INDEX IF NOT EXISTS locators_index ON appointments (locator). 266 calls to batch_check_locators_exist made 786MiB (~3MiB per block).

Wow, what a reduction, that nice :)

mariocynicys commented 12 months ago

The current rework of the responder is actually equivalent to how things were previously before the memory opt stuff (but without ConfirmationStatus::ReorgedOut).

I believe it still needs some reworking to keep tracking reorged disputes for longer time (+ actually track reorged disputes for non-reoged penalties), but dunno whether they should be in a follow-up since this one has grown a lot. Thoughts @sr-gi ?

mariocynicys commented 11 months ago

I think this makes sense, but it is starting to get to the point where a cleanup would be helpful, especially to be able to properly review each commit individually.

Ummm, I thought we can only do so many optimization at the first and relied on some in-memory structs being there. Incrementally, I realized we can remove these as well, and updated on old commits, so it's a bit confusing not having done all of them in a single commit.

I would suggest squash reviewing all the commits (- the test fixing one & the merge one) at once, as separating each of them into a standalone commit will be a very nasty rebase (and probably will end up squashing most/all of them together).

sr-gi commented 11 months ago

I think this makes sense, but it is starting to get to the point where a cleanup would be helpful, especially to be able to properly review each commit individually.

Ummm, I thought we can only do so many optimization at the first and relied on some in-memory structs being there. Incrementally, I realized we can remove these as well, and updated on old commits, so it's a bit confusing not having done all of them in a single commit.

I would suggest squash reviewing all the commits (- the test fixing one & the merge one) at once, as separating each of them into a standalone commit will be a very nasty rebase (and probably will end up squashing most/all of them together).

Fair, as long as there is a cleanup of old comments and suggestions are addressed.

mariocynicys commented 11 months ago

Now each of these commits is reviewable on it's own. Will squash 9150458 into 4a922df later so that 4a922df could be easily reviewed.

Will merge with master after 4a922df & 9150458 are squashed together.

mariocynicys commented 10 months ago

We should be tracking things that are pending so we do not forget in the followups

I have typed the issues (one for add_appointment atomicity issues & one for DB persistence of tracker status) but will wait to add code references on them from this PR once it's on master.

sr-gi commented 10 months ago

Most comments have been addressed, I left additional comments in the ones that have not.

Once those are addressed, this should be good to go. It will need both rebasing and squashing.

talaia-labs / rust-teos

optimizing memory usage #190