Basically, there are two main things I'm doing here.
The first is that we no longer write tiles that have no segments in them. Before, we wrote the tile no matter what. This happened when a tile had only a few segments in it: because of the privacy threshold we would end up with none left and write an empty tile.
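A minimal sketch of the idea (not the actual code from this change; `Segment`, `PRIVACY_THRESHOLD`, and `writeTile` are made-up names): apply the privacy threshold first, then skip the write entirely if nothing survives.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only, with hypothetical names: filter by the privacy threshold,
// and only write the tile if any segments are left afterwards.
final class TilePublisher {
    static final int PRIVACY_THRESHOLD = 2; // assumed minimum observations per segment

    static class Segment {
        final long id;
        final int observations;
        Segment(long id, int observations) { this.id = id; this.observations = observations; }
    }

    void maybeWriteTile(String tileId, List<Segment> segments) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (s.observations >= PRIVACY_THRESHOLD) {
                kept.add(s);
            }
        }
        if (kept.isEmpty()) {
            return; // previously we'd fall through and write an empty tile here
        }
        writeTile(tileId, kept);
    }

    void writeTile(String tileId, List<Segment> segments) {
        // stand-in for the real tile output path (s3/http/disk)
    }
}
```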
The second main issue is that the Kafka key-value store imposes a 1 MB limit on any single value. That's fine for most of the places we use the key-value store; for example, we use it to store a small window of points from a single vehicle's trace, and to store segment pairs, which are very small amounts of data. But we also use it to store all the segments in a tile, and that's a problem: we can get a lot of observations for a given area over a given period of time, and when that happens and we try to write the data back to the store, it throws an exception.

We now catch that exception and write the tile to the datastore (s3, http, disk) instead. The other option would be to configure the Kafka brokers to allow larger values, but the trick there is that you'd have to know in advance how large those values will be. What this amounts to is that we may end up throwing out some values: when we run out of space to store the tile's data, some of the observations might not have had enough privacy yet. That can be fixed, and I'll do it in another PR or another commit (we'll see).
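A sketch of the fallback, under some assumptions: that the tile store is a Kafka Streams `KeyValueStore<String, byte[]>`, that the oversized write surfaces as a `RecordTooLargeException` (possibly wrapped in a `StreamsException`), and that `writeToDatastore` stands in for the existing s3/http/disk output path. None of these names are taken from the actual change.

```java
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.errors.StreamsException;
import org.apache.kafka.streams.state.KeyValueStore;

// Sketch only: catch the "value too large" failure from the key-value store and
// emit the tile straight to the datastore instead of keeping it in Kafka.
final class TileStoreFallback {

    void putTileOrFlush(KeyValueStore<String, byte[]> store, String tileId, byte[] tileBytes) {
        try {
            // Fails once the serialized tile exceeds the broker's message size limit
            // (1 MB by default).
            store.put(tileId, tileBytes);
        } catch (RecordTooLargeException | StreamsException e) {
            writeToDatastore(tileId, tileBytes); // bypass the store entirely
            store.delete(tileId);                // clear any stored copy so the next window starts fresh
        }
    }

    void writeToDatastore(String tileId, byte[] tileBytes) {
        // stand-in for the existing s3/http/disk output
    }
}
```

For reference, the broker-side knob the alternative approach would have to bump is `message.max.bytes` (plus `max.request.size` on the producer), which is exactly why you'd need to know ahead of time how big a busy tile can get.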
Other than that, I did some work on the logging so that only the important stuff is logged at INFO level, and set that as the default.
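For illustration only (the project may well use a different logging framework), setting INFO as the default looks roughly like this with `java.util.logging`:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: keep the root logger at INFO so only the important messages
// show up by default; the chattier detail goes out at FINE instead.
public final class LoggingDefaults {
    public static void main(String[] args) {
        Logger root = Logger.getLogger(""); // the root logger
        root.setLevel(Level.INFO);
        root.info("this still shows up");
        root.fine("this is hidden unless the level is lowered");
    }
}
```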