google / rappor

RAPPOR: Privacy-Preserving Reporting Algorithms
Apache License 2.0
862 stars 163 forks source link

Add a --use-map-cache flag to bin/decode-dist and bin/decode-assoc. #65

Closed andychu closed 8 years ago

andychu commented 8 years ago

The default is false, so you now have to explicitly enable it. This is better for tests, and also allows the file system where map files are stored to be read-only.

tkaitchuck commented 8 years ago

Why complicate things with a new option and parameter, and more state the user has to be concerned with? We delete all the other data in the directory when we create the map, why not the cache? That's the inconsistent state. That way, if you don't re-generate the map, then you get to re-use the cache, but if you do then there should be no cache.

andychu commented 8 years ago

I think this is less complicated overall, because the default is no cache. By default there is no complication... only if you want extra performance will you be concerned with extra state and extra flag (and it can be a significant win)

tkaitchuck commented 8 years ago

Right, but if you simply always use the cache if it is there, then you always get the performance benefit without ever having to type the flag. This can be made safe by deleting the cache when you write a new map.

andychu commented 8 years ago

Well, the other issue is that the "cache" is incorrect (and has always been)... it only tests for the existence of a file with the same name. If the .csv file changes, you read from a stale cache.

I'd rather have incorrect behavior be opt-in, not on by default. This is not just for the tests.

Another option is that we can remove the flag and just fix the bug where the it causes the analysis to fail if the file system is read only, because it can't write the .rda.

tkaitchuck commented 8 years ago

Yes. Exactly. We should not allow the 'sale cache' scenario. If we change the CSV the cache should be deleted.

On Thu, Jan 14, 2016 at 11:27 AM, andychu notifications@github.com wrote:

Well, the other issue is that the "cache" is incorrect (and has always been)... it only tests for the existence of a file with the same name. If the .csv file changes, you read from a stale cache.

I'd rather have incorrect behavior be opt-in, not on by default. This is not just for the tests.

Another option is that we can remove the flag and just fix the bug where the it causes the analysis to fail if the file system is read only, because it can't write the .rda.

— Reply to this email directly or view it on GitHub https://github.com/google/rappor/pull/65#issuecomment-171751867.

andychu commented 8 years ago

OK, I sent out a pull request that just fixes the immediate bug of the .rda write failing... I logged this as an issue for later: https://github.com/google/rappor/issues/68