Persistent consensus / descriptor storage

twilsonb commented 10 years ago

Consensus.getDirectoryStream() could save the consensus in a local "tor[-research-framework] data" directory, and then check for existing files before looking them up over the network.

This would reduce the load on the authorities, which are consulted at least once for each (often short) run.
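A rough sketch of the check-the-file-first idea, in case it helps the discussion. `ConsensusCache`, `load` and the `networkFetch` supplier are hypothetical names, not the framework's existing API, and this version deliberately ignores stale documents.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/** Hypothetical helper: reuse a consensus saved on disk before going to the network. */
final class ConsensusCache {
    private final Path cacheFile;

    ConsensusCache(Path cacheFile) {
        this.cacheFile = cacheFile;
    }

    /**
     * Returns the saved consensus if the cache file exists; otherwise calls
     * networkFetch (a stand-in for the real download code) and saves the result.
     * Staleness is NOT checked here.
     */
    String load(Supplier<String> networkFetch) {
        try {
            if (Files.isRegularFile(cacheFile)) {
                System.out.println("Using cached consensus: " + cacheFile.toAbsolutePath());
                return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
            }
            System.out.println("No cached consensus found, fetching from the network");
            String consensus = networkFetch.get();
            Files.write(cacheFile, consensus.getBytes(StandardCharsets.UTF_8));
            return consensus;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, only the first run in a given directory would touch the authorities; later runs read the local file.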

Issues:

Filesystem Access / Java Sandbox
Selection of [default] shared directory: access/security, persistence, storage limits
/tmp won't do in many cases:
  Windows users would need a platform-agnostic path
  OS X sandbox provides per-process TMPDIRs (not shared)
Handling of stale documents

Gareth, did you have a preferred strategy for this?

owenson commented 10 years ago

If we're going to cache it, I'm keen for the cached file to be easy for a user to spot. I suspect a typical user is loosely familiar with Tor but may not understand all the intricacies, so I don't want to hide files away in, for example, Windows\Temp.

I think there are two approaches we can take: one is to just dump it in the current directory and use it if it exists; the other is to create a new Config class where the user can configure the path. A happy middle ground might be to default to the current directory but leave it configurable by the user (and output some useful messages).
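To make the middle ground concrete, here is a minimal sketch. The class name `CacheConfig`, its methods, and the file name `cached-consensus` are placeholders for illustration, not what will necessarily get pushed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical config holder: cache in the current directory unless told otherwise. */
final class CacheConfig {
    // Default: the process's current working directory.
    private Path cacheDir = Paths.get("").toAbsolutePath();

    Path getCacheDir() {
        return cacheDir;
    }

    /** Lets the user override the default location, and says so on stdout. */
    void setCacheDir(Path dir) {
        cacheDir = dir.toAbsolutePath();
        System.out.println("Consensus cache directory set to " + cacheDir);
    }

    /** Full path of the cached consensus file (file name is an assumption). */
    Path consensusFile() {
        return cacheDir.resolve("cached-consensus");
    }
}
```

Keeping the default in the working directory means the file shows up next to whatever the user is running, which is the "easy to spot" property described above.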

Let me have a hack of it over the next hour and I'll push a skeleton solution.

twilsonb commented 10 years ago

I had thought of the current directory - that makes sense. I hope IntelliJ uses the same CWD for all of the example classes.

We should probably get a new consensus every few hours - I'll have to look up how often the Tor spec recommends, and how often tor (C) actually gets it.

owenson commented 10 years ago

OK - I've added the caching code. It probably needs a bit more testing, but it seems to work.

It seems IntelliJ is using the project root as the current directory, which seems sensible. I've added some printlns so it's obvious when it's caching.

The consensus file contains valid-until and fresh-until. I'm not sure what the difference is, but I've set the code to use valid-until as that was the later of the two.

twilsonb commented 10 years ago

Consensuses have overlapping validity periods: they are (expected to be) the latest consensus until fresh-until, and are valid until valid-until.

See "1.4. Voting timeline" in https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt

"Every consensus document has a "valid-after" (VA) time, a "fresh-until" (FU) time and a "valid-until" (VU) time. VA MUST precede FU, which MUST in turn precede VU. Times are chosen so that every consensus will be "fresh" until the next consensus becomes valid, and "valid" for a while after. At least 3 consensuses should be valid at any given time."

I don't see any need to fetch a new consensus until after VU, or perhaps shortly before. (Consensus validity or freshness really isn't necessary to any of the existing examples - stale consensus info will probably affect hidden services first.)
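For illustration, a small sketch of how the three timestamps could be read and used to tell "fresh" from "valid". It assumes the consensus carries `valid-after`, `fresh-until` and `valid-until` lines with `YYYY-MM-DD HH:MM:SS` values in UTC; `ConsensusTimes` and its methods are hypothetical names, and it uses `java.time` rather than whatever the framework code uses.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical holder for a consensus's VA, FU and VU times. */
final class ConsensusTimes {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final Instant validAfter;
    final Instant freshUntil;
    final Instant validUntil;

    ConsensusTimes(Instant validAfter, Instant freshUntil, Instant validUntil) {
        // The spec requires VA < FU < VU.
        if (!validAfter.isBefore(freshUntil) || !freshUntil.isBefore(validUntil)) {
            throw new IllegalArgumentException("expected valid-after < fresh-until < valid-until");
        }
        this.validAfter = validAfter;
        this.freshUntil = freshUntil;
        this.validUntil = validUntil;
    }

    /** Pulls the three timestamps out of the consensus text. */
    static ConsensusTimes parse(String consensus) {
        return new ConsensusTimes(
                find(consensus, "valid-after"),
                find(consensus, "fresh-until"),
                find(consensus, "valid-until"));
    }

    private static Instant find(String consensus, String keyword) {
        for (String line : consensus.split("\n")) {
            if (line.startsWith(keyword + " ")) {
                String value = line.substring(keyword.length() + 1).trim();
                return LocalDateTime.parse(value, FORMAT).toInstant(ZoneOffset.UTC);
            }
        }
        throw new IllegalArgumentException("missing " + keyword + " line");
    }

    /** True while this is expected to be the latest consensus. */
    boolean isFresh(Instant now) {
        return !now.isBefore(validAfter) && now.isBefore(freshUntil);
    }

    /** True while the consensus may still be used, even if a newer one exists. */
    boolean isValid(Instant now) {
        return !now.isBefore(validAfter) && now.isBefore(validUntil);
    }
}
```

A cached copy for which `isValid` is true but `isFresh` is false is the "stale but usable" case: fine for the existing examples, and no fetch is needed yet.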

(Sent by email in reply to Gareth Owen's comment of 30 Jul 2014, 20:11.)

owenson commented 10 years ago

OK, thanks for looking that up. I agree. I'll have a think about whether we should do refetching automatically or leave it to the user.

twilsonb commented 10 years ago

If we refetch automatically by default, but allow the user to specify a parameter that turns refetching off, the default will be similar to the current behaviour (which is to fetch on each program launch). And the user can specify complete control of fetches if they wish.
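Something along these lines is what I have in mind, as a sketch only. `RefetchPolicy` and `shouldFetch` are made-up names, and fetching when there is no cached copy at all (even with auto-refetch off) is my assumption, since otherwise there would be nothing to use.

```java
import java.time.Instant;

/** Hypothetical policy object: decides whether to go to the network. */
final class RefetchPolicy {
    private final boolean autoRefetch;

    RefetchPolicy(boolean autoRefetch) {
        this.autoRefetch = autoRefetch;
    }

    /**
     * @param cachedValidUntil valid-until of the cached consensus, or null if nothing is cached
     * @param now              current time
     */
    boolean shouldFetch(Instant cachedValidUntil, Instant now) {
        if (cachedValidUntil == null) {
            return true;              // nothing to reuse (assumption: always fetch)
        }
        if (!autoRefetch) {
            return false;             // user has taken control of fetches
        }
        return !now.isBefore(cachedValidUntil);   // refetch once the cached copy expires
    }
}
```

With `new RefetchPolicy(true)` as the default, a short run that finds an expired or missing file fetches once at startup, much like the current behaviour.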

(Sent by email in reply to Gareth Owen's comment of 30 Jul 2014, 22:34.)

twilsonb commented 10 years ago

The consensus refresh appears to be working well - almost unnoticeable, except for the fact that it speeds everything up dramatically.

One minor consideration: we refresh our copy of the consensus right after it expires. To avoid load on the servers, clients are meant to refresh between 3/4 and 7/8 of the way through the consensus period. (I can't remember exactly which period - it's in the directory spec.) I'm not sure this is even an issue for research-level use, given that we're not running constantly.

But if we were to run persistent routers, we should probably revisit this.

owenson commented 10 years ago

Sorry Tim, can you point me to the refetch code? I can't see it.

twilsonb commented 10 years ago

Consensus.fetchConsensus() will fetch and cache a consensus, then re-use the local copy until it expires. Then it will fetch another consensus. That's what I meant by "refetch". (And it's working really well!)

My most recent comment was about line 196: if (valid.after(new Date())) { // saved consensus still valid

If we were running a router, we:

don't want to ever have an expired consensus
might not want to leave cells waiting while we download and parse the newest consensus

I've just checked the Tor Directory Spec at https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt. It says that clients should download a new consensus between the fresh-until of the new consensus (NFU) and the valid-until of the current consensus (CVU). CVU is read from the current consensus. NFU is calculated from the current consensus, assuming it's CVA + 2 * (CFU - CVA), where CVA is the current consensus' valid-after and CFU is the current fresh-until.
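As a quick sketch of that NFU estimate (the class and method names are hypothetical, and it relies on the assumption above that the next consensus has the same interval length as the current one):

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: estimate the next consensus's fresh-until from the current one. */
final class NextConsensusEstimate {
    /** NFU = CVA + 2 * (CFU - CVA). */
    static Instant nextFreshUntil(Instant currentValidAfter, Instant currentFreshUntil) {
        Duration interval = Duration.between(currentValidAfter, currentFreshUntil);
        return currentValidAfter.plus(interval.multipliedBy(2));
    }
}
```

For a consensus with valid-after 1:00 and fresh-until 2:00, this gives an NFU of 3:00.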

Section 5.1 gives a randomised pre-fetching algorithm:

"To avoid swarming the caches whenever a consensus expires, the clients download new consensuses at a randomly chosen time after the caches are expected to have a fresh consensus, but before their consensus will expire. (This time is chosen uniformly at random from the interval between the time 3/4 into the first interval after the consensus is no longer fresh, and 7/8 of the time remaining after that before the consensus is invalid.)

[For example, if a cache has a consensus that became valid at 1:00, and is fresh until 2:00, and expires at 4:00, that cache will fetch a new consensus at a random time between 2:45 and 3:50, since 3/4 of the one-hour interval is 45 minutes, and 7/8 of the remaining 75 minutes is 65 minutes.]"
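Here is how I read that algorithm, as a sketch rather than a reference implementation. `RefetchWindow` is a hypothetical name, and I'm assuming "the first interval" has the same length as the current consensus's fresh period (FU - VA), which is what the spec's worked example implies.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Random;

/** Hypothetical helper: the window in which a client should fetch the next consensus. */
final class RefetchWindow {
    final Instant earliest;
    final Instant latest;

    private RefetchWindow(Instant earliest, Instant latest) {
        this.earliest = earliest;
        this.latest = latest;
    }

    static RefetchWindow of(Instant validAfter, Instant freshUntil, Instant validUntil) {
        // Interval length, assumed equal for the current and the next consensus.
        Duration interval = Duration.between(validAfter, freshUntil);
        // Start 3/4 of the way into the first interval after fresh-until.
        Instant earliest = freshUntil.plus(interval.multipliedBy(3).dividedBy(4));
        // End 7/8 of the way through the time remaining until valid-until.
        Duration remaining = Duration.between(earliest, validUntil);
        Instant latest = earliest.plus(remaining.multipliedBy(7).dividedBy(8));
        return new RefetchWindow(earliest, latest);
    }

    /** Picks a fetch time uniformly at random from [earliest, latest). */
    Instant pick(Random rng) {
        long spanMillis = Duration.between(earliest, latest).toMillis();
        return earliest.plusMillis((long) (rng.nextDouble() * spanMillis));
    }
}
```

For the spec's example (valid at 1:00, fresh until 2:00, expires at 4:00) this gives a window from 2:45:00 to 3:50:37.5, which the spec rounds to 3:50.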

But I'm really not sure if we need to be this well-behaved - then again, it's a fairly trivial modification.

(Sent by email in reply to Gareth Owen's comment of 3 Aug 2014, 18:03.)

owenson commented 10 years ago

Ah OK - glad the refetch is working OK. I think it's going to be a while before I get the server-side router working, to be honest - it's a minefield. Perhaps wait until then, and maybe we might integrate auto-refetch with the server code. I'm trying to keep the library as a 'do as I'm told' tool rather than one that helps too much.

twilsonb commented 10 years ago

Yes, until we have something that's running all the time (and run by multiple users), "on-demand" is sufficiently arbitrary, occasional, and low-level. That's more than enough for the moment, but let's keep these notes around.

And I'm not surprised about the router, that's why I stepped away from it - it seemed to be beyond me.

(Sent by email in reply to Gareth Owen's comment of 3 Aug 2014, 23:00.)

owenson / tor-research-framework

Persistent consensus / descriptor storage #8