torproject / stem

Python controller library for Tor
https://stem.torproject.org/
GNU Lesser General Public License v3.0
279 stars 76 forks source link

feat: update stem.directory.Fallback.from_remote() functionality to current format/URL #138

Closed PascalinDe closed 1 year ago

PascalinDe commented 1 year ago

formerly https://github.com/torproject/stem/pull/137

I rebased onto master so that it is independent from https://github.com/torproject/stem/pull/135

atagar commented 1 year ago

Hi Carine. This includes an error with the dirport (dir_port = int(dir_port or or_port)) that I attempted to fix via a patch on #135.

PascalinDe commented 1 year ago

Hi Damian,

I just rebased as-is so that https://github.com/torproject/stem/pull/135 can be treated independently (I'm focusing on that one first). I will look into this PR around mid next week, thank your for your help!

PascalinDe commented 1 year ago

hi, sorry for the long delay, I didn't find time to work on it sooner

so, I integrated your changes and added one regarding dir_port (https://github.com/torproject/stem/pull/138/commits/25a2cbe083b349c3e6a6d7d1f4a9315fab75c80c#r1285996434)

I have two unit tests that have different results than on the current master branch:

the problem seems to be that they are validating the code on a mock of the server response that does no longer correspond to the current format:

I adapted the code to the current response, hence the tests failing

so I'm wondering whether I should update the test data or whether the code should continue to support the old format?

atagar commented 1 year ago

so I'm wondering whether I should update the test data or whether the code should continue to support the old format?

Ideally there should be tests with both the new and old fallback_dirs.inc. Tor specified the format of the file here.

If I was in the driver's seat on this the next questions I'd be asking are...

  1. When did fallback_dirs.inc make the change that Stem fails to parse?
  2. Did the change coincide with a fallback_dirs.inc version bump? What else changed at the time?
  3. Did the change get recorded in dir-list-spec.txt?

All that said, if you'd care to simply swap the tests to the new format and call it a day I don't mind. As previously mentioned Stem is unmaintained so it's silly to let the perfect be the enemy of the good on this. :)

PascalinDe commented 1 year ago

so, I followed your pointers, and what I found :

The above script does not seem to follow too closely the specs (e.g. the two comment lines at the beginning) but is rather meant to accomodate arti 's API - there seems to be some awareness of this (cf. https://gitlab.torproject.org/tpo/core/tor/-/commit/131e2d99a45bd86b8379664cccb59bfc66f80c96) but I cannot find any other commits that address this.

So my approach would be to adapt my PR to follow the output of the above script, and update the test data accordingly, and add a comment in the file that summarizes the above - what do you think?

PascalinDe commented 1 year ago

slightly off-topic, I really want to stress that we appreciate you taking the time reviewing the PRs despite not maintaining the project!

atagar commented 1 year ago

So my approach would be to adapt my PR to follow the output of the above script, and update the test data accordingly, and add a comment in the file that summarizes the above - what do you think?

Sounds good to me.

slightly off-topic, I really want to stress that we appreciate you taking the time reviewing the PRs despite not maintaining the project!

My pleasure! You are uncommonly delightful to work with. Thank you for making the open source world better. :)

PascalinDe commented 1 year ago

I updated the PR as discussed (updated the test example to the current format and the code to the format as produced by the script of Tor's CI), and also updated the URL to the GitLab URL

I get the same output for the unit and integration tests on this branch and the master branch, and the manual tests I did look good as well

My pleasure! You are uncommonly delightful to work with. Thank you for making the open source world better. :)

thank you, that's very kind of you to say - it's a really nice project! while that wouldn't be in the scope of my work project, I have some free time on my hands and I was thinking of working a bit more on stem, but I wouldn't want to impose on you (it'd be probably not more than one or so PR per month), would that generally be OK for you?

atagar commented 1 year ago
+      if attr['dir_port'] != 'None':
+          dir_port = int(attr['dir_port'])
+      else:
+          dir_port = None

There's a couple issues with this...

  1. You're comparing with the string 'None' rather than python's None value.
  2. This tidbit is pedantic but python's conventions prefer the use of 'is' for None checks (ie. if attr['dir_port'] is not None:).

My commit did the following instead...

self.dir_port = int(dir_port) if dir_port else None

Is there a reason not to do this?


# example of current header

Please change 'current' to the present date. Sadly that comment won't be accurate in a few years. ;)


-    for line in Fallback._pop_section(lines):
+    for _ in range(4):
+      line = lines.pop(0)

Why do we want the first four lines rather than pop_section? Won't this raise an unexpected IndexError if 'lines' doesn't have four values?

PascalinDe commented 1 year ago
+      if attr['dir_port'] != 'None':
+          dir_port = int(attr['dir_port'])
+      else:
+          dir_port = None

There's a couple issues with this...

1. You're comparing with the string 'None' rather than python's None value.

2. This tidbit is pedantic but python's conventions prefer the use of 'is' for None checks (ie. `if attr['dir_port'] is not None:`).

My commit did the following instead...

self.dir_port = int(dir_port) if dir_port else None

Is there a reason not to do this?

The line you're referring to is in the from_cache method that loads the configuration from the cached fallbacks. If a cached fallback has no dir_port, this value is cached as 'None' (the string). You can see this e.g. in the test_persistence test:

test/unit/directory/fallback.py                                                                                                                                                                                    
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/cdengler/src/stem.c4dt/stem/directory.py(364)from_cache()                                                                                                                                                  
-> conf = stem.util.conf.Config()                                                                                                                                                                                  
(Pdb) n                                                                                                                                                                                                            
> /home/cdengler/src/stem.c4dt/stem/directory.py(365)from_cache()                                                                                                                                                  
-> conf.load(path)                                                                                                                                                                                                 
(Pdb) n                                                                                                                                                                                                            
> /home/cdengler/src/stem.c4dt/stem/directory.py(366)from_cache()                                                                                                                                                  
-> headers = collections.OrderedDict([(k.split('.', 1)[1], conf.get(k)) for k in conf.keys() if k.startswith('header.')])                                                                                          
(Pdb) conf['6D6EC2A2E2ED8BFF2D4834F8D669D82FC2A9FA8D.dir_port']                                                                                                                                                    
['None']

I think I misunderstood you comment

port on which directory information is available, or None if it doesn't have one

as "this should also be the content in the fallback file" so I didn't change anything there (ln. 530) - shall I change the Fallback._write method to adapt to this? e.g. not writing this key at all to the file? then the line you proposed would work

# example of current header

Please change 'current' to the present date. Sadly that comment won't be accurate in a few years. ;)

You're absolutely right, I'll change that!

-    for line in Fallback._pop_section(lines):
+    for _ in range(4):
+      line = lines.pop(0)

Why do we want the first four lines rather than pop_section? Won't this raise an unexpected IndexError if 'lines' doesn't have four values?

_pop_section would include the first entry in the header, since the header is not separated from the entries by FALLBACK_DIV anymore. As for the IndexError, as per the comment https://gitlab.torproject.org/tpo/core/fallback-scripts/-/blob/main/src/main.rs#L14 the header is currently hard-coded, so the number of lines will probably not change. If I understand the comment correctly, it is bound to be removed? But you're right my solution is neither here nor there, it should either be hard-coded as well or more flexible to a different length (or no header at all), I will change it to the latter.

PascalinDe commented 1 year ago

I also forgot: I noticed when I was working on this that there are no more nodes where the nickname value is empty, instead there seems to be now the convention that they have the nickname "Unnamed" - I guessed that since they changed this, "Unnamed" would be an expected value for nickname and the key would be expected to be present, so I didn't change the code to check for this value and not include nickname as a key

PascalinDe commented 1 year ago

pushed the changes as I proposed above, no regressions on both the unit and the integration tests

I put them in their own commit so they are easier to review

atagar commented 1 year ago

Thanks Carine, this looks good to me! It's now pushed.

The line you're referring to is in the from_cache method that loads the configuration from the cached fallbacks... shall I change the Fallback._write method to adapt to this? e.g. not writing this key at all to the file? then the line you proposed would work

Oh! I see. My vote is to leave Fallback._write as-is. You're correct that it (and cache_fallback_directories.py) will need to be changed if/when new fallbacks are cached.

I don't mind leaving that as tech debt for when the script is next used. The cache was last synced in 2020 so there might be other issue too.

there are no more nodes where the nickname value is empty, instead there seems to be now the convention that they have the nickname "Unnamed"

Sounds good.

PascalinDe commented 1 year ago

Oh! I see. My vote is to leave Fallback._write as-is. You're correct that it (and cache_fallback_directories.py) will need to be changed if/when new fallbacks are cached.

Ah, I didn't expect you to merge the branch that quickly - I changed it in the commit https://github.com/torproject/stem/pull/138/commits/6700dee526d94c291ed864b384b8e20c4270a244 to illustrate what I meant, so it's now merged, so sorry for the mix-up that really wasn't clear from my side...