torproject / stem

Python controller library for Tor
https://stem.torproject.org/
GNU Lesser General Public License v3.0

stem.directory.Authority.from_remote() will break when GitWeb shuts down #134

PascalinDe closed this issue 1 year ago

PascalinDe commented 1 year ago

Hello,

first off, thank you for your library! I apologize in advance if this is just us using it wrongly; I'm new to the subject and would definitely appreciate any pointers in the right direction.

We are using stem to retrieve the directory files for https://github.com/c4dt/lightarti-rest (for details see https://github.com/c4dt/lightarti-directory). However, we are currently running into problems that seem to be caused by stem returning an outdated v3ident value for moria1 (see https://github.com/torproject/tor/commit/72b04a5aa42dd2729cf9fe9452e559c29466b250).

Here's a short interactive session that mimics what we're doing:

python3
Python 3.11.4 (main, Jun  7 2023, 10:13:09) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import stem
>>> stem.__version__
'1.8.2'
>>> import stem.directory
>>> for name, authority in stem.directory.Authority.from_cache().items():
...     print((name, authority.v3ident))
... 
('moria1', 'D586D18309DED4CD6D57C18FDB97EFA96D330566') <-- old v3ident value
[...]
>>> import stem.descriptor
>>> import stem.descriptor.remote
>>> downloader = stem.descriptor.remote.DescriptorDownloader()
>>> temp = downloader.get_consensus(document_handler=stem.descriptor.DocumentHandler.DOCUMENT, microdescriptor=True).run()[0]
>>> [auth.v3ident for auth in temp.directory_authorities]
['0232AF901C31A04EE9848595AF9BB7620D4C5B2E', '14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4', '23D15D965BC35114467363C165C4F724B64B4F66', '27102BC123E7AF1D4741AE047E160C91ADC76B21', '49015F787433103580E3B66A1707A00E60F2D15B', 'E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58', 'ED03BB616EB2F60BEC80151114BB25CEF515B226', 'F533C81CEF0BC0267857C99B2F471ADF249FA232'] <-- last element is new v3ident value

We then compare the v3idents retrieved in the first part with those retrieved in the second part (specifically, we look for moria1's) and end up with an empty list: the first part returns the old value, the second part the new value (the last element).
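
The comparison described above can be sketched in plain Python. This is a minimal illustration, not the lightarti-directory code: the function name stale_authorities is hypothetical, and it takes plain dicts and lists instead of Stem objects so it runs without network access.

```python
def stale_authorities(cached_v3idents, consensus_v3idents):
    """Return nicknames whose cached v3ident is absent from the live consensus.

    cached_v3idents: {nickname: v3ident}, e.g. built from
        {name: auth.v3ident for name, auth in Authority.from_cache().items()}
    consensus_v3idents: iterable of v3idents, e.g. from
        [auth.v3ident for auth in consensus.directory_authorities]
    """
    live = set(consensus_v3idents)
    return sorted(
        name
        for name, v3ident in cached_v3idents.items()
        # Entries without a v3ident cannot be compared, so skip them.
        if v3ident is not None and v3ident not in live
    )


cached = {'moria1': 'D586D18309DED4CD6D57C18FDB97EFA96D330566'}
consensus = ['F533C81CEF0BC0267857C99B2F471ADF249FA232']
print(stale_authorities(cached, consensus))  # ['moria1']
```

Flagging the stale entry explicitly, rather than silently intersecting the two lists, makes it obvious which authority's cached data is out of date.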

Is this the expected behaviour for stem.directory.Authority.from_cache()?

Thank you in advance!

atagar commented 1 year ago

> first off, thank you for your library! I apologize in advance if this is just us using it wrongly, I'm new to the subject, and would definitely appreciate any pointers in the right direction.

Hi Carine. Thanks, I'm delighted that you find this library helpful!

Sadly Stem has been unmaintained for a couple of years. This is expected behavior: Authority.from_cache() provides directory information as of Stem's release, whereas Authority.from_remote() downloads the current directory information.
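
A common workaround, sketched here under assumptions, is to prefer the live data and fall back to the bundled snapshot when fetching fails. Authority.from_remote() and Authority.from_cache() are Stem's methods, and the traceback later in this thread shows from_remote() surfacing parse failures as OSError; the sketch assumes download failures surface as OSError too. The helper load_authorities is hypothetical and takes the two loaders as callables so it can be exercised offline.

```python
import logging


def load_authorities(remote_loader, cache_loader):
    """Prefer freshly downloaded authority data; fall back to the cached copy.

    remote_loader / cache_loader: zero-argument callables returning a
    {nickname: authority} dict, e.g. stem.directory.Authority.from_remote
    and stem.directory.Authority.from_cache.

    Returns (authorities, is_fresh).
    """
    try:
        return remote_loader(), True
    except OSError as exc:
        # Assumption: both download and parse failures are raised as OSError.
        logging.warning('remote authority fetch failed, using cache: %s', exc)
        return cache_loader(), False
```

With Stem installed, usage would be something like `authorities, fresh = load_authorities(stem.directory.Authority.from_remote, stem.directory.Authority.from_cache)`. Returning the is_fresh flag lets the caller decide whether stale cache data is acceptable, which matters here since the cached moria1 v3ident is outdated.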

Please feel free to reopen if you have any further questions.

atagar commented 1 year ago

Oh drats, I forgot that Tor deprecated their GitWeb instance. The GitWeb page that Stem uses is still alive, so the from_remote() method should work for now, but it will stop working if they shut it down. :(

atagar commented 1 year ago

Reopened with a new title since we should definitely have a ticket for the from_remote() issue. I'll tell Georg about it, because if/when TPO (torproject.org) shuts down GitWeb, some of their infrastructure (like DocTor) may break.

PascalinDe commented 1 year ago

Hi Damian,

thanks for your quick reply! Since we're using the library, we'd be happy to give you a hand resolving this. As I said, I'm new to the topic, but if you let me know what you need, I can look into it.

also Cc @ineiti since he's working on lightarti-rest/lightarti-directory as well

atagar commented 1 year ago

Thanks Carine. I don't personally have a stake in this but here's a spot to start:

If GitLab provides raw file content, then fixing this might be as simple as changing a couple of URLs in directory.py.
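
GitLab does serve raw file content, under the path pattern `<project>/-/raw/<ref>/<file path>`. Assuming Tor's repository lives at gitlab.torproject.org/tpo/core/tor with a `main` branch, and that the two files Stem reads are still at src/app/config/ (all assumptions worth verifying before changing anything), the replacement URLs could be built as below. gitlab_raw_url is a hypothetical helper; in directory.py one would more likely hard-code the resulting strings.

```python
GITLAB_BASE = 'https://gitlab.torproject.org'


def gitlab_raw_url(project, ref, path, base=GITLAB_BASE):
    """Build a GitLab raw-file URL of the form <base>/<project>/-/raw/<ref>/<path>."""
    return '%s/%s/-/raw/%s/%s' % (base, project, ref, path)


# Assumed project path, branch and file locations; verify before relying on them.
AUTHORITY_URL = gitlab_raw_url('tpo/core/tor', 'main', 'src/app/config/auth_dirs.inc')
FALLBACK_URL = gitlab_raw_url('tpo/core/tor', 'main', 'src/app/config/fallback_dirs.inc')

print(AUTHORITY_URL)
# https://gitlab.torproject.org/tpo/core/tor/-/raw/main/src/app/config/auth_dirs.inc
```

Swapping URLs only changes where the file comes from; the downloaded content still goes through the same parsing code.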

PascalinDe commented 1 year ago

Hi Damian, great, I'll have a look! Thank you for your help.

PascalinDe commented 1 year ago

Hi Damian,

just to keep you up to date: Authority.from_remote() currently fails because of a parsing issue:

Python 3.11.4 (main, Jun  7 2023, 10:13:09) [GCC 12.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import stem.directory

In [2]: print(stem.directory.Authority.from_remote())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/src/stem/stem/directory.py:269, in Authority.from_remote(timeout)
    267 results = {}
--> 269 for matches in _directory_entries(lines, Authority._pop_section, (AUTHORITY_NAME, AUTHORITY_V3IDENT, AUTHORITY_IPV6, AUTHORITY_ADDR), required = (AUTHORITY_NAME, AUTHORITY_ADDR)):
    270   nickname, or_port = matches.get(AUTHORITY_NAME)  # type: ignore

File ~/src/stem/stem/directory.py:109, in _directory_entries(lines, pop_section_func, regexes, required)
    108 while next_section:
--> 109   yield _match_with(next_section, regexes, required)
    110   next_section = pop_section_func(lines)

File ~/src/stem/stem/directory.py:100, in _match_with(lines, regexes, required)
     99     if required_matcher not in matches:
--> 100       raise ValueError('Failed to parse mandatory data from:\n\n%s' % '\n'.join(lines))
    102 return matches

ValueError: Failed to parse mandatory data from:

"moria1 orport=9201 "
  "v3ident=F533C81CEF0BC0267857C99B2F471ADF249FA232 "
  "128.31.0.39:9231 1A25C6358DB91342AA51720A5038B72742732498",

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
Cell In [2], line 1
----> 1 print(stem.directory.Authority.from_remote())

File ~/src/stem/stem/directory.py:283, in Authority.from_remote(timeout)
    273     results[nickname] = Authority(
    274       address = address,
    275       or_port = or_port,
   (...)
    280       v3ident = matches.get(AUTHORITY_V3IDENT),  # type: ignore
    281     )
    282 except ValueError as exc:
--> 283   raise OSError(str(exc))
    285 return results

OSError: Failed to parse mandatory data from:

"moria1 orport=9201 "
  "v3ident=F533C81CEF0BC0267857C99B2F471ADF249FA232 "
  "128.31.0.39:9231 1A25C6358DB91342AA51720A5038B72742732498",

the problem is this regex:

AUTHORITY_ADDR = re.compile('"([\\d\\.]+):(\\d+) ([\\dA-F ]{49})",')

since the new fingerprint no longer contains spaces (https://github.com/torproject/tor/commit/72b04a5aa42dd2729cf9fe9452e559c29466b250#diff-d5122e908cf008bec45a8e34b1aef9f0169dc14bbb517fd05567f574ffa2a9d3), it is 40 characters long instead of 49
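
One way Stem could accept both layouts is to let the fingerprint group match either the spaced 49-character form or the compact 40-character form. The sketch below is a hypothetical relaxed pattern, not necessarily what the eventual pull request does; parse_addr is an illustrative helper, and the spaced sample line in the usage is illustrative data in the older format.

```python
import re

# Pattern quoted above from stem/directory.py: the fingerprint must be ten
# space-separated groups of four hex digits (40 hex chars + 9 spaces = 49).
OLD_AUTHORITY_ADDR = re.compile('"([\\d\\.]+):(\\d+) ([\\dA-F ]{49})",')

# Hypothetical relaxed pattern: accept the spaced form or 40 contiguous hex digits.
NEW_AUTHORITY_ADDR = re.compile('"([\\d\\.]+):(\\d+) ([\\dA-F ]{49}|[\\dA-F]{40})",')


def parse_addr(line, pattern=NEW_AUTHORITY_ADDR):
    """Return (address, dir_port, fingerprint), or None if the line doesn't match."""
    match = pattern.search(line)
    if not match:
        return None
    address, dir_port, fingerprint = match.groups()
    # Normalize both layouts to the 40-character fingerprint.
    return address, int(dir_port), fingerprint.replace(' ', '')


new_style = '  "128.31.0.39:9231 1A25C6358DB91342AA51720A5038B72742732498",'
print(parse_addr(new_style, OLD_AUTHORITY_ADDR))  # None
print(parse_addr(new_style))
# ('128.31.0.39', 9231, '1A25C6358DB91342AA51720A5038B72742732498')
```

Keeping the spaced alternative means older copies of auth_dirs.inc still parse, while stripping spaces gives callers one fingerprint format regardless of the source.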

should stem accommodate fingerprints without spaces, or shall I open a PR with Tor?

atagar commented 1 year ago

Great work, Carine. This should be accommodated for on Stem's end. Tor's auth_dirs.inc is an internal file. Reading it from Stem is a hack.

PascalinDe commented 1 year ago

opened a pull request for this issue

while working on it, I noticed that Fallback.from_remote() isn't working either. I'll open a separate issue for that.