nephila / giturlparse

Parse & rewrite git urls (supports GitHub, Bitbucket, Assembla ...)
https://pypi.python.org/pypi/giturlparse
Apache License 2.0
31 stars 23 forks

The parser validates invalid URLs #97

Open jubnl opened 8 months ago

jubnl commented 8 months ago

Description

The parser should be stricter: some URLs that the library accepts as valid should be rejected. See the code and output below.

Steps to reproduce

from pprint import pprint

import giturlparse

if __name__ == "__main__":

    urls = [
        "https://github(com../testing2/jubnl/test",
        "https://github$com/testing2/ jubnl/test ",
        "https://git/test",
        "https://git...com/jubnl",
    ]

    for url in urls:
        parsed = giturlparse.parse(url)
        print(f"Initial url: '{url}'")
        print(f"Is url valid: {parsed.valid}")
        if parsed.valid:
            print("Parsed urls:")
            pprint(parsed.urls)

Output:

C:\Users\user\PycharmProjects\multiproc\.venv\Scripts\python.exe C:\Users\user\PycharmProjects\multiproc\main.py
Initial url: 'https://github(com../testing2/jubnl/test'
Is url valid: True
Parsed urls:
{'git': 'git://github(com../testing2/jubnl/test.git',
 'https': 'https://github(com../testing2/jubnl/test.git',
 'ssh': 'git@github(com..:testing2/jubnl/test.git'}
Initial url: 'https://github$com/testing2/ jubnl/test '
Is url valid: True
Parsed urls:
{'git': 'git://github$com/testing2/ jubnl/test .git',
 'https': 'https://github$com/testing2/ jubnl/test .git',
 'ssh': 'git@github$com:testing2/ jubnl/test .git'}
Initial url: 'https://git/test'
Is url valid: True
Parsed urls:
Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\multiproc\main.py", line 107, in <module>
    pprint(parsed.urls)
           ^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\multiproc\.venv\Lib\site-packages\giturlparse\result.py", line 102, in urls
    return {protocol: self.format(protocol) for protocol in self._platform_obj.PROTOCOLS}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\multiproc\.venv\Lib\site-packages\giturlparse\result.py", line 102, in <dictcomp>
    return {protocol: self.format(protocol) for protocol in self._platform_obj.PROTOCOLS}
                      ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\PycharmProjects\multiproc\.venv\Lib\site-packages\giturlparse\result.py", line 73, in format
    return self._platform_obj.FORMATS[protocol] % items
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'http'

Process finished with exit code 1

Versions

Python 3.11.4
giturlparse 0.12.0

Windows 11

Expected behaviour

The parser should report these URLs as invalid (parsed.valid should be False).
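
As a rough illustration of the kind of strictness meant here, the sketch below rejects all four URLs from the reproduction script using only the standard library. It is not giturlparse code: `is_plausible_host` and `precheck_https_url` are hypothetical helper names, and the rules (no whitespace anywhere, DNS-style host labels, at least two labels in the host, a non-empty path) are assumptions about what "valid" should mean, not the library's actual behaviour.

```python
import re
from urllib.parse import urlsplit

# One DNS-style label: letters, digits and hyphens, no leading/trailing hyphen, 1-63 chars.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# Assumption: require at least two dot-separated labels, so a bare "git" is rejected.
_HOST_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")


def is_plausible_host(host: str) -> bool:
    """Return True if host looks like a dotted DNS name (hypothetical rule set)."""
    return len(host) <= 253 and _HOST_RE.fullmatch(host) is not None


def precheck_https_url(url: str) -> bool:
    """Hypothetical pre-check for http(s) URLs; not part of giturlparse."""
    # Reject any whitespace, including leading/trailing and embedded spaces.
    if any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for some malformed netlocs (e.g. bad brackets).
        return False
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    # Require a plausible host and something after it that could name a repository.
    return is_plausible_host(host) and parts.path not in ("", "/")


if __name__ == "__main__":
    for candidate in [
        "https://github(com../testing2/jubnl/test",
        "https://github$com/testing2/ jubnl/test ",
        "https://git/test",
        "https://git...com/jubnl",
        "https://github.com/nephila/giturlparse",
    ]:
        print(f"{candidate!r}: {precheck_https_url(candidate)}")
```

A check like this could run before `giturlparse.parse(url)`, or similar rules could live inside the parser itself. Note that requiring two host labels would also reject single-label hosts such as `localhost`, which may or may not be acceptable for self-hosted setups.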

Actual behaviour

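The parser reports these URLs as valid (the script crashes before reaching the fourth one). For 'https://git/test', accessing parsed.urls additionally raises KeyError: 'http'.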