hunyadi / md2conf

Publish Markdown files to Confluence wiki
MIT License
56 stars 31 forks source link

Calls to Path.absolute() do not resolve relative paths #71

Closed Cau5tik closed 2 weeks ago

Cau5tik commented 2 weeks ago

Hey folks. This is my first PR to someone else's public project. I've done my best to be thorough in my writing, but I'm sure this is bad for some reason I'm not aware of. Please let me know if there's anything I'm missing or anything that should be better. Thanks!

Given a sample directory structure:

/parent
 |
 |_> /child
       |_> document.md
 |_> README.md

Where /parent/child/document.md contains a reference to /parent/README.md:

[See README](../README.md)

_transform_link will throw a DocumentError:

2024-11-14 15:44:37,959 - INFO - _synchronize_page [88] - Synchronizing page: /Users/sliddell/Documents/parent/child.md
Traceback (most recent call last):
  File "/Users/sliddell/.pyenv/versions/3.9.17/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/sliddell/.pyenv/versions/3.9.17/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/__main__.py", line 222, in <module>
    main()
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/__main__.py", line 204, in main
    Application(
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/application.py", line 49, in synchronize
    self.synchronize_directory(path)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/application.py", line 79, in synchronize_directory
    self._synchronize_page(page_path, page_metadata)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/application.py", line 89, in _synchronize_page
    document = ConfluenceDocument(page_path, self.options, page_metadata)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/converter.py", line 1003, in __init__
    converter.visit(self.root)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/converter.py", line 263, in visit
    self.visit(source)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/converter.py", line 259, in visit
    target = self.transform(source)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/converter.py", line 830, in transform
    return self._transform_link(child)
  File "/Users/sliddell/Documents/scratch/md2conf/lib/python3.9/site-packages/md2conf/converter.py", line 406, in _transform_link
    raise DocumentError(msg)
md2conf.converter.DocumentError: unable to find matching page for URL: ../README.md

Setting a breakpoint at line 399 in converter.py shows that the PosixPath of absolute_path is not resolved as per the comments at line 383

-> if link_metadata is None:
(Pdb) self.page_metadata
{PosixPath('/Users/sliddell/Documents/oncall-runbooks/parent/README.md'): ConfluencePageMetadata(domain='<not relevant>', base_path='/wiki/', page_id='1222148237', space_key='ENG', title='sliddelltest'), PosixPath('/Users/sliddell/Documents/oncall-runbooks/parent/child/document.md'): ConfluencePageMetadata(domain='<not_relevant>', base_path='/wiki/', page_id='1222148247', space_key='ENG', title='Document')}
(Pdb) absolute_path
PosixPath('/Users/sliddell/Documents/oncall-runbooks/parent/child/../README.md')

The PathLib docs for Path.absolute explicitly says that absolute() does not normalize relative paths. Looking at Path.resolve below it, this function does resolve relative paths. It also calls out that this is the only function that will eliminate .. components.

This means that if the value for absolute_path in _transform_link is a relative path, it will never match a key in the self.page_metadata dictionary. We need to resolve the relative path into an absolute path when we want to compare it to the keys of self.page_metadata.

Changing line 386 to resolve() rather than absolute() doesn't fix the issue, as we're referring to a markdown document in another directory. This will raise a different DocumentError that the relative URL is outside the base path. While this is true, this doesn't actually break anything. Resolving the relative path when comparing it to the page_metadata dictionary results in the correct link being created to the markdown document in the parent folder.

If this is better fixed at line 386 when we expect links to resolve, then it would be great to better understand why you don't want to traverse directories with relative paths. My use case involves multiple teams creating documentation in a single repository. There are enough teams and enough documents that not everything will live in a single folder, but we would still like to make references between documents to keep them DRY.

Please let me know if there's anything else I can provide. Thanks for your time and efforts!

hunyadi commented 2 weeks ago

Changes have been pushed to the branch master that set both a base_dir and a root_dir in the initializer of ConfluenceStorageFormatConverter. As before, base_dir is used for resolving relative links such as path/to/other.md in the Markdown document. In contrast, root_dir is used for checking whether the linked document is within the directory hierarchy being published. The combination of these member variables should allow you to use relative links in Markdown that "go up" (e.g. ../other.md) so long as they don't go beyond the directory you originally passed to md2conf as a command-line argument.

Feel free to let me know if these changes have addressed the issue you reported.

Last but not least, thank you for reporting the issue, and providing an in-depth description, which has helped tremendously to track down the root cause.

Cau5tik commented 2 weeks ago

Works great. Relative URLs to parent directories are working very well. Thank you for such a fast response!