Closed kedoodle closed 2 months ago
Strange. Do you see the same symptoms if you install the extension from PyPI and run the command md2conf
directly (rather than via Docker)? I have a test case that synchronizes a directory tree and it appears to work nicely.
It appears the newly introduced Matcher
class was eagerly filtering directory entries based on extension mismatch, which resulted in the processor never recursing into directories. Matcher
was extended to handle directories separately, and unit tests have been added to verify that nested directories are properly visited. Improvements have been pushed to the branch master
.
Thanks for reporting the issue!
Hi @hunyadi, thank you for the prompt fix! Yes, I was seeing the issue with running md2conf
directly, although it was still inside a container (python:3.11-slim
base) as it's in a pipeline.
Is it possible to get a container image with the fix built? Happy to help with release strategy e.g. GitHub release -> CI publishes md2conf
PyPI and container image.
@hunyadi I still have the issue. Can this be re-opened?
docker run --rm -v "$(pwd)/test-md2conf:/workdir" leventehunyadi/md2conf:latest \
-r $1 \
--domain my-confluence \
--path '/' \
--space 'MYSPACE' \
--apikey "$CONFLUENCE_API_TOKEN" \
--ignore-invalid-url \
/workdir
with test-md2conf (attached)
test-md2conf
|-- README.md
|-- folder-A
| |-- folder-AA
| | |-- README.md
| | `-- another-page-aa.md
| `-- folder-AB
| |-- README.md
| `-- another-page-ab.md
`-- folder-B
|-- README.md
`-- another-page-b.md
I get the logs :
2024-10-23 17:47:39,480 - INFO - synchronize_directory [54] - Synchronizing directory: /workdir
2024-10-23 17:47:39,480 - INFO - _index_directory [94] - Indexing directory: /workdir
2024-10-23 17:47:39,480 - INFO - page_exists [471] - Checking if page exists with title: README
2024-10-23 17:47:39,563 - INFO - create_page [438] - Creating page: README
2024-10-23 17:47:39,677 - INFO - _index_directory [94] - Indexing directory: /workdir/folder-B
2024-10-23 17:47:39,678 - INFO - page_exists [471] - Checking if page exists with title: another-page-b
2024-10-23 17:47:39,709 - INFO - create_page [438] - Creating page: another-page-b
2024-10-23 17:47:39,839 - INFO - page_exists [471] - Checking if page exists with title: README
2024-10-23 17:47:39,918 - INFO - _index_directory [94] - Indexing directory: /workdir/folder-A
2024-10-23 17:47:39,918 - INFO - _index_directory [94] - Indexing directory: /workdir/folder-A/folder-AA
2024-10-23 17:47:39,918 - INFO - page_exists [471] - Checking if page exists with title: another-page-aa
2024-10-23 17:47:39,950 - INFO - create_page [438] - Creating page: another-page-aa
2024-10-23 17:47:40,073 - INFO - page_exists [471] - Checking if page exists with title: README
2024-10-23 17:47:40,148 - INFO - _index_directory [94] - Indexing directory: /workdir/folder-A/folder-AB
2024-10-23 17:47:40,148 - INFO - page_exists [471] - Checking if page exists with title: README
2024-10-23 17:47:40,208 - INFO - page_exists [471] - Checking if page exists with title: another-page-ab
2024-10-23 17:47:40,243 - INFO - create_page [438] - Creating page: another-page-ab
2024-10-23 17:47:40,375 - INFO - synchronize_directory [64] - indexed 7 page(s)
2024-10-23 17:47:40,375 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/README.md
2024-10-23 17:47:40,458 - INFO - update_page [418] - Updating page: 2009315992
2024-10-23 17:47:40,585 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-B/another-page-b.md
2024-10-23 17:47:40,636 - INFO - update_page [418] - Updating page: 2009315993
2024-10-23 17:47:40,764 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-B/README.md
2024-10-23 17:47:40,820 - INFO - update_page [418] - Updating page: 2009315992
2024-10-23 17:47:40,941 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-A/folder-AA/another-page-aa.md
2024-10-23 17:47:40,991 - INFO - update_page [418] - Updating page: 2009315994
2024-10-23 17:47:41,122 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-A/folder-AA/README.md
2024-10-23 17:47:41,175 - INFO - update_page [418] - Updating page: 2009315992
2024-10-23 17:47:41,296 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-A/folder-AB/README.md
2024-10-23 17:47:41,342 - INFO - update_page [418] - Updating page: 2009315992
2024-10-23 17:47:41,458 - INFO - _synchronize_page [77] - Synchronizing page: /workdir/folder-A/folder-AB/another-page-ab.md
2024-10-23 17:47:41,511 - INFO - update_page [418] - Updating page: 2009315995
But no folder structure:
Also, when I run from md2conf which is installed locally (pip install / no container),
python3 -m md2conf -r $1 \
--domain my-confluence \
--path '/' \
--space 'MYSPACE' \
--apikey "$CONFLUENCE_API_TOKEN" \
--ignore-invalid-url \
./test-md2conf
I get
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/__main__.py", line 211, in <module>
main()
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/__main__.py", line 196, in main
).synchronize(args.mdpath)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/application.py", line 40, in synchronize
self.synchronize_directory(path)
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/application.py", line 63, in synchronize_directory
self._index_directory(local_dir, root_id, page_metadata)
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/application.py", line 125, in _index_directory
self._index_directory(Path(local_dir) / directory, parent_id, page_metadata)
File "/home/user/git/docs-md2conf/venv/lib/python3.11/site-packages/md2conf/application.py", line 100, in _index_directory
for entry in os.scandir(local_dir):
^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'test-md2conf/test-md2conf/folder-B'
It looks like the recursive call is incorrect.
Thanks!
Indeed. When passing the argument to the recursive call, the current directory (as per the directory traversal) was prepended to the nested directory path to scan. Unfortunately, the nested directory path had already been an absolute path. Python's pathlib.Path
allows you to combine two absolute paths, in which case it is a no-op, the second path prevails. Unit tests use absolute paths, which is why this issue went unnoticed. However, when you pass a relative directory as the directory to publish (e.g. from the command line), the concatenation is not a no-op but produces the wrong path as in your error report:
FileNotFoundError: [Errno 2] No such file or directory: 'test-md2conf/test-md2conf/folder-B'
This issue has been fixed in the latest commit to the branch master
. Thanks for reporting the issue!
Thanks @hunyadi! I checked with the latest on master, and the problem with double path is gone. But I still don't have the folder structure with the example I attached above. Any idea?
Did you delete the pages from Confluence and remove the Confluence page IDs from the Markdown files? Once a Confluence page has been created and a page ID has been assigned, md2conf doesn't adjust parent-child relationships. (This allows us to update pages that are e.g. in the same source directory but scattered across spaces or parent pages in Confluence. This is a feature we are extensively relying on.) md2conf has a function called api.create_page
that passes the JSON field ancestors
to Confluence, which is how the parent is set. Other functions don't set ancestors
. The log above appears to have
update_page [418] - Updating page: 2009315992
which indicates a call to api.update_page
. A future version of md2conf could set ancestors
for existing pages but this is not implemented yet.
Did you delete the pages from Confluence and remove the Confluence page IDs from the Markdown files?
Yes. I also deleted the root page and recreate a new one every time.
I am not sure I understand if there is a limitation, or something I should change to get this to work. 😬
Thanks for the prompt feedback! I have pushed more extensive changes as part of the commit Fix an issue with index page not assuming parent role. My own manual tests confirm that it's now working as expected (but had the same issue as what you reported prior to the changes). In the future, I will make these automated tests to avoid regressions.
Hi there, it looks like a recent change may have introduced a regression with recursive syncing of a directory. Currently only finding files in the provided
mdpath
directory (not sub directories).Set up:
Should produce something like:
Running
md2conf
:Expected to index 5 pages, not 3.