zeldigas / text2confl

Publisher of documents to confluence
Apache License 2.0

Dealing with page ID stability: being robust when renaming page titles #144

Open feliksik opened 9 months ago

feliksik commented 9 months ago

As the source text files do not have a Confluence ID, page identity is matched by page title. A page title is unique within a Confluence space.

But with the title as the ID, renaming a page causes the existing page to be deleted and a new page to be created. As a consequence, incoming links break 🙁 (i.e. links from other Confluence pages/spaces not managed by the same text2confl project). It would be much better to update the existing page instead.

We could use the filename (and optionally, or additionally, a made-up identifier metadata field that is even more stable but has to be managed manually in the text file) and store it as metadata on the Confluence page. When uploading, we would read this metadata and match it against the file, giving us a bridge between the Confluence page ID and the AsciiDoc/Markdown source file. This would solve the delete-recreate issue.
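To make the proposal concrete, here is a minimal Python sketch of the matching step only, assuming each remote page already carries the stored identifier (how it is read from Confluence is left out). The names LocalDoc, RemotePage and plan_sync are hypothetical and not part of text2confl; the point is that matching on the stored identifier instead of the title turns a title change into an update of the same page ID.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LocalDoc:
    source_id: str  # hypothetical stable identifier, e.g. the file path
    title: str


@dataclass(frozen=True)
class RemotePage:
    page_id: str
    title: str
    source_id: Optional[str]  # identifier stored in page metadata, if any


def plan_sync(
    local_docs: List[LocalDoc], remote_pages: List[RemotePage]
) -> Tuple[List[Tuple[str, LocalDoc]], List[LocalDoc], List[str]]:
    """Split local docs into updates (existing page ID kept) and creates,
    and list remote pages that no longer have a matching local doc."""
    by_source = {p.source_id: p for p in remote_pages if p.source_id is not None}
    updates: List[Tuple[str, LocalDoc]] = []
    creates: List[LocalDoc] = []
    matched = set()
    for doc in local_docs:
        page = by_source.get(doc.source_id)
        if page is None:
            creates.append(doc)  # no page carries this identifier yet
        else:
            # Title may differ: this is an in-place rename, same page ID.
            updates.append((page.page_id, doc))
            matched.add(page.page_id)
    # Only pages that carry an identifier but matched nothing are cleanup
    # candidates; pages without stored metadata are left alone.
    deletes = [
        p.page_id
        for p in remote_pages
        if p.source_id is not None and p.page_id not in matched
    ]
    return updates, creates, deletes
```

A page whose title changed but whose identifier did not ends up in updates, so its page ID and incoming links survive. Pages without stored metadata (e.g. created manually or by an older upload) are deliberately never deleted in this sketch, as a conservative choice.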

I suppose there are some details to work out, and alternative approaches to consider. I think this would be a very valuable feature for usage in a larger context, where the stability of incoming links is rather important. I'm happy to collaborate on making this work.

(Note: I initially also raised the page-stealing issue from #142 here, but I now think it deserves a separate solution.)

feliksik commented 9 months ago

I have also considered a confluence-page-id metadata field managed in the text file itself, instead of a made-up identifier that has to be added to the text file and also stored as metadata in Confluence.

Obviously the ID is only known after the page has been created in Confluence, so it would have to be added to the text file afterwards, either manually or by the tooling itself (e.g. inserted as the first line of the file).

However, this does not seem like a good idea:

feliksik commented 9 months ago

New idea: provide an option --follow-git-renames. It would use something like git log --follow --diff-filter=A -- possibly-renamed.md to determine the hash of the commit in which the file was introduced, and use ${commitHash}-${originalFilename} as the document's identifier in the Confluence page metadata.
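A rough Python sketch of how the --follow-git-renames identifier could be computed. The --format=commit:%H and --name-only flags are additions of mine so that the output can be parsed unambiguously; since git log lists newest commits first, the last "added" entry is taken as the introducing commit. The function names are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple


def parse_first_addition(log_output: str) -> Tuple[str, str]:
    """Return (commit_hash, path) of the oldest 'added' entry.

    Expects the output of:
      git log --follow --diff-filter=A --format=commit:%H --name-only -- <file>
    """
    entries: List[List[Optional[str]]] = []
    for raw in log_output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator between commit line and file names
        if line.startswith("commit:"):
            entries.append([line[len("commit:"):], None])
        elif entries and entries[-1][1] is None:
            entries[-1][1] = line  # path as it was named in that commit
    if not entries or entries[-1][1] is None:
        raise ValueError("no commit adding the file was found")
    commit_hash, path = entries[-1]  # newest first, so last = oldest
    return commit_hash, path


def document_identifier(path: str, repo_dir: str = ".") -> str:
    """Build ${commitHash}-${originalFilename} for a (possibly renamed) file."""
    out = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A",
         "--format=commit:%H", "--name-only", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    commit_hash, original_path = parse_first_addition(out)
    return f"{commit_hash}-{original_path}"
```

One caveat worth checking: in a shallow clone (e.g. a CI checkout with fetch depth 1) the history is missing, so the "introducing" commit would likely be the shallow boundary commit and the identifier would change from run to run.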

I think this will achieve exactly what I intend:

@zeldigas I'm not sure how much time you spend on this project, but I may get to implementing this myself when time permits. Either way, it's useful to first align on this idea.

zeldigas commented 7 months ago

@feliksik I believe file renames are not an issue at all when you have explicit page titles: any cleanup is done after the whole doc tree is processed, so even if a file is renamed or moved to another location, it will be processed properly.

Title renames, however, are challenging. You mentioned metadata that can be associated with the page. It can certainly be set, but the main challenge is finding the page by that metadata. I did not dig deep into it, but I doubt this is available out of the box, if it is possible at all. The only docs I found apply to the Server version and require server-side configuration: https://developer.atlassian.com/server/confluence/content-properties-in-the-rest-api/. And iterating over all pages might not be a good idea at all.
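Setting the metadata is the straightforward half. As a hedged sketch: Confluence's v1 REST API exposes content properties under /rest/api/content/{id}/property, and a POST with a key and a JSON value creates one. The property key text2confl-source-id and the value shape are hypothetical, and auth is shown as Cloud-style Basic auth with email and API token. The sketch only builds the request, covers creation only (not updating an existing property), and does nothing for the lookup problem.

```python
import base64
import json
import urllib.request

PROPERTY_KEY = "text2confl-source-id"  # hypothetical property key


def build_set_property_request(
    base_url: str, page_id: str, source_id: str, email: str, api_token: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a content property.

    base_url is the API root, e.g. https://<site>.atlassian.net/wiki on Cloud.
    """
    body = json.dumps(
        {"key": PROPERTY_KEY, "value": {"sourceId": source_id}}
    ).encode("utf-8")
    token = base64.b64encode(f"{email}:{api_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/rest/api/content/{page_id}/property",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it would be a matter of urllib.request.urlopen(req); in text2confl itself this would of course live in the existing Kotlin client instead.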

That said, this idea needs some research, and I really appreciate your help here, as I'm not sure I'll be able to research it on my own within a reasonable time.

It's probably worth starting with research into whether it's possible to search for a page by metadata.

Another thought: with additional constraints it might be possible to do without this search, at the cost of additional load on Confluence. Since we know the parent page, we can fetch information about all child pages (recursively) and use this page tree to find renamed pages, based either on the file name or on the hash you mentioned.
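The child-page approach could look roughly like this: walk the subtree under the known parent, build an index from stored identifier to page ID, then look up each local document's identifier in it. PageTreeClient is a hypothetical interface; a real client might use GET /rest/api/content/{id}/child/page (which is paginated) and read a content property per page. The walk is iterative to avoid recursion limits on deep trees.

```python
from typing import Dict, List, Optional, Protocol


class PageTreeClient(Protocol):
    """Hypothetical minimal client interface for the tree walk."""

    def child_page_ids(self, page_id: str) -> List[str]: ...

    def source_id(self, page_id: str) -> Optional[str]: ...


def index_subtree(client: PageTreeClient, root_id: str) -> Dict[str, str]:
    """Map stored source identifier -> page ID for every page below root_id."""
    index: Dict[str, str] = {}
    stack = list(client.child_page_ids(root_id))  # root itself is not indexed
    while stack:
        page_id = stack.pop()
        sid = client.source_id(page_id)
        if sid is not None:
            index[sid] = page_id  # duplicate identifiers: last visited wins
        stack.extend(client.child_page_ids(page_id))
    return index
```

Each page costs at least two requests here (children and property), which is the extra Confluence load mentioned above. If two pages carry the same identifier, the last one visited silently wins, so a real implementation should probably report that as an error instead.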

feliksik commented 7 months ago

You are spot on in your analysis. This is not my highest priority, but I'll keep you posted when I make any progress.

feliksik commented 4 months ago

I have taken a further look at the codebase and am reporting back here, for my own recollection and yours.

I think the following would be possible:

I think this should work, but it would require some refactoring. Whether this is worth it depends on how valuable you find this feature, and whether you're ok with having the logic adapted accordingly.

But I feel it would be a great feature, especially since Confluence Cloud puts page IDs so prominently in URLs: a page rename through text2confl breaks the URL (because the page is recreated with a new ID), while regular WYSIWYG users don't have this problem.

What do you think?