hallowelt / migrate-confluence

Tool to migrate content from Confluence export files into a MediaWiki compatible import source
GNU General Public License v3.0

Structure of the Confluence XML export seems to have changed #81

Closed HyP3r- closed 6 months ago

HyP3r- commented 1 year ago

I am not sure, but it seems that the structure of the Atlassian Confluence XML export has changed.

First, there is no longer an option to export pages as XML, either under the Export item of a space or at the site level. Instead, there is now a "Backup" section, where you can create an XML file.

Details such as users and spaces can still be extracted from the backup, but the page content no longer can. The term bodyContents does not appear anywhere in the XML.

This command never returns anything: https://github.com/hallowelt/migrate-confluence/blob/93acee0334a5e40a6dc7c51005b1dbafed9297e0/src/Analyzer/ConfluenceAnalyzer.php#L467

Is it the same for you? Am I doing something wrong? I'm using version 8.3.2, which is the last version that worked on-premise. All other versions now have a cloud constraint. Maybe that's why Atlassian changed the format of the XML again?

Damme commented 11 months ago

I think I have the same problem. I have not looked any deeper into this yet, but the convert command yields empty wiki pages, except for the attachments.

Tedderouni commented 8 months ago

I'm having this problem as well with the empty pages. Exported from Confluence Server 8.5.5 (which is now the last on-prem version) and migrate-confluence 1.2.1 (current latest release).

Looking at my entities.xml, I also don't see bodyContents, but I do see a lot of objects with a class of BodyContent, among others, and these actually do appear to contain the page content. I don't have an earlier version to compare to, so I don't know how it compares to the bodyContents or if it was all within that tag.

E.g.:

<object class="BodyContent" package="com.atlassian.confluence.core">
    <id name="id">123456789</id>
    <property name="body"><![CDATA[body content here]]></property>
    <property name="bodyType">2</property>
    <property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">987654321</id></property>
</object>
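
As a rough illustration of how such BodyContent objects could be read, here is a minimal Python sketch. It is not how migrate-confluence does it (that tool is written in PHP); the wrapping `<export>` root element, the `SAMPLE` string and the `body_contents` helper are assumptions made up for this example, and only the `<object>` snippet itself comes from the export shown above.

```python
import xml.etree.ElementTree as ET

# Sample data: the <object> element is the one quoted above; the <export>
# root is a placeholder (assumption) so the snippet parses on its own.
SAMPLE = """<export>
<object class="BodyContent" package="com.atlassian.confluence.core">
    <id name="id">123456789</id>
    <property name="body"><![CDATA[body content here]]></property>
    <property name="bodyType">2</property>
    <property name="content" class="Page" package="com.atlassian.confluence.pages"><id name="id">987654321</id></property>
</object>
</export>"""


def body_contents(root):
    """Map the id of the owning content object to its body text."""
    bodies = {}
    for obj in root.iter("object"):
        if obj.get("class") != "BodyContent":
            continue
        body = obj.find("property[@name='body']")
        owner = obj.find("property[@name='content']/id")
        if body is None or owner is None:
            continue  # skip objects that lack a body or an owning page
        bodies[owner.text] = body.text or ""
    return bodies


bodies = body_contents(ET.fromstring(SAMPLE))
print(bodies)  # {'987654321': 'body content here'}
```

For a real entities.xml, `ET.parse(path).getroot()` could be passed in instead of the sample string; for very large exports a streaming parser such as `ET.iterparse` would probably be a better fit.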

Tedderouni commented 8 months ago

While digging into this today, I found that I had installed pandoc incorrectly and it wasn't in my $PATH. Once I fixed that and reran the migration and import steps, it worked as designed. So at least in my case, the issue was user error, and I now can confirm that this works successfully with Confluence Server 8.5.5 and migrate-confluence 1.2.1.
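
A quick way to rule out this kind of problem before running the migration is to check whether the executable can be resolved from $PATH. The following is a minimal sketch using Python's standard library; the helper name `find_tool` is hypothetical and not part of migrate-confluence.

```python
import shutil


def find_tool(name):
    """Return the resolved path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("pandoc")
    if path is None:
        print("pandoc was not found on PATH")
    else:
        print(f"pandoc found at {path}")
```

If this reports that pandoc was not found, the installation or the $PATH setting is worth checking before looking at the export format.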

@Damme My symptoms matched what you described, so this might be something for you to check if you haven't already.

Damme commented 8 months ago

I forgot about this issue. I managed to get it working with the commits from a few weeks ago.