Feature Request

Describe your use case and the problem you are facing
When an export is limited by max_file_size, sites with a large quantity of "before_posts" data will produce a series of WXR files in which that metadata is repeated at the top of every file. In theory, a file could be 100% metadata and never reach the point of exporting posts; in the more egregious cases I've seen, 20-50% of each WXR file is taken up by this duplicate metadata.
Describe the solution you'd like

I would like to work on code to deduplicate metadata across the generated files, to save space and processing time and to facilitate the export of extremely large sites. I've filed this as a feature request to see whether this use case/new option would be compatible with the export-command.
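To make the idea concrete, here is a minimal sketch of the kind of chunked writer I have in mind. Everything below is a hypothetical illustration, not the existing export-command API: the full before_posts block is written only into the first file, and every subsequent file gets a bare but still well-formed header.

```php
<?php
// Hypothetical sketch only: none of these names come from export-command.
// Writes items into size-capped WXR files, emitting the full before_posts
// metadata block once (in the first file) instead of in every file.
// Real WXR also needs the wp:/dc:/content: namespaces; omitted for brevity.

function write_deduplicated_wxr(
    string $before_posts, // serialized authors/categories/tags/terms block
    array $items,         // pre-serialized <item>...</item> strings
    int $max_bytes,       // size cap per generated file
    string $prefix        // output filename prefix
): array {
    $preamble  = "<rss version=\"2.0\"><channel>\n";
    $postamble = "</channel></rss>\n";

    $files      = [];
    $file_index = 0;
    $buffer     = '';
    $item_count = 0;

    $open = function () use (&$buffer, &$item_count, &$file_index, $preamble, $before_posts) {
        $file_index++;
        // Full metadata only in file #1; later files start with the bare preamble.
        $buffer     = $preamble . ($file_index === 1 ? $before_posts : '');
        $item_count = 0;
    };

    $close = function () use (&$buffer, &$files, &$file_index, $postamble, $prefix) {
        $path = sprintf('%s-%03d.xml', $prefix, $file_index);
        file_put_contents($path, $buffer . $postamble);
        $files[] = $path;
    };

    $open();
    foreach ($items as $item) {
        $projected = strlen($buffer) + strlen($item) + strlen($postamble);
        // Rotate only when the cap is hit AND the file already holds an item,
        // so a single oversized item still gets written somewhere.
        if ($projected > $max_bytes && $item_count > 0) {
            $close();
            $open();
        }
        $buffer     .= $item;
        $item_count += 1;
    }
    $close();

    return $files;
}
```

The trade-off is that files after the first are no longer independently importable: an importer would need to read the shared metadata from the first file.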
Yes, I'm open to adding deduplication logic like that. We can add it as a flag at first and then later decide what the default state of that flag should be with regard to backwards compatibility (BC).
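For illustration, opting in could look something like this (the dedupe flag name is purely hypothetical; only max_file_size exists today):

```
wp export --max_file_size=15 --dedupe-before-posts
```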