marklogic / marklogic-contentpump

MarkLogic Contentpump (mlcp)
http://developer.marklogic.com/products/mlcp
Apache License 2.0

Mlcp exports binary documents with XML content of other documents #83

Open jensriga opened 6 years ago

jensriga commented 6 years ago

Situation

I am using mlcp to export all documents of a database to the local filesystem. In the end, I have the correct number of local files, but some files that should be binaries actually contain XML content from other documents. The XML documents themselves are okay.

Steps to reproduce the issue

  1. Unzip the contents of import.zip into a local folder, C:\Temp\mlcp\import.
  2. Use mlcp to import the files into an empty database: mlcp.bat import -host localhost -port 8070 -username **** -password **** -mode local -input_file_path C:\Temp\mlcp\import -output_uri_replace "/C:/Temp/mlcp/import,''"
  3. Observe the content of the database in Query Console. For comparison with later results, and to make sure everything was still okay after the import, I used XQuery to determine the size of all documents:

         for $doc in fn:doc()
         let $uri := fn:document-uri($doc)
         let $size :=
           if (fn:exists($doc/binary()))
           then xdmp:binary-size($doc/binary())
           else xdmp:binary-size(xdmp:unquote(xdmp:quote($doc), (), "format-binary")/binary())
         order by $uri ascending
         return $uri || " -> " || $size

     Everything looks good so far.
  4. Use mlcp to export all documents to the local filesystem: mlcp.bat export -host localhost -port 8070 -username **** -password **** -mode local -output_file_path C:\Temp\mlcp\export
  5. Compare the import and export directories. The XML documents and 5 out of 8 binary documents are okay. The problem is that image-003.gif and image-008.gif now have the same content as doc-A.xml, and image-007.gif has the same content as doc-B.xml.
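
For anyone who wants to repeat the comparison in step 5 without a diff tool, here is a minimal Python sketch that hashes every file under two directory trees and reports the differences. It is not part of mlcp; the function names (`file_digest`, `tree_digests`, `compare_trees`) are made up for illustration. It assumes the export directory mirrors the relative layout of the import directory. Note also that a byte-for-byte check may flag XML documents that were merely re-serialized on export, so the binary files are the interesting entries.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root (POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(import_dir: Path, export_dir: Path) -> dict[str, list[str]]:
    """Report files that are missing, extra, or changed in export_dir."""
    imported = tree_digests(import_dir)
    exported = tree_digests(export_dir)
    common = imported.keys() & exported.keys()
    return {
        "missing": sorted(imported.keys() - exported.keys()),
        "extra": sorted(exported.keys() - imported.keys()),
        "changed": sorted(k for k in common if imported[k] != exported[k]),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare.py C:\Temp\mlcp\import C:\Temp\mlcp\export
    report = compare_trees(Path(sys.argv[1]), Path(sys.argv[2]))
    for kind, names in report.items():
        for name in names:
            print(f"{kind}: {name}")
```

Two "changed" entries that share a digest with an XML document in the export tree would point to the mix-up described above.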

My system environment

jensriga commented 6 years ago

Maybe this helps: I was able to reproduce the issue on a clean CentOS 7 VM with a new installation of MarkLogic Server.

The only significant difference: under Linux all 8 binary files are broken, not just 3 out of 8 as under Windows 10.
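
A quick way to count broken binaries on either platform, without the original files at hand, is to check whether each exported .gif still starts with a GIF signature (`GIF87a` or `GIF89a`). The sketch below is only an illustration under that assumption. The helper names `is_gif` and `suspicious_gifs` are hypothetical, and it covers only files with a lowercase `.gif` extension.

```python
from pathlib import Path

# Every valid GIF file begins with one of these six-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_gif(path: Path) -> bool:
    """Return True if the file starts with a GIF signature."""
    with path.open("rb") as f:
        return f.read(6) in GIF_SIGNATURES


def suspicious_gifs(export_dir: Path) -> list[str]:
    """List exported *.gif files whose content does not look like a GIF."""
    return sorted(
        p.relative_to(export_dir).as_posix()
        for p in export_dir.rglob("*.gif")
        if p.is_file() and not is_gif(p)
    )
```

Calling `suspicious_gifs(Path("C:/Temp/mlcp/export"))` would then list the files that were overwritten with non-GIF content, such as XML.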


mattsunsjf commented 6 years ago

Good bug report!

dbarriguete commented 5 years ago

import.zip

Hello, I have reviewed this issue and made a minor change in the "Export-binary-bug" branch. With this change, export writes the correct file content for both binary and text files.

Attached to this comment is a zip with more files for testing purposes.