lonekorean / wordpress-export-to-markdown

Converts a WordPress export XML file into Markdown files.
MIT License

Fetching images breaks ("Something went wrong") #85

Closed: dertuxmalwieder closed this issue 5 months ago

dertuxmalwieder commented 1 year ago

Everything went well on a server that doesn't have my cb.apply() problem, until the image downloads were supposed to start:

```
Done, got them all!

Something went wrong, execution halted early.
URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at Object.getFilenameFromUrl (/home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/src/shared.js:2:9)
    at /home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/src/writer.js:108:28
    at Array.flatMap (<anonymous>)
    at /home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/src/writer.js:107:30
    at Array.flatMap (<anonymous>)
    at writeImageFilesPromise (/home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/src/writer.js:104:25)
    at Object.writeFilesPromise (/home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/src/writer.js:12:8)
    at async /home/mindfuck/.npm/_npx/7588/lib/node_modules/wordpress-export-to-markdown/index.js:26:2
```

My WordPress blog dates back to 2005, so some images are indeed "invalid" by now. But that shouldn't break the script.
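For reference, the URIError in the trace is thrown by JavaScript's built-in decodeURIComponent, which rejects any "%" that isn't followed by two hex digits, as well as escapes that don't form valid UTF-8. A small Node sketch to reproduce it; the sample file names are made up:

```javascript
// Shows which inputs make decodeURIComponent throw the "URI malformed" error seen above.
const samples = [
  "photo%20one.jpg", // well-formed escape: decodes to "photo one.jpg"
  "100%.jpg",        // "%" not followed by two hex digits
  "caf%E9.jpg",      // lone byte 0xE9 is not a complete UTF-8 sequence
];

// Returns the decoded string, or "<ErrorName>: <message>" if decoding fails.
function describeDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch (err) {
    return `${err.name}: ${err.message}`;
  }
}

for (const s of samples) {
  console.log(`${s} -> ${describeDecode(s)}`);
}
```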

dertuxmalwieder commented 1 year ago

Analysis: WordPress accepts image file names containing "%", but the script quits on them. Workaround for now:

```js
// writer.js:108

// ...
const imagesDir = path.join(path.dirname(postPath), 'images');
return post.meta.imageUrls.flatMap(imageUrl => {
    console.log("\nTRYING IMG URL " + imageUrl); // ADD THIS LINE
    const filename = shared.getFilenameFromUrl(imageUrl);
    const destinationPath = path.join(imagesDir, filename);
    // ...
```

Then wait for the crash, take the last URL printed to the console, and rename that file in the WordPress backend. That seems to fix it!
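Rather than waiting for one crash per bad file, the offending URLs could be collected up front and renamed in one pass. A minimal sketch, assuming the image URLs have already been pulled out of the export into an array; findUndecodableUrls and the example URLs are hypothetical, not part of wordpress-export-to-markdown. It checks the whole URL, which is stricter than checking only the file name:

```javascript
// Hypothetical helper (not part of wordpress-export-to-markdown): returns the
// URLs that decodeURIComponent would reject, so they can all be renamed at once.
function findUndecodableUrls(urls) {
  return urls.filter((url) => {
    try {
      decodeURIComponent(url);
      return false; // decodes fine, nothing to rename
    } catch (err) {
      if (err instanceof URIError) return true; // malformed percent-encoding
      throw err; // anything else is unexpected, so rethrow
    }
  });
}

// Example input; real URLs would come from your WordPress export.
const imageUrls = [
  "https://example.com/wp-content/uploads/2005/06/sunset.jpg",
  "https://example.com/wp-content/uploads/2005/07/50%off.png",
  "https://example.com/wp-content/uploads/2006/01/a%20b.jpg",
];

console.log(findUndecodableUrls(imageUrls));
```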

lonekorean commented 5 months ago

An image URL with an improperly encoded "%" causes the exception you saw when decoding is attempted. I fixed it so that if a URL can't be decoded, decoding is skipped and the script keeps running.
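The fallback described above can be sketched as follows. This is an illustration, not the actual v2.3.7 code: safeDecodeURIComponent and getFilenameFromUrlSafe are hypothetical names, and treating the last path segment as the file name is an assumption about what the tool does.

```javascript
// Try to decode; if the input is malformed, keep the original string instead of throwing.
function safeDecodeURIComponent(str) {
  try {
    return decodeURIComponent(str);
  } catch (err) {
    if (err instanceof URIError) return str; // leave undecodable input as-is
    throw err;
  }
}

// Hypothetical stand-in for the tool's filename helper: takes the last path
// segment (dropping any query string or fragment) and decodes it safely.
function getFilenameFromUrlSafe(url) {
  const lastSegment = url.split(/[?#]/)[0].split("/").pop();
  return safeDecodeURIComponent(lastSegment);
}

console.log(getFilenameFromUrlSafe("https://example.com/uploads/my%20photo.jpg")); // my photo.jpg
console.log(getFilenameFromUrlSafe("https://example.com/uploads/50%off.png"));     // 50%off.png
```

Keeping the raw name means the saved file may contain a literal "%", which most filesystems accept.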

Not sure how your image URLs ended up that way, but stranger things have happened. 🙂

Thank you! Fixed in v2.3.7