Open nickhuang99 opened 5 months ago
After submitting, I realized this is a duplicate of the old issue https://github.com/openzim/zim-tools/issues/190. Can someone mark it as a duplicate, or should I close it? Maybe it can serve as another test case in the future?
escapeSlash.diff.zip I have a simple fix for this issue: escape every '/' in the entry path so that all articles and pictures are written to the same directory level, which avoids the directory/filename conflict. The downside is that a deeply nested path may produce a filename longer than 255 characters. However, in my tests with "https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim" the dump no longer raises a single exception.
Please find the patch attached. Could a developer take a look and see whether it can be applied? Thank you.
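To illustrate the slash-escaping idea described above, here is a minimal sketch. It is not taken from the attached patch: the helper names `escapeSlashes` and `fitsInFileName`, and the choice of "%2F" as the replacement, are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the attached patch): flatten a ZIM entry path
// into a single file name by replacing every '/' with "%2F", so that
// "C++" and "C++/CLI" become two sibling files instead of a file and a directory.
std::string escapeSlashes(const std::string& path)
{
    std::string out;
    out.reserve(path.size());
    for (char c : path) {
        if (c == '/') {
            out += "%2F";
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical check for the known downside: many filesystems limit a single
// file name to 255 bytes, and a flattened deep path can exceed that.
bool fitsInFileName(const std::string& name, std::size_t limit = 255)
{
    return name.size() <= limit;
}
```

With this sketch, `escapeSlashes("C++/CLI")` yields `C++%2FCLI`, which can sit next to a file named `C++` in the same directory. Note that the mapping is only reversible if a literal "%2F" never appears in an entry path, so a real patch may need a different escape sequence.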
Let's use the real Wikipedia page for "C++" as an example: "https://en.wikipedia.org/wiki/C++" is an HTML page, and it has sub-pages under the "C++" directory, for example https://en.wikipedia.org/wiki/C++/CLI. This situation cannot be represented in a dump of static HTML files, because "C++" cannot be an HTML file and a directory at the same time. How zimdump should generate a redirect here is not straightforward, especially when "C++" is URL-escaped as "C%2B%2B": the redirect logic then has to account for the URL-safe encoding when checking whether the filesystem directory "C++" actually exists. For a real test case, please download the Kiwix computer ZIM from: https://download.kiwix.org/zim/wikipedia/wikipedia_en_computer_maxi_2024-05.zim
When you run zimdump on it, you will see that the "C++" HTML page is missing, because "C++" is created as a directory to hold the "CLI" page on the filesystem.
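The conflict described above can be detected before anything is written to disk. The following is a rough sketch under the assumption that the entry paths are available as plain strings; `findFileDirConflicts` is a hypothetical helper and is not part of zimdump or libzim.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of zimdump): return every entry path that is
// also needed as a directory by some other entry, e.g. "C++" when both
// "C++" and "C++/CLI" exist.
std::set<std::string> findFileDirConflicts(const std::vector<std::string>& paths)
{
    const std::set<std::string> entries(paths.begin(), paths.end());
    std::set<std::string> conflicts;
    for (const std::string& p : paths) {
        // Walk every parent directory of p: "a/b/c" -> "a", then "a/b".
        for (std::size_t pos = p.find('/'); pos != std::string::npos;
             pos = p.find('/', pos + 1)) {
            const std::string parent = p.substr(0, pos);
            if (entries.count(parent) != 0) {
                conflicts.insert(parent);
            }
        }
    }
    return conflicts;
}
```

Running this over the entry list of a ZIM file would give the set of pages that cannot be dumped as plain files, which could also serve as a basis for the test case mentioned earlier in the thread.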