Open IMayBeABitShy opened 2 years ago
+1 for this. I actually made a clone of this website using httrack about a year ago and it was an ORDEAL! Would much rather this be in a zim file for my kiwix server. On a side-note the image of 173 is going to get a redesign in the near future to avoid this issue.
@IMayBeABitShy Have you tried using the limited version of zimit already? Did it work?
I had a cursory look but cannot see whether this is mediawiki-based or not.
Following @Popolechien's suggestion, I've used youzim.it to create a limited ZIM of the site. It seems like the website works (obviously some stuff like the search doesn't, but ZIM files have their own search functionality anyway). I did, however, notice that a lot of junk JavaScript has been included (e.g. cookie confirmation, ...).
I suggest also excluding the following sites:
This list is probably incomplete, but these should be the most important ones on the main page.
I had a cursory look but cannot see whether this is mediawiki-based or not.
I don't think it is. There is a wikidot -> mediawiki conversion tool, which also indicates that it's not a media wiki. Still, I only have superficial knowledge of wiki software, so I may be wrong.
@IMayBeABitShy Awesome, I've started a recipe. Let us see what happens.
I think this one failed. I checked the log a couple of times, and zimit seemed to spend a lot of time parsing some background pages (workbench pages, I think they were called). The last time I checked, the job had finally been interrupted.
Looks like the favicon URL has changed. New URL: https://scp-wiki.wikidot.com/local--favicon/favicon.gif Also, the recipe log is flooded with these errors. I unfortunately am not familiar enough with zimit to know what this means.
[2023-07-02 17:23:23,192: WARNING] failed to load progress details: Expecting value: line 1 column 1 (char 0)
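A note on the warning above: the text "Expecting value: line 1 column 1 (char 0)" is exactly what Python's json module reports when it is handed empty or non-JSON input. So one plausible reading (an assumption, not something confirmed from the zimit code) is that the progress file was empty or not yet written when it was read. A minimal sketch that reproduces the message; `load_progress` and the `done`/`total` field names are hypothetical:

```python
import json


def load_progress(raw: str):
    """Return parsed progress details, or None if the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Mirrors the wording of the warning seen in the recipe log.
        print(f"failed to load progress details: {exc}")
        return None


# An empty string stands in for an empty or not-yet-written progress file.
empty_result = load_progress("")
# Hypothetical field names, only to show the non-failing path.
valid_result = load_progress('{"done": 3, "total": 10}')
```

If that reading is right, the warning by itself would be noise rather than the cause of a failed crawl.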
We can also drop the copyright concern about the SCP-173 image, as it has been removed from the site to comply with the CC BY-SA license.
Another update to this request: the attempt on December 29, 2023 was successful! The resulting ZIM was usable; however, it looks like the depth needs to be increased by at least one.
https://farm.openzim.org/pipeline/6cc5755f-e0de-4a4c-a22f-fa9e43a0603f
Articles listed on the homepage are indexed, but the majority of articles are linked from the series pages, which are just too deep for the current depth setting.
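To illustrate why one extra level of depth matters here, a rough sketch of a depth-limited breadth-first crawl over a toy link graph. The page names are hypothetical and this is not how zimit is implemented internally, only the general idea of a depth limit:

```python
from collections import deque

# Toy link graph (hypothetical page names, not the real site structure).
LINKS = {
    "home": ["featured-article", "series-1"],
    "series-1": ["article-a", "article-b"],
    "featured-article": [],
    "article-a": [],
    "article-b": [],
}


def crawl(start: str, max_depth: int) -> set[str]:
    """Return every page reachable from start within max_depth link hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


shallow = crawl("home", 1)  # reaches the series page but not its articles
deeper = crawl("home", 2)   # one more hop picks up the articles as well
```

With a limit of 1 the series page itself is captured but nothing it links to, which matches what was observed in the ZIM.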
I noticed something very strange: none of the offset pages are being crawled correctly. Also, since the site uses Crom search, I think *.crom.avn.sh should be added to the exclusion list as well.
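For reference, `*.crom.avn.sh` is written as a glob. If the exclusion option takes a regular expression (as the `--exclude` example in the original post suggests, but this is an assumption), the pattern would need to be spelled differently. A minimal sketch; the pattern and the test URLs are illustrative, not taken from the recipe:

```python
import re

# Assumed pattern: matches URLs whose host is crom.avn.sh or any subdomain of it.
CROM_EXCLUDE = re.compile(r"^https?://([^/]+\.)?crom\.avn\.sh(/|$)")


def is_excluded(url: str) -> bool:
    """True if the URL would be dropped by the assumed exclusion pattern."""
    return CROM_EXCLUDE.search(url) is not None


# Hypothetical URLs, only to exercise the pattern.
checks = {
    "https://crom.avn.sh/": is_excluded("https://crom.avn.sh/"),
    "https://sub.crom.avn.sh/graphql": is_excluded("https://sub.crom.avn.sh/graphql"),
    "https://notcrom.avn.sh/": is_excluded("https://notcrom.avn.sh/"),
    "https://scp-wiki.wikidot.com/scp-2998": is_excluded("https://scp-wiki.wikidot.com/scp-2998"),
}
```

The escaped dots keep the pattern from matching unrelated hosts that merely contain similar text.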
@Popolechien can you reopen this issue or update the recipe for this?
Just so everyone is on the same page the latest version available is at https://dev.library.kiwix.org/viewer#scp-wiki_en_all
As far as poking at the zimit recipe goes I'll defer to @benoit74
@lbrunkho @MCSeekeri I'm sorry but I don't get what your issues are.
Can you please provide a link to a page with a non-working link (and details about this non-working link, e.g. position on the screen, text, screenshot, ...) so that I can understand what you are talking about?
SCP-2998: the "Next iteration" link at the bottom jumps to /offset/1. Zimit is not crawling it correctly; it seems to be because the page returns 503.
{"timestamp":"2024-09-25T13:49:45.047Z","logLevel":"error","context":"general","message":"Page Crashed on Load","details":{"status":503,"page":"https://scp-wiki.wikidot.com/scp-2998/offset/1","workerid":0}}
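In case it helps with triage, a small sketch for pulling entries like the one above out of a crawler log and grouping them by message and status. It assumes the log is one JSON object per line with the same keys as the entry quoted here; `summarize_errors` is a hypothetical helper, not part of zimit:

```python
import json
from collections import Counter

LOG_LINES = [
    # The entry quoted in this thread.
    '{"timestamp":"2024-09-25T13:49:45.047Z","logLevel":"error","context":"general",'
    '"message":"Page Crashed on Load","details":{"status":503,'
    '"page":"https://scp-wiki.wikidot.com/scp-2998/offset/1","workerid":0}}',
    # A non-JSON line, to show that such lines are skipped.
    "not a json line",
]


def summarize_errors(lines):
    """Count error entries by (message, status) and collect the affected pages."""
    counts = Counter()
    pages = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict) or entry.get("logLevel") != "error":
            continue
        details = entry.get("details") or {}
        counts[(entry.get("message"), details.get("status"))] += 1
        pages.append(details.get("page"))
    return counts, pages


counts, pages = summarize_errors(LOG_LINES)
```

A list of affected pages makes it easier to check whether only the /offset/ URLs are hit or whether the problem is broader.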
There are also some issues that don't exist in the current ZIM file; I found them while crawling SCP-CN.
SVG and MathJax: the crawled version doesn't render SVGs correctly and doesn't display math formulas correctly, which is probably due to Wikidot's weird front-end implementation, so both of these issues can be left alone for the time being.
If the page returns a 503, unfortunately there is nothing we can do ... But here the message says "Page Crashed on Load", so I suspect there is another issue. I will have a look when time is available to work on this ZIM request.
The strange thing is that the page doesn't actually return 503; the content is normal. I'm not sure why there is this output ...
With zimit, it may be necessary to specify something like --exclude "SCP-173\\.jpg".
As the majority of this website (the exception being the 'random article' buttons, login functionality and search) does not seem to need any backend whatsoever, downloading it via zimit seems like a viable option. Nearly everything on the website is CC-SA, the only exception (?) being the image of SCP-173, but excluding that one should be easy when using zimit. I am not even sure if it even needs to be excluded.
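A quick way to sanity-check an exclusion pattern like the one above before putting it in a recipe. This sketch assumes the option is treated as a regular expression searched against each URL, and that the doubled backslash in the quoted option is only shell or config escaping, so the effective pattern is `SCP-173\.jpg`. The URLs are made up for illustration:

```python
import re

# Effective pattern after unescaping (assumption, see the note above).
EXCLUDE = re.compile(r"SCP-173\.jpg")

# Hypothetical URLs, not the real location of the image.
urls = [
    "https://example.org/local--files/scp-173/SCP-173.jpg",
    "https://example.org/scp-173",
    "https://example.org/files/SCP-173xjpg",
]

# Keep every URL that the pattern does not match anywhere.
kept = [u for u in urls if not EXCLUDE.search(u)]
```

Only the image URL is dropped here: the article page stays because the pattern is case-sensitive and requires the `.jpg` suffix, and the escaped dot prevents a match on the third URL.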