Open davidar opened 8 years ago
I think someone might have written such a tool, crawling a website and adding it to IPFS, but I am not sure anymore. Would be great if someone could comment. It might also just be a concatenation of wget + ipfs add.
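For reference, a minimal sketch of that wget + ipfs add concatenation, assuming a stock go-ipfs install (the URL is a placeholder and the wget flags may need tuning per site):

```sh
# Mirror the site into a local directory named after the host
# (wget honors robots.txt by default in recursive mode).
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent https://example.com

# Add the resulting directory tree to IPFS.
ipfs add -r example.com
```

`ipfs add -r` prints a hash per file and finishes with the root directory hash, which is what you would share or pin.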
There are already several more or less mature tools that let users download entire websites and store them locally on disk. Perhaps the initial focus should be on figuring out how to insert these local website mirrors into the IPFS network.
Once that part of the problem is solved and stable, additional development effort can go towards a component that downloads existing website structures and uploads them into IPFS directly, without needing a local cache or copy.
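Assuming a mirror already exists on disk (the directory name below is a placeholder), inserting it into the network needs nothing beyond stock commands; a sketch:

```sh
# Add an existing local mirror to IPFS recursively.
# -Q (--quieter) prints only the final root directory hash.
ROOT=$(ipfs add -r -Q ./my-site-mirror)

# Content added this way is pinned by default, so the local node
# keeps providing it. It is now reachable through any gateway:
echo "https://ipfs.io/ipfs/$ROOT"
```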
TL;DR: People should be able to simply run a single command, without having to worry about copyright violations, etc.
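The exact command is not specified here; purely as an illustration, such a wrapper might be invoked like this (`ipfs-archive` is a hypothetical name, not an existing tool):

```sh
# Hypothetical one-shot archiver: crawl the site, add it to IPFS,
# and print the resulting root hash.
ipfs-archive https://some-open-access-collection.example
```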
There are several open-access collections that could be archived by simply spidering their websites, in the same way that Google Cache or IA's Wayback Machine does. Of course, this should only be done for the portions of a website not disallowed by `robots.txt`.

IANAL, but from what I can tell, this is all kosher so long as there's an appropriate procedure for opting out. According to this article (which links to this document), Google is safe because it allows webmasters to opt out via `robots.txt`, and it also has a process for responding to DMCA takedown requests. This is the policy that the Internet Archive follows as well.
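Checking a site's opt-out signals before spidering is straightforward with existing tooling; a sketch (placeholder URL):

```sh
# Fetch a site's robots.txt and list the disallowed paths.
# Note that recursive wget already honors robots.txt by default,
# so a polite crawler needs no extra flags.
curl -s https://example.com/robots.txt | grep -i "^Disallow"
```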
Anyway, it would be really helpful if IPFS had an official procedure regarding this (presumably `gateway-dmca-denylist` would be a part of this).