Open wyrmmm opened 10 years ago
I got a .wayback file for each year in the csv file. (Somehow some .wayback files are missing: the number of .wayback files does not match the number of lines in the csv file.)
What is this .wayback file? How do we see what it contains?
Dear all: The module "waybackmachine.py" contains several data classes, such as PageNode and PageData. A ".wayback" file is a pickled object (i.e., serialized binary data). Find the PageData class and look at the function "constructNodeExportPickle()". To examine a .wayback file, you can write something like the following:
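    import pickle
    # PageNode must be importable so pickle can rebuild the object
    from waybackmachine import PageNode

    file_name = "any.wayback"
    # Pickle data is binary, so open the file in "rb" mode
    with open(file_name, "rb") as f:
        anyWaybackPageNode = pickle.load(f)
    print(anyWaybackPageNode.url)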
Do not rush. Take your time examining the source code. I expect you guys to fix several issues and errors in the half-baked library. Also, the console application (haha) is merely an example for testing waybackmachine.py. You may learn about the class relationships from it.
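If you do not yet know which attributes a pickled object carries, listing them with vars() may help before you dig into the class definitions. The sketch below is only an illustration under stated assumptions: it uses types.SimpleNamespace as a stand-in for PageNode, the helper names describe_pickle and demo are hypothetical, and the "year" attribute is invented for the example (only "url" appears in this thread). With a real .wayback file, the waybackmachine classes would still need to be imported before loading.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def describe_pickle(path):
    """Load a pickled object and return its class name and instance attributes."""
    # Pickle data is binary, so the file is opened in "rb" mode.
    with open(path, "rb") as f:
        obj = pickle.load(f)
    return type(obj).__name__, dict(vars(obj))


def demo():
    """Round-trip a stand-in object through a temporary .wayback file."""
    # Stand-in for a PageNode; "year" is an assumed attribute, not from the library.
    node = SimpleNamespace(url="http://example.com", year=2005)
    fd, path = tempfile.mkstemp(suffix=".wayback")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            pickle.dump(node, f)
        return describe_pickle(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Calling describe_pickle("any.wayback") on a real file should then show which fields the stored object has, assuming it is a plain instance whose attributes live in its __dict__.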
Hi Prof,
I'm not sure what option 2: collect main pages yearly does. I ran it, and I gave it an input file path, but what's the output it's supposed to return?