Initialize it to 1 divided by the number of nodes currently in the graph.
For the timestamp format we could do MMdd_HH:mm:ss or something similar. I think as long as we all know what it is, it doesn't really matter that much.
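For reference, here is a minimal sketch of how the MMdd_HH:mm:ss idea could look in Python. The names `TIMESTAMP_FORMAT`, `format_timestamp`, and `parse_timestamp` are made up for illustration, not anything that exists in our code. One thing to keep in mind: the format carries no year, so whoever parses it has to supply one.

```python
from datetime import datetime

# strftime equivalent of MMdd_HH:mm:ss (hypothetical constant name)
TIMESTAMP_FORMAT = "%m%d_%H:%M:%S"


def format_timestamp(moment: datetime) -> str:
    """Render a datetime in the agreed MMdd_HH:mm:ss layout."""
    return moment.strftime(TIMESTAMP_FORMAT)


def parse_timestamp(text: str, year: int) -> datetime:
    """Parse an MMdd_HH:mm:ss string; the year must come from the caller."""
    # Prepending the year avoids strptime's default year of 1900,
    # which would reject Feb 29 timestamps.
    return datetime.strptime(f"{year}{text}", "%Y" + TIMESTAMP_FORMAT)
```

If timestamps ever need to be compared across a year boundary, adding the year to the format itself would probably be simpler than passing it around.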
I'm currently working on the part where we send the crawling team the links to crawl next, and I think that is probably the best place for us to check it. My plan is to first check whether the link is disallowed. If it is not, I will call a function on the graph to get the information for that node, and then check whether enough time has passed since its timestamp.
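A rough sketch of that check order, just to make the flow concrete. Everything here is a placeholder: `NodeInfo`, `should_crawl`, `is_disallowed`, `get_node_info`, and `min_interval` are hypothetical names, and the real graph function may return something different.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class NodeInfo:
    """Hypothetical shape of what the graph returns for a node."""
    url: str
    last_crawled: Optional[datetime]  # None if the node was never crawled


def should_crawl(
    url: str,
    is_disallowed: Callable[[str], bool],
    get_node_info: Callable[[str], Optional[NodeInfo]],
    now: datetime,
    min_interval: timedelta,
) -> bool:
    # Step 1: skip disallowed links before touching the graph at all.
    if is_disallowed(url):
        return False
    # Step 2: ask the graph for the node's stored information.
    info = get_node_info(url)
    if info is None or info.last_crawled is None:
        return True  # assumption: unknown or never-crawled links are eligible
    # Step 3: only recrawl once enough time has passed since the timestamp.
    return now - info.last_crawled >= min_interval
```

Doing the disallowed check first means we never need a graph lookup for links we would reject anyway.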
For the first question, a slight change to what @cam626 said above: since the average of all the ranks should be 1 if every URL has at least one outgoing link, we should simply initialize the rank to 1.
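A small sketch of why the average stays at 1, assuming we use the non-normalized update PR(p) = (1 - d) + d * sum(PR(q) / out(q)) over the pages q linking to p. That formula, the damping value 0.85, and the three-node graph are assumptions for illustration only. If we use the normalized form with (1 - d) / N instead, the ranks sum to 1 and the 1 / N initialization is the matching choice.

```python
def pagerank_step(ranks, out_links, damping=0.85):
    """One iteration of the non-normalized PageRank update (assumed form)."""
    # Every node starts the round with the (1 - d) base term.
    new_ranks = {node: 1.0 - damping for node in ranks}
    for node, targets in out_links.items():
        # Each node splits its damped rank evenly across its outgoing links.
        share = damping * ranks[node] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks


# Toy graph in which every node has at least one outgoing link.
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# Initialize every rank to 1, as suggested above.
ranks = {node: 1.0 for node in out_links}
for _ in range(20):
    ranks = pagerank_step(ranks, out_links)

average_rank = sum(ranks.values()) / len(ranks)
```

Summing the update over all nodes gives N(1 - d) + d * (sum of the old ranks), so a total of N is preserved from one iteration to the next as long as no node is a dead end.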