shensjw / LSPT-LinkAnalysis

Link Analysis part for search engine project in LSPT course

Initial values for node class variables #1

Closed shensjw closed 5 years ago

shensjw commented 6 years ago
  1. Since each node has a rank parameter, what should we initialize it to?
  2. In which format do we want to store the timestamp parameter?
  3. How are we actually gonna check the timestamp?
cam626 commented 6 years ago
  1. Initialize it to 1 divided by the number of nodes currently in the graph.
  2. For the timestamp format we could do MMdd_HH:mm:ss or something similar. I think as long as we all know what it is, it doesn't really matter that much.
  3. Right now I'm working on the part where we tell the crawling team which links to crawl next, and I think that is probably the best place for us to check it. My thought is that I will first check whether the link is disallowed; if it is not, I will call a function of the graph to get the information for that node, and then check whether enough time has passed since the timestamp.
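A rough sketch of the three points above, in Python. The function names, the `min_interval` parameter, and the explicit `year` argument are all made up for illustration; the proposed MMdd_HH:mm:ss format carries no year, so this sketch assumes the caller supplies one.

```python
from datetime import datetime, timedelta

# Assumed strftime equivalent of the proposed MMdd_HH:mm:ss pattern.
TIMESTAMP_FORMAT = "%m%d_%H:%M:%S"


def initial_rank(node_count):
    """Uniform starting rank: 1 divided by the number of nodes in the graph."""
    if node_count <= 0:
        raise ValueError("graph must contain at least one node")
    return 1.0 / node_count


def parse_timestamp(stamp, year):
    """Parse an MMdd_HH:mm:ss stamp; the year is supplied by the caller (assumption)."""
    return datetime.strptime(f"{year}_{stamp}", "%Y_" + TIMESTAMP_FORMAT)


def should_recrawl(disallowed, last_crawled, now, min_interval):
    """Disallowed links are never crawled; others only once min_interval has passed."""
    if disallowed:
        return False
    return now - last_crawled >= min_interval


# Hypothetical usage with made-up timestamps and a one-day interval.
last = parse_timestamp("0412_08:30:00", 2018)
now = parse_timestamp("0413_09:00:00", 2018)
print(initial_rank(4))
print(should_recrawl(False, last, now, timedelta(days=1)))
```

One thing this makes visible: without a year in the stored format, a check that spans New Year's would need extra handling.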
RioMichael commented 6 years ago

For the first question, a slight change to what @cam626 said above: since the average of all the ranks should be 1 if every URL has an outgoing link, we should simply initialize the rank to 1.
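To illustrate the claim about the average, here is a small Python sketch. It assumes the PageRank variant where each node gets (1 - d) plus d times the shares from its incoming links; the thread does not say which variant the project uses, and the damping value 0.85, the function name, and the toy graph are all assumptions.

```python
def pagerank_step(out_links, ranks, damping=0.85):
    """One update of rank = (1 - d) + d * sum(rank(q) / out_degree(q)) over in-links.

    Assumes every node has at least one outgoing link and every
    link target is itself a key of out_links.
    """
    new_ranks = {node: 1.0 - damping for node in out_links}
    for node, targets in out_links.items():
        share = ranks[node] / len(targets)
        for target in targets:
            new_ranks[target] += damping * share
    return new_ranks


# Toy graph in which every node has an outgoing link.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {node: 1.0 for node in graph}  # initialize every rank to 1

for _ in range(20):
    ranks = pagerank_step(graph, ranks)

average = sum(ranks.values()) / len(ranks)
print(average)
```

Under these assumptions the total rank stays equal to the number of nodes after each step, so the average stays at 1 (up to floating-point rounding). Nodes without outgoing links would leak rank and break this.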