kwhitehall / scored


Deliverable 2 Initial Code for Review #13

Closed shashank1994 closed 8 years ago

shashank1994 commented 8 years ago

Updated with JSON

kwhitehall commented 8 years ago

Hey @shashank1994, @mich326, and @kushaank this is a great first attempt! Great news is (drum roll) it more or less works :+1: So, let's clean it up now.

  1. Please read up on classes (http://www.tutorialspoint.com/python/python_classes_objects.htm). We want to make this a class with all these methods, as they are associated with the same journal.
  2. Also, add a docstring for each method.
  3. +1 on the try blocks around populating the JSON. It would be prudent to add try blocks to some of the other methods too, so they exit gracefully if, for example, a path isn't found.
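
A minimal sketch of how the three points above could fit together. The class name `JournalScraper`, its methods, and the `data_dir` argument are hypothetical and only illustrate the shape; they are not taken from the submitted code.

```python
import json
import os


class JournalScraper:
    """Group the helpers that operate on one journal (hypothetical name)."""

    def __init__(self, name, data_dir):
        """Store the journal name and the directory holding its files."""
        self.name = name
        self.data_dir = data_dir

    def read_file(self, filename):
        """Return the text of a file in data_dir, or None if it cannot be read."""
        path = os.path.join(self.data_dir, filename)
        try:
            with open(path, encoding="utf-8") as handle:
                return handle.read()
        except OSError as err:
            # Covers a missing path, a directory, or a permissions problem.
            print(f"Could not read {path}: {err}")
            return None

    def to_json(self, record):
        """Serialize one record (a dict) to a JSON string, or None on failure."""
        try:
            return json.dumps({"journal": self.name, **record})
        except (TypeError, ValueError) as err:
            # json.dumps raises TypeError for values it cannot serialize.
            print(f"Could not serialize record: {err}")
            return None
```

Returning `None` instead of letting the exception propagate is one design choice among several; raising a custom error would work as well if the caller needs to tell failures apart.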

kwhitehall commented 8 years ago

  1. A method to create a seedlist (seed.txt) that is updated during the crawl. The seedlist should contain one URL per line.
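
One way such a seedlist method might look, as a rough sketch. The function name `update_seedlist` and the choice to skip duplicate URLs are assumptions; the only requirements stated here are the `seed.txt` file and one URL per line.

```python
import os


def update_seedlist(urls, seed_path="seed.txt"):
    """Append new URLs to the seedlist, one per line, skipping duplicates.

    Returns the number of URLs actually added.
    """
    seen = set()
    if os.path.exists(seed_path):
        # Load URLs written by earlier calls so they are not repeated.
        with open(seed_path, encoding="utf-8") as handle:
            seen = {line.strip() for line in handle if line.strip()}
    added = 0
    with open(seed_path, "a", encoding="utf-8") as handle:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                handle.write(url + "\n")
                seen.add(url)
                added += 1
    return added
```

Calling it each time the crawl finds new links keeps the file growing during the crawl. The sketch assumes the existing file ends with a newline, which holds when every line was written by this function.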

lewismc commented 8 years ago

REST API? https://github.com/chrismattmann/nutch-python

kwhitehall commented 8 years ago

@lewismc can you please expound?

lewismc commented 8 years ago

The Python API enables injection of URLs via the REST API. If you wish to add URLs, then all of this can be done RESTfully. I can come and get more context from you later today @kwhitehall

kwhitehall commented 8 years ago

Yes @lewismc, you're jumping the gun on the next deliverable. Re this one, getting the seedlist is sufficient. And yeap... we should chat.

chrismattmann commented 8 years ago

Yes, use the REST API via nutch-python, ftw