Reorganize and refactor

tamouse / scrapers

Web site scrapers using Mechanize and other goodies.

MIT License


Several times while working in this repo I've thought "wouldn't it be neat if ..." and then been a little stymied, because it would have meant restructuring things. I've also learned quite a bit since I started this collection, and many parts could use a face lift.

Structure Revision

The directory structure is haphazard, and the way command line bits are implemented is inconsistent.

Concept

bin/: The executables will contain very, very minimal code, basically instantiating a command line processor and passing the ARGV array to it. Example:

#!/usr/bin/env ruby
require 'scrapers/rubytapas'
Scrapers::RubyTapas::CLI.new(ARGV).start

lib/: Pretty much the entirety of the code will reside under lib/, which makes things rather easy to test.
lib/scrapers/: Contains the various scrapers and their CLI components, sorted by scraper.

Example:

lib/
  scrapers/
    rubytapas/
      scraper.rb
      cli.rb
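The bin/ stub shown earlier assumes a CLI class whose constructor takes the argument array and whose start method runs the command. Here is a minimal sketch of what lib/scrapers/rubytapas/cli.rb might look like under that assumption. It uses the standard library's OptionParser as a stand-in for the eventual implementation, and the option names (--destination, --dry-run) are hypothetical, not taken from the existing code.

```ruby
require 'optparse'

module Scrapers
  module RubyTapas
    # Hypothetical command line processor; the options are illustrative only.
    class CLI
      attr_reader :options, :args

      def initialize(argv)
        @options = { destination: '.', dry_run: false }
        # OptionParser#parse returns the non-option arguments and
        # leaves the array it was given untouched.
        @args = parser.parse(argv)
      end

      # Entry point called from the bin/ stub.
      def start
        # A real implementation would hand off to the scraper here.
        message = "episodes=#{args.inspect} destination=#{options[:destination]}"
        message += ' (dry run)' if options[:dry_run]
        puts message
        message
      end

      private

      def parser
        OptionParser.new do |opts|
          opts.banner = 'Usage: rubytapas [options] EPISODE...'
          opts.on('-d', '--destination DIR', 'Download directory') do |dir|
            @options[:destination] = dir
          end
          opts.on('-n', '--dry-run', 'Show what would be done') do
            @options[:dry_run] = true
          end
        end
      end
    end
  end
end
```

Keeping all parsing inside the class means a test can build a CLI from a plain array, with no subprocess and no real ARGV involved.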

Non-scraper-specific things will live directly in lib/ for easy name-spacing. For example, the .netrc reader:

lib/
  netrc_reader.rb

require 'netrc'
class NetrcReader
  # ...
end
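As a rough illustration of how such a shared class could stay easy to test, here is one possible shape for NetrcReader. This is a sketch, not the existing implementation: the constructor signature, the source: keyword, and the credentials? helper are all assumptions. The only netrc gem call relied on is Netrc.read, and the gem is loaded lazily so tests can inject a plain Hash instead.

```ruby
# Hypothetical sketch of a shared .netrc reader.
class NetrcReader
  attr_reader :login, :password

  # host:   machine name to look up
  # source: anything that maps host => [login, password];
  #         defaults to the user's .netrc via the netrc gem
  def initialize(host, source: nil)
    source ||= default_source
    # Destructuring works for a two-element array and for the
    # gem's entry object; a missing host yields nil for both.
    @login, @password = source[host]
  end

  # True only when both a login and a password were found.
  def credentials?
    !(login.nil? || password.nil?)
  end

  private

  def default_source
    require 'netrc' # loaded lazily so tests need neither the gem nor a real file
    Netrc.read
  end
end
```

In a test, `NetrcReader.new('example.com', source: { 'example.com' => ['alice', 's3cret'] })` exercises the lookup without touching the real ~/.netrc.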

lib/
  scrapers/
    rubytapas/
      cli.rb        -- class implementing the thor script bits
      dpdcart.rb    -- class implementing the dpdcart gateway object
      scraper.rb    -- class implementing the feed scraper
      episode.rb    -- class implementing an episode structure
      file_list.rb  -- class implementing the download file list
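To make the split between the smaller classes a bit more concrete, here is a sketch of how episode.rb and file_list.rb might look as plain value objects. Every field and method name here (number, title, files, the de-duplicating <<) is an assumption made for illustration; the real classes may carry different data.

```ruby
module Scrapers
  module RubyTapas
    # Hypothetical download list: an ordered, de-duplicated set of file URLs.
    class FileList
      include Enumerable

      def initialize(urls = [])
        @urls = urls.uniq
      end

      # Append a URL unless it is already present; returns self for chaining.
      def <<(url)
        @urls << url unless @urls.include?(url)
        self
      end

      # Enumerable builds map, select, count, to_a, etc. on top of this.
      def each(&block)
        @urls.each(&block)
      end
    end

    # Hypothetical episode structure; the fields are illustrative only.
    Episode = Struct.new(:number, :title, :files)
  end
end
```

With this shape, `Scrapers::RubyTapas::Episode.new(1, 'Intro', Scrapers::RubyTapas::FileList.new)` gives the scraper a simple object to fill in, and neither class needs network access to be tested.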


Reorganize and refactor #4
