scrapinghub / exporters

Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
BSD 3-Clause "New" or "Revised" License
FSReader: add support for files and lists of files #267

Closed. immerrr closed this 8 years ago.

immerrr commented 8 years ago

Right now FSReader only supports the dir+pattern way of looking up inputs. Sometimes, to parallelise things, we want to fetch the exact list of files matching that lookup and distribute them evenly across subprocesses, each running the actual exporters code.
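One way to do the even distribution is to deal the files out round-robin. This is only a minimal sketch, assuming the file list has already been resolved; `split_evenly` is a hypothetical helper name, not something that exists in exporters.

```python
def split_evenly(files, n_workers):
    """Deal files round-robin into n_workers buckets.

    Hypothetical helper for illustration only. Bucket sizes differ
    by at most one, and relative order within a bucket is preserved.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Slice with a stride: bucket i gets items i, i + n, i + 2n, ...
    return [files[i::n_workers] for i in range(n_workers)]


if __name__ == "__main__":
    buckets = split_evenly(["a", "b", "c", "d", "e"], 2)
    print(buckets)  # [['a', 'c', 'e'], ['b', 'd']]
```

Each bucket could then be passed as the list-of-files form of `input` to one subprocess.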

I was thinking about the following ways to accept input (I'm writing in YAML, but please imagine the same in JSON):

input: /path/to/file   # specify one file

input:  # specify a directory, like before
  dir: /path/to/dir  
  pattern: foo.*bar

input:  # specify a list of files
  - /path/to/file1
  - /path/to/file2

input:
  - dir: /path/to/dir1
    pattern: foo.*bar
  - dir: /path/to/dir2
    pattern: qux.*quux
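To make the four forms concrete, here is a rough sketch of how a reader might normalise them into a flat list of files. Everything here is an assumption for illustration: `resolve_input` is a hypothetical name, the pattern is applied with `re.search` against file names only, directories are not walked recursively, and mixed lists (paths and dir entries together) are accepted even though the examples above don't show them.

```python
import os
import re


def resolve_input(spec):
    """Expand an `input` option into a flat list of file paths.

    Hypothetical helper, not the actual FSReader code.
    """
    if isinstance(spec, str):
        # A bare string names exactly one file.
        return [spec]
    if isinstance(spec, dict):
        # A mapping is the existing dir + pattern lookup.
        directory = spec["dir"]
        # Assumption: a missing pattern matches every file name.
        pattern = re.compile(spec.get("pattern", ""))
        return sorted(
            os.path.join(directory, name)
            for name in os.listdir(directory)
            if pattern.search(name)
            and os.path.isfile(os.path.join(directory, name))
        )
    if isinstance(spec, list):
        # A list is resolved item by item, keeping the given order.
        files = []
        for item in spec:
            files.extend(resolve_input(item))
        return files
    raise TypeError("unsupported input spec: %r" % (spec,))
```

Note that a bare path is returned as-is without checking that the file exists; whether to validate it up front is a separate design choice.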

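So the idea is simple: a string names a single file, a mapping with dir and pattern selects matching files in a directory as before, and a list holds several of those entries.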

What do you think?

eliasdorneles commented 8 years ago

This sounds good to me. I have a minor concern about backwards compatibility, but since exporters is not yet open sourced (will be in a couple of weeks) and you've been the early users of FSReader, if you're happy with changing the API, we're happy too. :)

bbotella commented 8 years ago

Same here. Let's do it!

eliasdorneles commented 8 years ago

Closed by https://github.com/scrapinghub/exporters/pull/268