lightswitch05 / table-to-json

Serializes HTML tables into JSON objects.
http://lightswitch05.github.io/table-to-json/
MIT License
756 stars 172 forks source link

Ignore rows #19

Closed dayAlone closed 9 years ago

dayAlone commented 9 years ago

Options for ignore row (data-ignore) and ignore empty rows

lightswitch05 commented 9 years ago

Hello @dayAlone, thanks for the pull request! We've had some tickets before discussing ignoring rows, but no one has ever submitted any code.

Before I merge this, I would like to talk a little about its use-case and how this would be better than other available options.

  1. Why do you need to ignore empty rows? (Why even have empty rows in the table?)
  2. If you know the index of the row, why not just remove it from the resulting array?

     noEmptyRows = fullTable.splice(3, 1);
  3. Would it be better to supply an array of indexes to ignore? Like ignoreColumns works, except ignoreRows?
  4. Would it be better to add a HTML attribute to rows that should be ignored? data-ignore="true"?
station384 commented 9 years ago

@lightswitch05, I can see a use for this in scraping sites I have no control over.

My thoughts on your questions:

  1. Bad design of table, using a table for mixed layout and data. this is common on insurance and banking sites i've worked on.
  2. Scraping I have no clue what the data is or what would be empty (workaround: analyze the table dom and find out which row is empty)
  3. Supplying the array of indexes to ignore falls on the premise that you know what the data is. (workaround: requires analyzing the table before coversion)
  4. I've no control over the markup. (workaround: look for rows that are not visible and add prop data-ignore="true" to those rows)

all the workarounds require additional logic (and some at a fairly high cost due to interacting the dom, which will now need to be done 2x once for pre and once to conversion) at the onset of post to work.

Just an argument for the addition in one scenario.

lightswitch05 commented 9 years ago

@station384 great points, I'll merge parts of this (don't want the version change stuff) and implement in the rewrite code

lightswitch05 commented 9 years ago

This has ben released in version 0.10.0. Thank you @dayAlone for the pull request!