lightswitch05 / table-to-json

Serializes HTML tables into JSON objects.
http://lightswitch05.github.io/table-to-json/
MIT License
756 stars 172 forks source link

Row ID as property #15

Closed station384 closed 9 years ago

station384 commented 9 years ago

Tables a lot of the time will have an ID set on the row of the table that is a unique identifier (JqGrid). I have added the option to allow for extraction of the ID attribute, and add add it as a property to the JSON.

Example of usage for this in a scraping scenario Generate the initial JSON. Interate the result find the row with the ID of x. click row.

lightswitch05 commented 9 years ago

get the tests passing first and then I'll consider merging

lightswitch05 commented 9 years ago

Thank you for getting the tests to pass. If I decide to accept this feature, it will need a few tests as well. Why do you call the id __id__?

station384 commented 9 years ago

I was looking for something that was unique, something that would not likely be a column header, and I didn't want to write the checking code to see if a column name already existed. So id was used as a temporary holder as the likelihood of someone using that as an ID attribute value on a row would be low.

It should be changed, I just couldn't think of anything better. Suggestions? I can change it while I'm adding in the test cases.

Edit: Get rid of my sleep answer, replace with one when brain was turned on. Update: Changed Code to allow something other than id

lightswitch05 commented 9 years ago

This is looking a lot better. Thank you for adding a test. As for the name... I like it being configurable. I don't like __id__ being the default since it doesn't mean anything to anyone and will pretty much always require overriding. Also, I don't care much for having two new options to control one new feature.

How about having only one option, includeRowId. If set to true, then you include the row id like you already are, except using the name rowId, since that is exactly what it is. I imagine a lot of people would like having it named rowId. However, you are right that it could easily clash with existing column names, so its got to be easy to override.

I propose that the exact same option includeRowId can be used to override the default name. It could take either a boolean or a string. If its a boolean and is true, then it uses the default name. If its a string then it uses whatever that string is. It would of course default to false like you already have it.

A final note. It looks like you are setting the row id to be an empty string if there is no id on the row. I don't think that makes since, if it doesn't exist, it should be null.

If you agree with my requests I'll merge this. If not, lets keep discussing until we find the best solution. I think this can be a useful feature and I'm thankful for you putting the time and effort into making it.

Thanks!

station384 commented 9 years ago

I agree. I like the joined options and just checking object type for action.

update: made the changes. null cannot be implemented right now, as placing the value null, the value will be treated as a column to skip. there is currently no way of identifying the column action i.e. if it is a skip, or it should be a null value. It treats all all nulls as skip.

The result of this is since the column is not a skip, the header is there, but when it gets the row value it skips it so all values get shifted left.

lightswitch05 commented 9 years ago

This has been merged and released in 0.9.0