lightswitch05 / table-to-json

Serializes HTML tables into JSON objects.
http://lightswitch05.github.io/table-to-json/
MIT License

Prefilter callback on property name #16

Closed: station384 closed this issue 6 years ago.

station384 commented 9 years ago

One issue I have come across is when a table header has an unusual name, for example 'Col 1 / ID'.

This is valid, and it would render as {'Col 1 / ID': 'someValue'}.

But consuming it is where it becomes an issue. In JavaScript it can be read with bracket notation, as in var col1 = parsedJson['Col 1 / ID'];, but you can't access it as var col1 = parsedJson.Col 1 / ID;

This issue is even more pronounced in other languages, such as C#, Java, and Tcl.

The consumer could filter the JSON text and clean up the property names, but that would require implementing a lot of logic, or a regex that is beyond me.

What I propose is adding a new option. If the option is null or undefined, the current behavior continues. But if it is a function, we call it with the string that is about to be used as the property name, and use the function's return value as the property name instead.

Something like this:

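    // '#my-table' is a placeholder selector; callbackFunction is the proposed option.
    // Regexes with the g flag replace every match, not just the first one.
    $('#my-table').tableToJSON({
        callbackFunction: function (data) { return data.replace(/\//g, '').replace(/\s+/g, '_'); }
    });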

This gives the end user the option to filter, process, or map the names however they want.
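The replace chain above only strips the characters it is told about. As a rough sketch of a more general filter (the name `toIdentifier` is hypothetical, and it treats only ASCII letters, digits, `_` and `$` as safe), any header text could be turned into a name usable with dot notation like this:

```javascript
// Hypothetical header-name filter for the proposed callbackFunction option.
// ASCII-only: any other character, including non-Latin letters, is replaced.
function toIdentifier(headerText) {
  var name = String(headerText)
    .trim()
    // Replace each run of characters that are not letters, digits, _ or $ with "_"
    .replace(/[^A-Za-z0-9_$]+/g, '_')
    // Drop underscores left at either end by leading/trailing punctuation
    .replace(/^_+|_+$/g, '');
  // Identifiers may not be empty or start with a digit
  if (name === '' || /^[0-9]/.test(name)) {
    name = '_' + name;
  }
  return name;
}
```

Under the proposal it would be passed as callbackFunction: toIdentifier, turning 'Col 1 / ID' into 'Col_1_ID'. Note that two different headers (say 'A B' and 'A/B') can collapse to the same name, so a real implementation might also need to de-duplicate.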

I can implement it.

lightswitch05 commented 9 years ago

This goes along the same lines as the proposed solution for #14. But how do we allow users to define these textExtraction / formatting functions? Accepting a single callback would not be very useful, because different columns contain different data. So we'd have to allow a callback per column. And what if someone only wanted a callback for one specific cell that is different from all the others?

If you read #14, @Mottie suggests doing something like this where the callback is defined using a selector. I really like this because it allows the user to be as flexible or rigid with the callbacks as they like. Before looping through the table and finding values, we could store the user's callback function in all the cells via the data method:

    // Cache each user callback on every cell matched by its selector (jQuery data store)
    for (var i = 0; i < callbacks.length; ++i) {
        table.find(callbacks[i].selector).data('tableToJsonCallback', callbacks[i].callbackFunction);
    }

When getting the value of each cell, we could check whether it has tableToJsonCallback data attached to it. If it does, we could pass the entire cell to the callback (maybe even the header name and cell index) and use whatever it returns as the value:

    var cellValues = function(cellIndex, cell) {
        var $cell = $(cell);
        var value;
        var override = $cell.data('override');
        var callback = $cell.data('tableToJsonCallback');
        if ( typeof callback === 'function' ) {
            // Check the callback first so allowHTML cannot silently bypass it
            value = callback(cell, cellIndex);
        } else if ( opts.allowHTML ) {
            value = $.trim($cell.html());
        } else {
            value = $.trim($cell.text());
        }
        // An explicit data-override value still takes precedence
        return notNull(override) ? override : value;
    };

If you are willing to add this feature and tests for it, it would be very powerful.

station384 commented 9 years ago

I'll take a closer look at what tablesorter is doing, but on initial inspection it appears to assume that the developer can add the data attribute to the table, i.e. that they're in full control of the data.

As I see it, there are two scenarios in which tableToJSON can be used.

Scenario 1: the developer controls everything.
Scenario 2: the developer does not control the table and can only passively read it.

I am approaching this from scenario 2, scraping data from a site I do not control, so I would have two options in this case:

1. Use a mapper: an object passed in as an option that defines the column mappings, column names, per-column data filter functions, and data types. The downside is that this is rigid. It assumes that neither the column names nor the column order will change (which will most likely fail in my scenario).

2. Use a generic filter callback on every cell that is processed. It's up to the developer to decide what to do with the data.
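To make the trade-off between the two options concrete, here is a DOM-free sketch that works on header strings and rows of cell text rather than on a live table. `applyMapper`, `applyGenericFilter`, and the mapper entry shape `{ name, filter }` are hypothetical and not part of tableToJSON:

```javascript
// Hypothetical sketch: headers is an array of header strings and rows is an
// array of arrays of cell text (one inner array per table row).

// Option 1: a rigid mapper keyed by the exact header text, e.g.
// { 'Col 1 / ID': { name: 'id', filter: Number } }.
function applyMapper(headers, rows, mapper) {
  return rows.map(function (cells) {
    var obj = {};
    headers.forEach(function (header, i) {
      if (!Object.prototype.hasOwnProperty.call(mapper, header)) {
        return; // header not in the mapper: this column is silently dropped
      }
      var spec = mapper[header];
      obj[spec.name] = spec.filter ? spec.filter(cells[i]) : cells[i];
    });
    return obj;
  });
}

// Option 2: one generic callback that sees every header and cell and returns
// the { name, value } pair to store.
function applyGenericFilter(headers, rows, filter) {
  return rows.map(function (cells) {
    var obj = {};
    headers.forEach(function (header, i) {
      var result = filter(header, cells[i], i);
      obj[result.name] = result.value;
    });
    return obj;
  });
}
```

If the site renames 'Col 1 / ID' to 'ID', the mapper quietly loses that column, while the generic filter never needs to know the headers in advance. That is the rigidity described in option 1.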

Under scenario 1, the data attribute is perfect: you can have a different callback for each cell. Under scenario 2, there can be only one, unless we're using a mapper.

It seems like there should be a way to merge these approaches so that both scenarios (table under the developer's control and not under the developer's control) can be handled.
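One possible way to merge them is a fallback chain: use a column-specific callback if one is registered, else a single generic callback, else the raw text (the current behavior). A minimal sketch, where `resolveValue`, `columnCallbacks`, and `genericCallback` are hypothetical names rather than tableToJSON options, and per-column callbacks are keyed by header text instead of being cached with jQuery's data method:

```javascript
// Hypothetical resolver combining both scenarios.
// options.columnCallbacks: { headerText: fn } for columns the developer knows about.
// options.genericCallback: fn applied to every other cell.
function resolveValue(options, header, text, columnIndex) {
  var perColumn = options.columnCallbacks || {};
  if (Object.prototype.hasOwnProperty.call(perColumn, header)) {
    return perColumn[header](text, header, columnIndex);
  }
  if (typeof options.genericCallback === 'function') {
    return options.genericCallback(text, header, columnIndex);
  }
  // No callbacks supplied: keep the plain cell text, as the plugin does today
  return text;
}
```

A developer who controls the table can register specific callbacks, while a scraper can supply only the generic one, and both paths go through the same lookup.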

lightswitch05 commented 9 years ago

@station384 If you are able to run jQuery within the page, then you should be able to use jQuery's data method. If you read about it here, they say all data values are stored internally by jQuery. So you are not actually modifying the page, just caching the callback in jQuery's data store.

station384 commented 9 years ago

I'll poke with it and see what I can come up with.

lightswitch05 commented 9 years ago

@station384 If you are still looking into this, I've created a new branch called 'rewrite' where I've rewritten all of TableToJSON. The only part I don't have working yet is colspan support.

If you want to do the callback functionality, this rewrite is the branch to do it on. If you don't want to do it, I probably will.

station384 commented 9 years ago

I'll switch to this branch.

I've been caught up with work and haven't had a chance to implement it yet.

(Sent by email in reply to Daniel's comment of Sep 29, 2014, 10:11 AM.)

station384 commented 9 years ago

I didn't forget about this. My job has me tied up with other items right now, so I can't put any time into it, but I will be circling back shortly.

lightswitch05 commented 9 years ago

@station384 No worries, I've been tied up too and haven't been able to get colspan working correctly yet.

siamkreative commented 9 years ago

+1 for adding this to a great little plugin :+1:

Mottie commented 9 years ago

Actually, now that I've read this issue again, it's not about using the textExtractor. I thought the headings option was meant for replacing the header text, i.e. naming the "key" in the key:value pairs of the JSON.