Adding on-the-fly derivers

horner commented 9 years ago

I am attempting to add on-the-fly driver in the UI. screen shot 2015-01-17 at 6 28 16 am

The coffescript: btns.append $("

horner commented 9 years ago

Ok, I figured out a solution by adding a thisOrig to the pviotUI function.

    $.fn.pivotUI = (input, inputOpts, overwrite = false, locale="en") ->
        ....
        thisOrig = this

And then change the button callback to:

   btns.append $("

It seems to work, but I don't know if this is leaking memory in a bad way.

nicolaskruchten commented 9 years ago

This is a neat idea! I'm not sure if this will leak but if it works for you, awesome :)

horner commented 9 years ago

I am preparing a pull request, but before I do I wanted to make sure that the on-the-fly derivers could be persisted in the opts object for use without the UI. Initially, I just added the functions to the opts.derivedAttributes, but the functions cannot be serialized to JSON and used later. I changed my code it to make a new opts.derivedAttributesMacros that has a format:

 "derivedAttributesMacros": {
    "Create Date Day": {
      "format": "%m-%d-%y",
      "src": "Create Date"
    },
    "Create Date Month": {
      "format": "%m",
      "src": "Create Date"
    }
}

Then in the constructor, the code create the actually entries in derivedAttributes with the deriver functions. I plan on making other on-the-fly derivers formats other than % based date ones.

I am not married to the name derivedAttributesMacros. In fact, I don't like the length for the title, but I could not think of a better one.

Do you have a preference? Also do you like the approach of creating another object or should we combine them into a single object with both functions and format specifiers?

nicolaskruchten commented 9 years ago

Would these macros somehow pattern-match to attribute contents or would the mapping of macros to attributes be done in the initial function call? I.e. if I have two date fields T1 and T2, would I just be able to specify "activate these macros for dates" or would I have to specify "activate these macros for attributes T1 and T2" ?

horner commented 9 years ago

I thing making them for each attribute (T1,T2) manually because I don't think the usual use case requires that every date be derived in all possible ways. For example someone may want T1 (which could be a date/time) to be derived to just day (dropping time) and T2 to be by month, so rather than derive all the dates with with a single formula, I think it is better to be explicit about it, hence the "src" value.

So:

 "derivedAttributesMacros": {
    "T1 Day": {
      "format": "%m-%d-%y",
      "source": "T1"
    },
    "T2 Month": {
      "format": "%m",
      "source": "T2"
    }
}

nicolaskruchten commented 9 years ago

In that case isn't it just easier to use derived attributes to begin with, without adding all this overhead to allow users to create them within the UI?

horner commented 9 years ago

The problem is persisting the opts with the functions since they cannot be serialized in JSON.

I have a modification of http://nicolas.kruchten.com/pivottable/examples/onrefresh.html that allows the opt object to be named and shared among several users using the same page by storing it server side. Its very cool because it behaves like a shared filter in a google spreadsheet where one user can update the view and all users using the same named opt see the changes.

I don't know of another way to securely share the opt between browsers using javascript. By making a serializable derived attribute spec, it can be shared and persisted for use later.

horner commented 9 years ago

Check out https://github.com/horner/pivottable/commit/71a6d7ecc57a413af5ba189c19e93989fc381859?diff=unified.

It has the on-the-fly derivers. Let me know what you think. I can prepare a pull request if you like.

nicolaskruchten commented 9 years ago

This is very cool, but honestly I don't think that I would accept the PR into the core of this project, as the use-case seems fairly narrow for the added complexity, especially as compared to just adding the deriver right away in source.

horner commented 9 years ago

ok, maybe I am not explaining the use case well enough. Without this change a end-user with data containing date columns must know javascript to analyze the results using different groupings.

Worse yet, a programmer would be wasteful to build a deriver for every date for every combination. For example one person may want to see dates by month, while another user may want to see data compiled by quarter. It seems wasteful to compute both and all the time. This gets especially worse if you have more than one date too.

Also, I have been leaning towards fixing the natural sort speed issue by instead allowing an on-the-fly deriver that reverses dates from MM/DD/YYYY to YYYY/MM/DD and then leaving the simpler natural sort take over. There could also be a on-the-fly deriver to change DD/MM/YYYY to YYYY/MM/DD which would solve the problem of i18n.

The idea would be to declare in the opt structure that a column is a date and then pivotUI/pivot could know to re-arrage the column to work with the natural sort. The only drawback to this approach is that I think end-users prefer to see MM/YYYY over YYYY/MM in the final column headings.

Anyway, I hope that explains the idea better. If you have not checked it out, I can setup a jsFiddle with this code so you can see how intuitive it is to use. Also, to see how important having these macros would be without having to resort to javascript.

nicolaskruchten commented 9 years ago

Oh I definitely understood the idea, but it seems like even with your proposal, the javascript call would have to be hardcoded to the field names to allow for these macros to apply, right?

horner commented 9 years ago

No, there would be no need to hardcode javascript to make use of the on-the-fly deriver macros. I made a quick video demo so you could see it in action: https://www.youtube.com/watch?v=JaltyHAH6gs

Take a look. I think it is important to note that I don't think end-users would manipulate JSON, but that a JSON object should be sharable (without javascript) to allow the pivotUI to show derived columns.

Furthermore, I think the code that is in the "onrefresh" demo that removes "bulky default values" and "deletes functions" should be standardized into the pivotUI. For example we should consider making the onRefresh function return a sanitized config, or at least provide a function in PivotUI to sanitize the object for storage and retrieval.

Let me know what you think.

nicolaskruchten commented 9 years ago

OK, for the purposes of this conversation let's split up "users" into "Developers" and "End Users" . Developers write software which causes Javascript to be executed in the browsers of End Users, who interact with the resulting pivot table widget.

My understanding is that for an End User to gain access to the new UI elements you are proposing for a given attribute X, a Developer needs to hardcode the name of attribute X in the Javascript which invokes the pivot table. Please correct me if I'm wrong but I believe I asked about this earlier on in the thread and you responded in the affirmative (see the exchange about T1 and T2).

Let's compare this to the situation today, where a Developer could just as easily use the derivers mechanism to create derived attributes up front, which the End User would have immediate access to, without having to use the popup menu and clicking on buttons. This is how I as a Developer have created UIs for my End Users: when attribute X is a date, I suppress attribute X and add derived attributes for year, month, and month-and-day.

So as I see it, your proposal allows me as a Developer to pass along the decision of whether or not to create derived attributes to my End User, for attributes which I still have to pre-specify in source, much as if I was creating the derived attributes myself. Unfortunately, I don't really believe that this additional freedom is worth the additional complexity in this library.

If I've misunderstood, please do correct me though! Your YouTube video is not accessible to me unfortunately (I get a "this video is private" message)

horner commented 9 years ago

Whoops. Sorry about the video (https://www.youtube.com/watch?v=JaltyHAH6gs). I just made it public. Also, you can play with my changes here: http://horner.github.io/pivottable/examples/onrefresh.html

After playing, you might get a better idea.

nicolaskruchten commented 9 years ago

OK, I've watched the video and I did in fact misunderstand.

My question now is: how does your code determine which attributes get the new buttons if not via configuration? Is there some type-detection where the pivot table now tests attributes values to find out if they are dates?

horner commented 9 years ago

oh, yeah... right now I detect if there is a date in the column by looking at the first value in the column (see: https://github.com/horner/pivottable/blob/dev/pivot.coffee#L634) , but that could be explicit (via another option) too.

horner commented 9 years ago

Oh yeah, and honestly there are really two issues here:

1) On-the-fly derivers for dates 2) Storing the config for later use. As the comment in the example onrefresh (http://nicolas.kruchten.com/pivottable/examples/onrefresh.html) is really not possible: "Instead of showing the configuration in a textarea, one could store the configuration in a cookie or implement a 'save configuration' feature." Especially if the config has script in it.

1) could could/should be resolved by specifying that a column is in a specific date format: MM/DD or DD/MM or other. Or could be detected with a preference based on i18n knowledge. This approach would also solve the natural sort problem too. For example, if the column is known to be a date, the natural sort for that data could be biased to date, whereas the cheaper comparison could be used in all other cases. 2) needs to be fixed with the macro approach... tho as I stated earlier, I am not happy with calling them a macro. ;-(

PeteW commented 7 years ago

If I am understanding this the derived attribute macro is to providing pivottablejs the intelligence to understand the data types and provide custom filters. If so then for situations like jupyter pivottable where I cannot/prefer not to code the filter types this would certainly empower the user to perform more powerful ad-hoc analytics.

nicolaskruchten commented 5 years ago

Some extra thoughts on this issue, with a few years of perspective :)

Type hinting would really help! See #191
The idea of a generic JSON-serializable deriver system is quite a good one
The idea of onRefresh providing a JSON-serializable config is also a good one (it could do the stripping of renderers, aggregators, sorters etc internally)

nicolaskruchten / pivottable

Adding on-the-fly derivers #280