spotfiresoftware / spotfire-python

Package for Building Python Extensions to Spotfire®

Consider making Column inputs from a Python DF numpy arrays instead of pandas.Series #3

Open bbassett-tibco opened 3 years ago

bbassett-tibco commented 3 years ago

https://stackoverflow.com/questions/66148445/why-isnt-my-new-column-being-named-correctly-when-using-a-python-data-function

Customer observed that column inputs retain their column name from the original input even after transformation.

This differs from how the equivalent TERR data function works, because we handle a Column input differently internally in Python and in TERR. In both Python and TERR we pass inputs (and outputs) over as a table: a data.frame in TERR's case and a pandas.DataFrame in Python's case. In TERR's case, though, if the Data Function says the input is a Column, we convert it from a 1-column data.frame to a vector of the equivalent type; similarly, for a Value we convert it from its 1x1 data.frame to a scalar type. In Python we do that for Value inputs as well, but for Column inputs we leave it as a pandas.Series, which retains the column name from the original input column.
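To illustrate the name retention described above, here is a minimal sketch in plain pandas. The table and the column name "Sales" are made up for the example, and the code only mimics what a Column input looks like; it is not the package's actual marshalling code.

```python
import pandas as pd

# Hypothetical one-column table standing in for the data Spotfire passes over.
table = pd.DataFrame({"Sales": [1.0, 2.0, 3.0]})

# A Column input left as a pandas.Series keeps the source column's name.
input_col = table["Sales"]

# Arithmetic on a Series with a scalar preserves the Series name.
output = input_col * 2

# Turning the result back into a table reuses that name as the column header,
# regardless of what the Python variable is called.
result_table = pd.DataFrame(output)

print(output.name)
print(list(result_table.columns))
```

Both print statements show the original column name rather than the variable name `output`, which matches the behavior the customer reported.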

We could probably do something different there. We wouldn't want to convert it to a standard Python list, because in that case x2*2 would make the column twice as long rather than perform a vectorized arithmetic operation. But I suppose we could make it a plain numpy array instead (the equivalent of adding x2 = x2.to_numpy() at the start of the user's example). I wasn't sure whether that would make an unnecessary copy of the data, but it doesn't look like it does (it's just a reference to the underlying data), so that might be a better approach overall, and might be closer to what customers are expecting.
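The trade-off between a list and a numpy array can be sketched as follows. The Series `x2` and its values are invented for the example, and the memory check is only shown for a plain float64 column; pandas documents that to_numpy() may still copy for other dtypes, so treat this as an illustration rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Hypothetical Column input, named after the source column.
x2 = pd.Series([1.0, 2.0, 3.0], name="x2")

# A Python list treats * as repetition, so the "column" doubles in length.
as_list = x2.tolist()
doubled_list = as_list * 2

# A numpy array treats * as element-wise arithmetic, so the length is unchanged.
as_array = x2.to_numpy()
doubled_array = as_array * 2

# For a float64 Series, two to_numpy() calls refer to the same underlying buffer.
shares_buffer = np.shares_memory(as_array, x2.to_numpy())

# A numpy array carries no column name, so nothing is left to leak into the output.
has_name = hasattr(doubled_array, "name")

print(len(doubled_list), len(doubled_array), shares_buffer, has_name)
```

Because the array has no name attribute, an output built from it would have to be named by the data function's output definition instead of by the input column.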

REPRO:
Create a data function like:

output = input * 2

Define both Output and Input as numeric Columns, and hook input up to a column from a Spotfire data table.
Set output to go to a new data table
Expected: column in new Spotfire table is named "output"
Actual: column in new Spotfire table is named the same as the original input column

Issue migrated from TIBCO Software JIRA [PYSRV-260] created by jorobert