Closed ppwadhwa closed 2 years ago
I still need to review the code, but I pulled this branch down, tested this a little, and it mostly looks great! Some observations.
I'll hopefully take a look at the code later today, or tomorrow. As this is vendored code, I think it would be useful to include a link to the original source and to try to clarify where we have made significant changes.
Hi @ppwadhwa thanks for working on this, seems to be working as expected :)
I have some comments/suggestions on updates:
QDialog
. but something that inherits from QWidget
_
.trans_
helper for now, we can think if we need to translate the index
word, and how we can do that from within napari.editor.setOptions({"background": 12, "show_colors": True})
or similar. Thoughts @tlambert03 ?Looking good. I agree with all of @goanpeca's comments. I do think we should think carefully about the API, giving each public method some scrutiny, possibly renaming.
Also, would want to see complete tests, and probably an example or two, before this gets merged. We've generally been very careful and thorough with testing in superqt, so it's possible that this vendored code might not work in one or more of our test setups
another general comment here. I think if we're going to put this in superqt, we should "own" it so to speak, and not put it in _vendor
(though do include the license and attribution at the top of the file). But feel free to change the API as desired, with no aim of being mergeable with upstream changes.
Also just want to mention: @jni and I had a rather lengthy discussion about Table/Dataframe widget APIs over in https://github.com/napari/magicgui/pull/61 (which we were imagining at the time would accomplish something like this). So it might be worth a read through some of what was discussed over there and – if not terribly difficult – try to have some degree of parity between that one and this one.
@tlambert03 will take a look!
looks like tests need pandas to run. Chatted with @goanpeca about this. Let's add an extra to setup.cfg
called table
or dataframe
or something, and add the pandas dependency to it. Then add pandas
to the testing requirements as well
Merging #57 (07218e0) into main (329eaaa) will decrease coverage by
3.16%
. The diff coverage is73.08%
.:exclamation: Current head 07218e0 differs from pull request most recent head fc8d744. Consider uploading reports for the commit fc8d744 to get more accurate results
@@ Coverage Diff @@
## main #57 +/- ##
==========================================
- Coverage 84.58% 81.42% -3.17%
==========================================
Files 24 28 +4
Lines 2108 3009 +901
==========================================
+ Hits 1783 2450 +667
- Misses 325 559 +234
Impacted Files | Coverage Δ | |
---|---|---|
src/superqt/dataframeeditor/_dataframeeditor.py | 72.98% <72.98%> (ø) |
|
src/superqt/__init__.py | 100.00% <100.00%> (ø) |
|
src/superqt/dataframeeditor/__init__.py | 100.00% <100.00%> (ø) |
|
src/superqt/sliders/_labeled.py | 79.79% <0.00%> (-2.87%) |
:arrow_down: |
src/superqt/fonticon/_qfont_icon.py | 92.70% <0.00%> (-0.73%) |
:arrow_down: |
src/superqt/utils/_qthreading.py | 77.05% <0.00%> (-0.68%) |
:arrow_down: |
src/superqt/utils/__init__.py | 100.00% <0.00%> (ø) |
|
src/superqt/utils/_throttler.py | 90.00% <0.00%> (ø) |
|
src/superqt/utils/_misc.py | 100.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 329eaaa...fc8d744. Read the comment docs.
@tlambert03 I know I need to check into the failing tests--I will do that. Did I update the setup.cfg
file properly?
@tlambert03 @sofroniewn @jni @goanpeca
Hi! I wanted to get some feedback regarding the API for the dataframe editor. Below are some public methods that are either already implemented or I thought would be good to implement.
Could you let me know what you think of these and if there is some functionality you think is missing?
class QDataFrameEditor(QWidget):
# Signal emitted when value of dataframe changes.
valueChanged = Signal(pd.DataFrame)
def __init__(self, data: pd.DataFrame, parent: Optional[QWidget] = None, **kwargs) -> None:
"""Creates a widget with Pandas dataframe."""
def getValue(self) -> pd.DataFrame:
"""Return current dataframe object."""
def setData(self, row, col, value):
"""Set a value in the dataframe viewer at a certain row and column location."""
def addRow(self) -> None:
"""Add row to the dataframe viewer."""
def addColumn(self) -> None:
"""Add column to the dataframe viewer.""" (not implemented yet)
def deleteRow(self) -> None:
"""Delete row from the dataframe viewer.""" (not implemented yet)
def deleteColumn(self) -> None:
"""Delete column to the dataframe viewer.""" (not implemented yet)
def enableBackgroundColor(self, value: bool) -> None:
"""Colors background in columns based on scale of values in each column"""
thanks @ppwadhwa ... couple questions:
valueChanged = Signal(pd.DataFrame)
- will this collect and emit the entire data frame every time any single cell changes? If so, are there any performance concerns there? I wonder if we might want to have a more granular signal, something like a signal with the row, column, and new value. (see magicgui table change event) setData(row, column, value)
for every item in the row/column?addColumn(header, data)
deleteRow/deleteColumn
don't take inputs... was that just an oversight? or would the always delete a specific/laster column/row?insertColumn(index)
/ insertRow(index)
?one of the bits of the magicgui table API that I was most happy with and that we discussed for a while is the __getitem__
, __setitem__
, __delitem__
methods here. This allows for numpy-like indexing and editing onto the 2D data array, For example table.data[0, 2]
gets the data in the cell of the first row, 3rd column. table.data[:10, :2]
gets the first 10 rows in the first 2 columns, etc.... Also, these lines here let us get/set/append/delete columns with table['header'] = [...]
.
(I'm sorry to keep coming back to that table PR in magicgui, but I would like to have a solid understanding of what is being gained here in comparison to that existing implementation, so we can best take advantage of it, as well as what is being lost here compared to that existing table editor, hopefully so we can maintain what already works well).
thanks!
thanks @tlambert03 for your comments/questions.
I hope others will still weigh in on this, but I thought I'd share what I think/know...
As far as what is different about this widget compared to the magic gui widget---My understanding is that this one has multi-indexing support and lazy loading for larger dataframes. For display, it has color coding based on scale and easy formatting of float values.
Rather than dropping low-level/line comments on this WIP PR (which I can also do if desired), maybe I can give more context around how I expect napari to use the widget here or alternatively magicgui.Table
. I appreciate that's not the only goal here, but I see it as the primary one.
In summary, one advantage of the contribution here is that it assumes the tabular data is a pandas.DataFrame
. That's a little risky for Layer.features
(see below), but it should have some performance benefits - particularly with regards to copying data. Add that to the performance benefits of the promised lazy loading here, and this PR becomes enticing when Layer.features
is big.
The widget here also has some extra functionality compared to magicgui.Table
and some missing functionality that could be implemented in this PR or in follow-ups.
My gut instinct is that we should initially use magicgui.Table
to implement https://github.com/napari/napari/issues/3512, as long that doesn't introduce any behavior that we couldn't easily implement with the widget here. I think that task is relatively simple, though we have to think about the graphical design of how to open the features table associated with the layer. Then when we see how that behaves and how it is received, we could more easily motivate the widget here if it's lacking or non-performant. How does that sound?
More details/notes below!
Layer.features
getter may not return a pandas.DataFrame
in the futureThe public API of the Layer.features
getter doesn't guarantee that it returns a pandas.DataFrame
, only a dataframe-like. That's not well defined at the moment, but we might aim to define a LayerFeaturesProtocol
(in a similar spirit to LayerDataProtocol
) that is implemented by some of the dataframe-like libraries (e.g. pandas, cudf, vaex, etc.). Right now, the underlying implementation actually uses a pandas.DataFrame
, so that's what is returned by the getter.
Of course we can rely on that implementation detail for now, but I'd be worried about depending on too much of the very large pandas.DataFrame
API. We should try to limit ourselves to the basics like shape
, __getitem__
, __setitem__
, __delitem__
, iloc
, and maybe iat
that are likely to supported by most implementations or would be easy to wrap/implement ourselves.
Layer.features
setter coerces the input and flattens a multi-index on the rowsWe also do some coercion in the setter, including flattening the row-index to use the simple/default RangeIndex
. We might want to support fancier row-indices (i.e. labels and multi-indices) in the future, though that may not be compatible with some of the design decisions of the Data APIs thoughts on a dataframe protocol, which we might rely on. Therefore, support for multi-index rows is not an immediate priority. Would be interesting if column multi-indexing is supported by the implementation in this PR, though that may also not play nicely with current thoughts about a simple dataframe protocol (instead a tidy-data structure is probably the safe play here).
Right now, magicgui's Table
will in general make a copy of the underlying data Layer.features
because when the input is a pandas.DataFrame
it calls pandas.DataFrame.values
, which only creates a view of the underlying array when all the columns share the same dtype, and when that dtype is supported by numpy (i.e. it's not a pandas extension type like CategoricalDtype
).
I think the only concern here is the performance of initializing a table with duplicate data, mostly in terms of memory usage. I suspect in practice that won't be a big concern as I expect Layer.features
to be small in comparison to Layer.data
, but maybe I'm wrong.
this appears to be inactive. Will close soon unless instructed otherwise.
Adds a dataframe editor widget that uses code from Spyder's dataframe editor. I tried to pull out just the code that I thought we needed here and tweaked things to make it work. Take a look and let me know what you think!
This PR addresses to https://github.com/napari/napari/issues/3512. I'm not entirely sure if this is the location we want to keep it, but I wanted to get it somewhere for viewing.
Here is a short script to test it out yourself:
https://user-images.githubusercontent.com/54282105/150072182-c01f42e6-671e-440b-ac1f-19d75ba8d3d8.mov