Changed some config methods to be more like build methods

ragardner / tksheet

Python tkinter table widget for displaying tabular data

https://pypi.org/project/tksheet/

MIT License

408 stars 50 forks source link

Changed some config methods to be more like build methods #182

Closed CalJaDav closed 1 year ago

CalJaDav commented 1 year ago

Hi, I've update a few of the methods in sheet.py to be more builder-like. This essentially means that they can be chained together, rather than having to be listed called seperately, like the leetcode example below

sheet = Sheet(root)\
    .set_sheet_data(data)\
    .enable_bindings(['foo', 'bar'])\
    .create_checkbox(r,c)\
    .redraw()

There are a few functions I haven't touched, but which I would be good to change. Sheet.headers, Sheet.row_index have this kind of dual functionality where they'll either set or return the headers depending on whether you specifiy a set value. This is cool, but I think it would personally be better if this functionality were handled only by the get and a set functions at the Sheet level. Generally, I am of the opinion that functions should aim to do one thing and one thing only. Lets take Sheet.data_reference for example. Currently this is just an accessor for MT.data_reference and looks like this:

    def data_reference(
        self,
        newdataref=None,
        reset_col_positions: bool = True,
        reset_row_positions: bool = True,
        redraw: bool = False,
    ):
        return self.MT.data_reference(newdataref, reset_col_positions, reset_row_positions, redraw)

So this function really does four separate things. If I provide it with data=None it will return me the sheet data, on the other hand if I provide it with data it will set sheet data. Furthermore, I also have the option of resetting row and column positions. Personally I think this is way too much and leads to hard-to-read and confusing code. Here is how I think this functionality should look from a user level:

# set_data still should reshape the table by default, but resetting functions are also provided
sheet.set_data(data, reset_table = False).reset_col_positions().reset_row_positions().redraw() 
print(sheet.get_data(get_header=True, get_index=True))

I also think that it is generally a good idea to keep redraw = False as this means that the table isn't updated until it is specified by the user, which is more performant in chained operations. I want to stress that these are changes that can be made without changing any of the background code (MT, CH or RI). tksheet.Sheet is essentially a facade for the more complex stuff in the backend, so we can have more complicated functions hiding back there but present the user with simple functions that only do a few things. In general, my recommendation is that we avoid putting confusing multi-functional, multi-output functions on the front end and front-end functions should fall into one of two categories; methods or configs that do or change something (and return self), and get or check functions that return other stuff.

Let me know what you think. I am happy to go through and refactor the whole front-end if you are satisfied with these changes, currently I have not altered any of the functionality of the functions.

ragardner commented 1 year ago

Thanks for the edits

About the redraws, it should only redraw once if there's a chain of commands with no updates in between because it checks if there's an existing call to redraw, there may be a couple of functions where this isn't the case I'm not sure

def set_refresh_timer(self, redraw: bool = True) -> None:
    if redraw and self.after_redraw_id is None:
        self.after_redraw_id = self.after(self.after_redraw_time_ms, self.after_redraw)

About get and set, I agree, I wrote those functions with many uses before I learned that it's much better to keep things simple and straight forward but I am reluctant to change them due to backwards compatibility, I am not sure

About going through the front end, I don't know how much work that involves, if you think it's not too much then I guess sure, but maybe don't change the redraws for now until I get a better look at them

I am sorry to report that there's going to be a delay on releasing the new named ranges and event data version because I decided to add the ability to move and delete non-consecutive rows/columns so I got started on that, the benefit is that it should simplify the code in these areas, the drawback is it will take a little while and it will mean a small change in the event data for moved/added rows/columns (the added will change because of the need to undo delete columns and get event data for that) but in both cases it will involve a simple list of indexes

CalJaDav commented 1 year ago

I am sorry to report that there's going to be a delay on releasing the new named ranges and event data version because I decided to add the ability to move and delete non-consecutive rows/columns so I got started on that, the benefit is that it should simplify the code in these areas, the drawback is it will take a little while and it will mean a small change in the event data for moved/added rows/columns (the added will change because of the need to undo delete columns and get event data for that) but in both cases it will involve a simple list of indexes

Not a problem, take your time. I had the distinct impression while writing my implementation of event data moved rows/cols that non-consecutive moves where going to be tricky. I'm all for moving slowly and carefully here. Named ranges will make some things easier going forward, but not some much easier that I need them right this instant.

About get and set, I agree, I wrote those functions with many uses before I learned that it's much better to keep things simple and straight forward but I am reluctant to change them due to backwards compatibility, I am not sure.

Yeah, this is always tricky but I think since we are moving towards a new version and dropping 3.6 there may be some precedent for losing a few rough edges if it makes the codebase more future-proof. We can, in the time being, flag the functions we might want to change with a deprecation warning (and post a warning for users of 3.6) on 6.xx. We can also patch in the new methods on 6.xx alongside their older counterparts, so that users can transition over before we break things. Given the amount of work left to do on 7.xx, this should give users plenty of warning about the upcoming changes. Ultimately, programmers are fundamentally lazy people and they won't start making changes until everything breaks, which is exactly what happened to my team when the new Pandas update dropped and deprecated a bunch of horrible spaghetti code which were puking out warnings that we ignored for months, but hey. You live and you learn, eh? (We don't, it will happen again).

About the redraws, it should only redraw once if there's a chain of commands with no updates in between because it checks if there's an existing call to redraw.

Huh, cool. I've stared at that function many times but hadn't put that together. There may be no need, you're a clever cookie.

ragardner commented 1 year ago

I wish I could take credit for that redraw thing but someone else made the suggestion :P

Could I ask your opinion, with the named ranges, if you move a column into a named range should the range grow by one column or push the end column out of it

Also moving a column out of a named range, should it shrink the named range or pull a column from the end into it

It's a bit complicated because what if the named range borders the end, instead of pulling a column from the end into it, it would modify a column which was initially before it (but is now in the named range after the movement)

I don't think it would be contradictory to have the named range remain static even though it shrinks and grows with deleting/adding columns

But perhaps the simplest solution, if a column is moving out of a named range then it shrinks until it is deleted when the last column is moved out, it would of course be recreated to its prior dimensions upon an undo

It seems a bit weird though that a user could just keep moving columns to 2 indexes before the named range start and it eventually gets deleted (not worried about delete columns doing this).

Perhaps in the case of moving columns it's more logical for the named range coordinates to remain static? I guess this is my gut feeling on the correct behavior...

🤔

Thanks

ragardner commented 1 year ago

Sorry lots of edits...

demberto commented 1 year ago

I think work needs to be done rather towards turning all possible get / set methods into properties. A complementary builder like Interface can then be put up such that, if x is the property then with_x is its builder alternative. This pattern should obviously be used only where its needed

CalJaDav commented 1 year ago

Could I ask your opinion, with the named ranges, if you move a column into a named range should the range grow by one column or push the end column out of it

Yeah, good questions. Named ranges were based around excel's feature of the same name so maybe we should just base the functionality on that, since it might be considered the expected behavior we are trying to replicate.

In excel, moving columns or into the range expands or shrinks the range. For example:

If we have the range A ... C and we swap A B C D -> A C D B the range becomes: A B

Equally, if we'd swapped A B C D -> A B D C the range becomes A B D C

Now, it does have some slightly odd behaviour, which we may or may not choose to replicate. If we move either end of the range A B outside the range: A B C D E F-> A C D E F B Then the range becomes: A C D E F B

However, moves WITHIN the range have no effect on its shape: If we have the range B C D and we swap A B C D E -> A C B D E the range remains: C B D

A complementary builder like Interface can then be put up such that, if x is the property then with_x is its builder alternative. This pattern should obviously be used only where its needed.

Copy that chief, I have been playing around with moving a lot of the interface functionality into properties with getter and setter arguments to wrap the subprocesses that need to be triggered when certain variables are changed, there are some things I am not yet sure how best to implement, maybe you can give me some thoughts.

Take setting sheet data as an example, I think we both agree that the best way to update an retrieve data would be through a Sheet.data property; ie.

# getting data
output = sheet.data

# setting data
sheet.data = some_array

However, we need to work in some additional functionality, for instance whether to include headers. My inclination would be that it might look something like this:

# getting data with headers and index
output = sheet.include_headers().include_index().data

# setting data with an array that includes headers/index
sheet.include_headers().include_index().data = array_with_headers

Thoughts? @ragardner @demberto

demberto commented 1 year ago

@CalJaDav I believe if a setter needs more arguments / configuration than the value itself, it shouldn't be a property to begin with. However, what could be possible is that the configuration can be done later as well? (The include_headers part?)

sheet.data = ...
sheet(.data?).include_headers = ...

CalJaDav commented 1 year ago

I actually quite like the way xlwings handles this stuff:

# getting data in xlwings
sheet.range('A1').expand("table").options().value
 # you can even do cool things like returning a dataframe...
sheet.range('A1').expand("table").options(pd.DataFrame, header=True).value

# and setting a value is handled like so:
sheet.range('A1').value = data

Perhaps this could be one way giving the sheet some additional functionality and simplify the huge number of one-off methods that all do very similar things. Specifically the implementation of a .range(cells, cols, rows, name) method that returns a SheetRange object that further operations can target, rather than seperate set_cells, set_cols, set_rows methods.

.options -> kwargs that can be passed along to setter/getter methods.

# Highlight rows
sheet.range(rows=[1,2,3]).bg = "#ffffff"

# Set a cells value
sheet.range((0,0)).value = 1

# insert a 2d array into the sheet, expanding rows/cols to account for any additional data that needs to be inserted.
# This is not currently possible in tksheet.
sheet.range((3,3)).options(expand="both").value = array

# Targeting the sheet rather than a range works as expected, and headers and index's can be handled as so
sheet.options(header = True, index=True).data = array

Again, I'm not sure if this is a more sensible pattern, it is probably what I would have done at one stage but I'd value your guy's thoughts.

ragardner commented 1 year ago

Thanks for the ideas,

My thoughts are that this seems to be a more powerful way to set and get data

As you say it could also handle the cell options such as highlighting, formatting, dropdown boxes etc

Right now I would say generally if there were to be an overhaul this is the best way to go, I have looked at the api in pandas again but I've always found it a bit confusing, I reckon xlwings api, at least for stuff related to range is much easier to understand

Of course, we could just change some of the functions into setter and getter functions and leave it at that but if we were to go with ‘range’

I am still quite unsure of the details of the implementation, some examples of which I haven’t included that many using ‘options’ sorry, and I also have no idea what’s actually possible as I don’t yet have a grasp of decorators

SheetRange object

Would .range(cells, cols, rows, name) be best or perhaps

sheet.range(
# if BOTH `r` and `c` are NOT None we're referring
# to a range of cells where `r` and `c` are zipped
# to produce coordinates, it would skip instances
# where either `r` or `c` is not an `int`

# if `r` is NOT None and `c` is None then we're 
# referring to a range of rows
r: Iterable | Generator | int | None = None,

# if `c` is NOT None and `r` is None then we're 
# referring to a range of columns
c: Iterable | Generator | int | None = None,

name: str | None = None,
)

Example

# gets cell values for rows 0 up to not including 5, column 0
# returns e.g. ["r0 c0", "r1 c0", "r2 c0", "r3 c0", "r4 c0"]
sheet.range(r=range(5), c=0).value

Or maybe having to provide two iterables (or an int and an iterable) for cells would be error prone?

Would name override r and c in the case of getting data, as you're getting data from a named range e.g.

# returns the data from the named range "myrng" NOT rows 1, 2, 3
sheet.range(r=[1, 2, 3], name="myrng").value

Would it also work with putting table, index and header in the range object?

sheet.range(r=1, c=[1, 2], index=True, header=True).value
sheet.range(r=[1, 2], table=False, index=True).highlight(**kwargs)

Could also put expand into range?

sheet.range(0, 0, expand="both").value = array

expand and data setting / inserting

For setting data using expand would it push data like an insert or overwrite existing data like a paste

Or would this maybe be reserved for insert

e.g. I'm not sure what would happen here

# a bit messy or okay using so many range words
sheet.range(r=range(2), expand="down").value = [[1, 2], [3, 4]]

# insert columns 0: [1, 2], 1: [3, 4]
sheet.range(c=[0, 1], expand="right").insert([[1, 2], [3, 4]], widths=[120, 120], deselect=True, )

Named ranges

Creation

# a named range for rows 1, 2, 3 where all cells have a dropdown
sheet.range(r=[1, 2, 3], name="myrng").dropdown(**kwargs)

# a named range for rows 1, 2, 3 where all cells have a dropdown
sheet.range(r=[1, 2, 3], name="myrng").options(**kwargs).dropdown()

Adding options to a named range

The same as creation but without the r arg, overwrites any existing of the same options

# refers to cells within named range “myrng”
sheet.range(name="myrng").dropdown(**kwargs)

# a named range for rows 1, 2, 3 where all cells have a dropdown
sheet.range(r=[1, 2, 3], name="myrng").options(**kwargs).dropdown()

Deleting a named range

Would this get confused with clearing values?

sheet.range(name="myrng").delete()

Deleting existing dropdowns from a named range

Unsure if this is correct syntax

sheet.range(name="myrng").dropdown.delete()

Cell options generally

Not sure which of the below would be best or if both would work

sheet.range(r=[1, 2], c=[1, 2]).highlight().delete()

sheet.range(r=[1, 2], c=[1, 2]).bg = "green"
sheet.range(r=range(10), table=False, index=True).align("global", redraw=True)

Deleting rows/columns

Not sure if this would even be a thing because it implies you could also delete cells which would pull the cells below the range up and the cells to the right of the range left

# delete columns 0 and 1
sheet.range(c=[0, 1]).delete(deselect_all=True, redraw=True)

A lot to consider ._.

CalJaDav commented 1 year ago

My thoughts are that the .range() is only a selection method. That is, you can use it to select and operate on certain parts of the table but it does only that. It can produce either a CellRange, a RowRange or a ColumnRange, with a named range being a reference to any one of those, but you can only select ONE type of range at a time:

# Selecting a cell range
sheet.range(cells = [(0,0),(0,1),(1,0),(1,1)]) # -> CellRange

# Selecting a row range
sheet.range(rows = [1,2]) # -> RowRange

# Invalid! Cannot select more than one type of 
sheet.range(cells = [(0,0),(0,1),(1,0),(1,1)], rows = [1,2]) # -> ValueError!

Named ranges will also come in three flavors (row, col, cell) and can be created something like:

sheet.range(cells = [(0,0),(0,1),(1,0),(1,1)]).name = "foo"

# and retrived like
sheet.range(name="foo") # -> CellRange

Ranges should also allow you to select areas between points, I'd probably suggest something like:

sheet.range(cells = "0,0:1,1") # = sheet.range(cells = [(0,0),(0,1),(1,0),(1,1)])
sheet.range(cols = "0:4") # = sheet.range(cols = [0,1,2,3,4])

# and maybe even using negative notation for selecting to the ends
sheet.range(cols = "0:-1") # -> All Cols

This will simply the coding a lot, since CellRanges act on cell_options, RowRanges act on row_options, etc. We already do this, pretty much all the operations on tksheet have these three flavors of operations anyway. It just will mean that we get to split out the interface into more manageable chunks.

With regards to expand, I think it probably should remain outside .range() although I'm not too sure either way yet. Basically I see it as the solution to the reset_col_postions arguments that currently sit in set_sheet_data, basically telling it whether it can insert data or expand the table beyond the currently selected range like so:

Generally speaking, I don't think we should encourage expand being used in cases where a selection would work just as well, i.e.

# We prefer this
sheet.range(cells="2,2:2,-1").clear()
# To this
sheet.range(cells=(2,2)).expand("right").clear()

There is obviously a lot to think about here, if we go ahead with this it does represent a pretty major change to your codebase, but it is one that could tidy up the library a lot.

demberto commented 1 year ago

I have been working on Column and Treeviews for quite some time on a project of mine. I was thinking if we could do something that separates the model from the view. This is a path offered by all popular UI frameworks. It won't be too much work, but its API needs to be designed carefully. Luckily we have a few existing implementations we can take ideas from. There's Qt's QStandardItemModel, Gtk has its own models. The model can raise virtual events as its data changes. If Tksheet wants to implement a multi column treeview in the future, an existing model can be reused. This mechanism can be orthogonal to the existing all-in-one implementation. I, unfortunately am not very knowledgeable about Tksheet's internals and don't know if its a worthwhile idea with the current implementation.

ragardner commented 1 year ago

My thoughts are that the .range() is only a selection method. That is, you can use it to select and operate on certain parts of the table but it does only that. It can produce either a CellRange, a RowRange or a ColumnRange, with a named range being a reference to any one of those, but you can only select ONE type of range at a time

I like this approach, thank you!

I am hesitant to use the word range in this way because of the builtin, would area or span be okay?

Ranges should also allow you to select areas between points

I could put a converter in for the optional of using strings

"A1:A3" # cells (0, 0), (1, 0), (2, 0)
"1:3" # rows 0, 1, 2
"A:C" # columns 0, 1, 2

I am nearly finished with the internals for named ranges (I think) so I can basically start work on the api soon, I will consider the specifics of syntax relating to the existing arguments like reset_row_positions as you say

I have been working on Column and Treeviews for quite some time on a project of mine. I was thinking if we could do something that separates the model from the view. This is a path offered by all popular UI frameworks. It won't be too much work, but its API needs to be designed carefully. Luckily we have a few existing implementations we can take ideas from. There's Qt's QStandardItemModel, Gtk has its own models. The model can raise virtual events as its data changes. If Tksheet wants to implement a multi column treeview in the future, an existing model can be reused. This mechanism can be orthogonal to the existing all-in-one implementation. I, unfortunately am not very knowledgeable about Tksheet's internals and don't know if its a worthwhile idea with the current implementation.

I am pretty ignorant on such topics, but... I have already finished work on the new event data stuff which is emitted with a tkinter event when the sheet is modified by the end user, also a sheet redrawn event is emitted when it's redrawn (will push this code and docs to a branch soon)

I guess with getters and setters and a more simple way of creating and deleting the cell options stuff the internals will be more separate and everything easier to manage if more changes are made though

Once the api is more or less done I will go back to adding back in the relevant type hinting you started on also, I am slowly understanding it ._.

CalJaDav commented 1 year ago

Shifting towards a greater seperation of roles is probably something to shoot for in the medium term. I think model/veiw pattern would probably look something like this for tksheet:

The view handles drawing all the canvas items and spawns events that inform the model about user changes. The model handles all the backend stuff related to moving columns around, editing data, determining the visible areas of the table, etc. Currently this functionality is all smooshed together inside the four main classes.

CalJaDav commented 1 year ago

I am hesitant to use the word range in this way because of the builtin, would area or span be okay?

I think span is the best alternative.

I could put a converter in for the optional of using strings

Yeah, the string converter should probably support alpha-numeric excel-like cell coords and pythonic integer index coords.

I'm going to start work on a AbstractSpan class tonight and implement it in a CellSpan. For now, I will not be removing/modifying any existing features from the API. It will be developed alongside the existing code for now. I'll have a demo for you guys to play with in a few days and give feedback on.

Closing this PR for now, lets move this discussion back to PR #176 or the TODO.