vallettea / koala

Transpose your Excel calculations into python for better performances and scaling.
GNU General Public License v3.0
143 stars 60 forks source link

Tests for utils.py #219

Open Brad-eki opened 4 years ago

Brad-eki commented 4 years ago

In the refactor branch I have been adding tests for utils.py.

While adding tests I have been marking up with TODO comments. These comments are largely so I can come through later and address them. There are some strong themes which might be better addressed by a single TODO at the top of the file and attempting a campaign to attack them. eg; logging, docstrings, expanding one or two letter variables names to something more descriptive.

Outside those themes, there is a small but growing collection of features we might like to have in our utils. Essentially edge cases which are not currently supported.

danielsjf commented 4 years ago

@Brad-eki, I saw that some of your commits (in the refactoring branch) dropped a lot of tests. Was this on purpose and what was the reasoning? Will you at some point also do a pull request for all the changes because at the moment it is hard to check the commits one by one. I saw you have done a few commits straight in the master and since they were small that is fine. However, if you change larger parts (like this refactoring), it would be good to have a pull request with all the changes. Preferably in functional blocks that are review-able ;-)

To be clear, I'm not questioning your changes and I can only encourage what you are doing! Thanks a lot for all your contributions. This will only improve koala. However, for a big refactoring like this, there is always a risk that things change (like what happened to me) so I would say two pair of eyes are always better than one :-)

danielsjf commented 4 years ago

It also seems that there is still something off with the latest test of the commits in the master. 2.7 fails (which we would deprecate anyways), but 3.6 fails as well.

Brad-eki commented 4 years ago

@danielsjf thanks for the observations :D

The changes I have committed directly into master had been put up for PR a few times. Each time we had in-principle agreement, so I thought it was safe to put them straight in. I accept it wasn't an ideal approach. I do admit I've not put any direct effort into fixing the broken tests. From now on I'll be sure to only use PRs on master.

I guess I have started using the refactor branch as a personal branch. I'm OK to change this behavior if it's not appropriate. Also happy to PR into it rather than push directly if that's more useful to those around me.

In the refactor branch I have intentionally removed the existing tests. I don't intend for their coverage to be gone forever but I found it a little easier to work from a clean slate in a linear and methodical way. I don't intend to have them land in master without a PR.

The last couple of days I have been trying to identify good ways to express my intent for Koala in the short, medium and long term so that the plan can be socialized. Maybe the project feature of github...?

Mixed in with all of the above, I have launched an open source project which integrates Koala with xlwings (called FlyingKoala) and am hoping to have it as a signature project for "advertising" consulting. With my project as being a string motivator I would very much like Koala to move from a proof-of-concept/beta to a beta/release candidate (if that makes any sense). To my mind that may entail a very big restructure, clean up, comprehensive test coverage and better doco. A very big undertaking. With that I am happy to do some very long yards in moving Koala but, equally, want to enable those around me in a similar pursuit while achieving the goals they want from Koala, potentially lower the bar for approaching some of the things we would want fixed, all that while not distracting or dis-empower anyone else.

From what I can see with writing the tests for utils, Koala could be on the verge of a very big change. In particular - there is an object used everywhere in Koala which gets called cellmap. It is an instance of Spreadsheet. On the surface it seems that cellmap could benefit by being an object to itself, an instance of which could be an attribute of Sparadsheet. Or is Spreadsheet more appropriately a CellMap?

In any case, I intend to continue methodically writing tests for each function and method we have in Koala. Doing the easy ones first and making notes on the ones that are too hard at the time for someone to come back to at a later time. I expect to PR the tests into master per file eg; utils then Spreadsheet then ExcelCompiler, etc...

danielsjf commented 4 years ago

@brad-eki, you're doing a great job in moving Koala forward which is really appreciated. Indeed I've also come across this cell map. To me there is actually a deep duplication behind it. A lot of the logic is in the AST code and another large part of the code depends on the cellmap. Somehow I think we should actually merge these two to really make new developments for Koala easy. Very often issues arise from a misalignment between the two. For instance, every time something is updated, this has to happen both via AST (checks with hierarchical cells), but also by updating the cellmap. I faced many bugs in the code due to this duality.

Indeed I think it will be difficult to take on this big refactoring without having an army of tests to make sure that we don't break anything. Therefore all the work that you are doing now is incredibly valuable.

However, what I did notice is that many of these bugs go deeper than just simple function tests. Often they arise in more complicated interactions between them. At the same time I find the tests with the excel files rather complicated and untransparant. The current test structure always is split in one part of code and one part of a binary excel file. I think we could use a more structured way to test functions in reality in excel. Maybe we could write the formulas in the test code but have a few helper functions that write them using openpyxl to excel. As a com object we could then execute them. This would allow us to always verify exactly the real excel behaviour with the koala behaviour.

Anyway, I agree that using the project mode in Github could be a great way to further structure the work. I haven't used it yet myself, but I can only encourage you to try it out.

Brad-eki commented 4 years ago

@danielsjf thanks :D

In my exploration which lead to PR #221 I've seen you doing a massive amount of work. It's great. And I'm learning a lot from you. Thanks.

I'm 100% on board with the concept that test coverage will assist the refactoring. I found while writing the tests for utils.py it becomes easy to identify which code is operating, which isn't and which could benefit from operating differently. Trouble is, as you've mentioned, you don't get a sense of how it's all interacting from this perspective.

I am yet to actually "do the science" and determine this for fact but I think, if we get the architecture right, Excel files will not be required parts of the tests beyond testing the reading of an Excel file.

Here's why I have a gut feeling on this:

The punchline of Koala is that "Excel isn't required". For me, when I look at what Koala promises, once an Excel file has been read we are no longer in the realm of Excel or even Spreadsheeting for that matter. In fact there's no real requirement for an Excel file to be involved at all.

At the heart Koala is an AST for Excel formula syntax and a way of mapping the cells and their relationships. The connection to an Excel file is tentative at best and could be considered a simple convenience for accessing cells and their relationships.

Why can't people express math and other (Excel related?) concepts in Python alone? Excel formulas are only text and are expressing the relationship between cells. It's the constellation of cells and their relationships which is the bit we are actually interested in. At the moment using Excel is just a convenient GUI we can use to build the constellation of cells (a "model"?)

I'd argue if we have written Koala in a way where it needs an Excel file, we have missed the mark.

Considering;

Koala is so close to much of the above. It has just evolved in a way that makes it hard to maintain and add features. At risk of using contentious terms - currently Koala doesn't appear to be loosely coupled or highly cohesive (in a way that I understand those terms).

Anyway, off track...

My view: Insisting on an ability to write tests that exercise the full gamut of Koala which can be run without an Excel file in sight is actually at the very core of the journey I see for Koala.

I'll see about starting something in github project. Let's see what trouble we can cause using that.