holdenk / spark-testing-base

Base classes to use when writing tests with Spark
Apache License 2.0
1.52k stars 358 forks source link

assertDataFrameApproximateEquals params show when failing #404

Closed eruizalo closed 8 months ago

eruizalo commented 9 months ago

Hello @holdenk , I've noticed that since commit 6f812cc9, the way of displaying DataFrames that are not similar has been changed using the assertDataFrameApproximateEquals method. The new approach involves a show, using the default parameters (truncating the text and limiting the number of rows to 20).

I would like to be able to parameterize this "show" in some way so that at least the truncate and numRows parameters may be provided to the assertDataFrameApproximateEquals method. Another option could be to use a lambda DataFrame => Unit param (default lambda would keep current functionality) so that we can customize it per project. This way, we could even convert the DataFrame into JSON if we have complex structures and properly see which fields are not equal.

I've been working on this and other contributions (but I'm not sure if do you want an issue for each feature):

What do you think?

holdenk commented 9 months ago

Using show sounds good, and custom lambda for comparison sounds reasonable provided we keep the current default comparison.

I'm not so sure about JSON based comparison that sounds slow/brittle to me but could be wrong.

eruizalo commented 9 months ago

Hi @holdenk,

I'm not sure if I explained myself properly and I had the features almost finished, so I think it is better to show them to you.

I tried to split this issue into 4 commits, each commit should contain only a possible feature, so if you don't like any of them it can be removed or updated individually. I may also create different issues/PRs if you prefer that:

Please, tell me what you think about these features and whether I should separate them into new issues/PRs or if they require any changes.

holdenk commented 9 months ago

Ah that makes more sense sorry for misunderstanding part of the first issue. This sounds reasonable do you have a PR up I can review?

eruizalo commented 9 months ago

Hi @holdenk, I just created #405 let me know your insights

holdenk commented 9 months ago

Fantastic I'll try and take a look either tomorrow or early next week :)