Closed eruizalo closed 8 months ago
Using show sounds good, and custom lambda for comparison sounds reasonable provided we keep the current default comparison.
I'm not so sure about JSON based comparison that sounds slow/brittle to me but could be wrong.
Hi @holdenk,
I'm not sure if I explained myself properly and I had the features almost finished, so I think it is better to show them to you.
I tried to split this issue into 4 commits, each commit should contain only a possible feature, so if you don't like any of them it can be removed or updated individually. I may also create different issues/PRs if you prefer that:
show()
without any param, so big columns are truncated. With a custom show function we may do: df.show(50, false)
or df.toJSON.show(false)
(this is what I meant when talking about json/big structures). Now is a dev decision how to show the differences and how many (default behaviour remains)Please, tell me what you think about these features and whether I should separate them into new issues/PRs or if they require any changes.
Ah that makes more sense sorry for misunderstanding part of the first issue. This sounds reasonable do you have a PR up I can review?
Hi @holdenk, I just created #405 let me know your insights
Fantastic I'll try and take a look either tomorrow or early next week :)
Hello @holdenk , I've noticed that since commit 6f812cc9, the way of displaying DataFrames that are not similar has been changed using the
assertDataFrameApproximateEquals
method. The new approach involves a show, using the default parameters (truncating the text and limiting the number of rows to 20).I would like to be able to parameterize this "show" in some way so that at least the
truncate
andnumRows
parameters may be provided to theassertDataFrameApproximateEquals
method. Another option could be to use a lambdaDataFrame => Unit
param (default lambda would keep current functionality) so that we can customize it per project. This way, we could even convert the DataFrame into JSON if we have complex structures and properly see which fields are not equal.I've been working on this and other contributions (but I'm not sure if do you want an issue for each feature):
What do you think?