zikato / blogcomments

0 stars 0 forks source link

Scary Scalar Functions - Part Two: Performance - StraightforwardSQL #17

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Scary Scalar Functions - Part Two: Performance - StraightforwardSQL

Continuing the series, we'll examine how Scalar functions are terrible for performance.

https://straightforwardsql.com/posts/scary-scalar-functions-performance/

CarstenSaastamoinen commented 2 years ago

Maybe this articles should be destroyed!!!!!! They are evil!!!!! If I want to insert a book in a table with ISBN10 and/or ISBN13 we ALL know the rules for this numbers. And of course we insert at least 10000 books at the same time - never less! A really bad test! I've seen the same way of arguing about Tally, which you also use. With Tally, we never create less than 1,000,000 rows! When we have to form a hotel bill it is maybe fewer rows and then Tally is the wrong way. Should we form the 12 months of the year Tally is very bad ! In the same way with UDF. You obviously prefer speed over accuracy. Accuracy/correctness in data far exceeds performance when inserting and updating.

List all the rules for ISBN without looking them up - it can only take a moment, less than a minut. And ask all your colleagues about the rules - they know them.

If that type of test is to be taken seriously, then test with realistic amounts of data! And then a UDF becomes the right choice - due to correctness !!!!! Ask your user about correctness contra performance. I know the answer. Bad data can/must be thrown away and the system destroyed. Bad performance 7 times a day when inserting a new book is absolutely not an issue. And I will like to see a solution without UDF that insure a valid ISBN. And if we are evaluating the values when inserting and updating it is NOT necessary to use the function the times - many - when selecting the data.

So destroy the articles and use UDF the right way and in the right situations.

andowero commented 2 years ago

Proper context needs to be considered.

If you run a bookshop and need complicated UDF to check ISBN rules, you probably will use an UDF no matter what, since you won't be adding tens of milions of new books every hour. The simple lookup example has execution time of 12 for non-UDF version and cca 900 for UDF version. This should not concern you if this extra processing happens over the course of the whole year. Even if one row took 10 seconds more, it could still be palatable.

On the other hand, if you use some simple UDF, i.e. to get minimum of two numbers (very similar to DoNothing example) which you will than use everywhere, this could, according to this article, slow down your system massively and visibly. It could be the difference between being able to process 1 and 20 million rows per minute.

CarstenSaastamoinen commented 2 years ago

To andowero!!!!

Yes, but the article tells that UDF should be destroyed and are evil!!!!!!!!

AND THAT'S WRONG!

As always there are many situations where we can use different ways for solving a problem. UDF is possible wrong if it is a simple example but absolutely the right way if it is a complex problem!!! And performance is never more important than correctness!!!!!

So, UDF should never be destroyed and is absolutely NOT evil! But should be used in there right situation.

I see so many articles on the internet where it is told, that there is only one way to solve a problem. Tally is one of them!!!!! But all possibilities in SQL can be the best way in some situations/for some solutions. And UDF is the best way for evaluation of a complex rule for having valid and correct data not matter have long time it takes to insert or update data. If people are following this articles they make bad code and bad solutions and even useless solutions!!!

CarstenSaastamoinen commented 2 years ago

Now I have read the third article in this serie. The head line is

Scary Scalar Functions

but should be

Better having useless data than using UserDefined Scalar Function.

If we should remove all functionalities in a language that can be misused by people who are not able to program, writing SELECT, .... the whole language should be removed. People should be educated to use UDF and BE ABLE TO evaluate realistic. Nearly all examples are in the articles are useless - so I am awaiting the solution for evaluating ISBN10 and ISBN13 without using an UDF - because it must be the only and the most correct way!!!!!! And at the same time we can expect a solution for removing to many blanks in the middle of a string when inserting or updating data - of course they should be removed or it will be impossible to search later!!!!! I am waiting!!!

zikato commented 2 years ago

@CarstenSaastamoinen: When you're a hammer, every problem looks like a nail. If you only know UDFs, then at least use a staging table for the integrity checks and move it into the final table. That way, you won't prevent parallelism in every query that touches it.

Also, work on your communications skills because I've seen lesser tantrums from 10-year-olds.

CarstenSaastamoinen commented 2 years ago

It looks as you only have a hammer because you are not able to see that an UDF can be used. And about communications : evil, destroyed. I see a lot of articles that try to tell people one way to make a solution because they only have a hammer!!!! So you maybe also know that distinct always order the data, that tally is always the fastes way for creating a table with numbers, that there is a difference writing COUNT(*), COUNT(1) and COUNT(PK). And many more articles where the conclusion is wrong, because people writing this articles only have a hammer! So maybe DISTINCT, COUNT, ... are evil and should be destroyed!

It is a problem trying to learn others something wrong because you only have a hammer!!!

zikato commented 2 years ago

You're welcome to find your information elsewhere or start your own blog. I wish you the best of luck.