markrmiller / solr

An experimental branch of Apache Solr, serving as a reference for various performance, scale, and stability improvements.
https://solr.apache.org/
Apache License 2.0

The Solr Collection Creation Challenge Contest #1

Open markrmiller opened 3 years ago

markrmiller commented 3 years ago

9765976 The Collection Creation Contest is live! Intro out, progress to follow.

https://user-images.githubusercontent.com/448788/113466268-177e5c00-9400-11eb-96f4-06993a085ca4.mp4

markrmiller commented 3 years ago

So no big update yet. We are not quite ready to fire off collections like a gunslinger yet.

Essentially, the Jetty thread pool was hamstrung, but that was often made up for by other threads not being kept in check. A more recent improvement / change that helped expose this issue again made it worse, due to a bug in the implementation. Which was nice, because that led to other offenders.

The problem with this dynamic is that it doesn’t always work out optimally, and sometimes, it was just horrendous for Jetty. And it would have been more and more of an issue at larger and larger scale. It’s pretty amazing the numbers you could still pull out - but that just means lots left on the table.

markrmiller commented 3 years ago

https://user-images.githubusercontent.com/448788/113469742-bd889100-9415-11eb-99a1-7683f1adf5d7.mp4

markrmiller commented 3 years ago

So the reason I got so addicted to this point in the process - a point very, very deep in, even when I got there in less time - is that during the whole build up, I immersed myself in the code and behavior. I learned a million things, I tried a million things, I solved a million things, I janitored a million things, I reined in a million things. And yet, the "God of the branch" position eluded me. Time after time, oddities could not be explained. Things that worked amazingly in certain cases would end up being crushingly bad in others. Things that worked great would other times only be doing so in combination with other not great issues, that when churned or addressed, would end up crushingly bad for certain people, places, or things.

This was a tough reality, having lived, breathed, and achieved in this code with such intensity and focus for so long. No matter how far you moved, a bushwhacker was always hiding where you couldn't see him until he bushwhacked you.

It was not until this very late stage, and sadly for only a very brief time, that I could start to use the pile of dirt I had organized, and janitorial experience and reduced garbage, to really push in prod in ways that started to give me answers to everything that had perplexed me, one by one.

markrmiller commented 3 years ago

Another thing I really loved to see is how much starts to line up as you break all the bottlenecks and inefficiencies. For example, the logging output, as things get faster and more scalable, starts to line up more and more. Instead of a mishmash of random logging lines, you start to see this beautiful coalescing. But what is so interesting to me is that with that same behavior, due to unleashing detrimental and unpredictable weights, you can create this whole new terrible issue with a thundering herd all narrowing down on a single point like never happened before.

And in a way, that sounds depressing. Forward, backward, which way do you go, which way is worse. But my experience is, you can keep stepping on that kink in the hose, and eventually, the water runs freely.

markrmiller commented 3 years ago

So I could not always remember. Or articulate it. But there is a reason I never cared about all the things everyone else seemed to. Why don't you do this work issue by issue? Why don't you take some good things you found and file an issue? Why don't you consider an approach that will start capturing some of this work in a way that it might get captured? You realize it is likely going to be wasted work this way?

And I don't think that is an unreasonable position. But the world is full of such opposing positions that are both sensible. And the value and path that I saw and started to experience at this stage was not capturable by those concerns or strategies - and to my mind, whether present on the surface or not, they were peanuts to the sun.

markrmiller commented 3 years ago

So now, as I sit, seeking a fantastic collection creation result, I continue what started to expose these issues.

They were managed previously through a combination of help from other components and code and their behavior, but more than anything, by the same way SolrCloud came to be about as it is. I worked it there, with the limits and settings and structure that would make it work. Because if it did not - I pushed until the puzzle piece looked to fit and on to the next piece.

And damn, I made things friggin fly that way. I have some experience with different systems. Friggin fly in comparison. And so you might say, well, what the hell then. Stop there.

But here is the thing. When you start looking into it, you see you are leaving just too damn much on the table and in too fragile a formation. You learn that, just like the state that was so much slower and hairier and more complex, you are judging success in a world where you don't even know where you stand relative to anything. And so you look, and you say, damn, I'll crush you in some benchmarks - let's go. Meh, relative comparisons are valuable in heated moments and little else. You look objectively - and you see that the system is not being efficient. That it's doing a lot of "busy" silly things, that there is no way it's near the potential it should be at, and the reasons are complex and nasty and will remain so.

And so when you realize that pushing too much starts to expose these issues? Well, you get a bulldozer and start pushing till you hear the pop. And that is what will happen here for this singular contest. Give me a short bit, I'll let you know when I hear the shoulder pop out.

markrmiller commented 3 years ago

# DA43200C-8966-403C-A266-104308928229

markrmiller commented 3 years ago

So these are the moments that get me in trouble with the wife. If you wonder why I might have mentioned sound has been an issue: not many steps left, I already knew the next one from before, and in my sleep state, I've been iterating around getting nowhere trying to just bring back the thought of the right class to go to, and so I also start talking to myself, just to keep an idea and to concentrate and not get completely lost in trying to recall what I was trying to do. And I wear noise canceling headphones. And so ... and then with everything else the John Nash comments that used to be flattery 20 years ago are a guaranteed way to avoid dinner with me today. lol. Which people see as a reason to go to sleep. But as an avid non-carer, I see it as reason to double down on finishing the mission from god.

But anyway, I've got to get the vaccine tomorrow and in this state it may kill me. So the sun may win this collection creation contest. Right out of left field. Didn't even have my eye on her.

Anyway, everyone always thinks they understand the pressure I'm under and why, but they don't, and everyone thinks they know where it comes from, but they don't, and so bang bang bang, collection 6 gun shooter in the air ...

markrmiller commented 3 years ago

A nap and he's back. Still counts.

But let's not get lost in the competition, the Sun's late arrival be damned. The exercise is not, has not been, and never was about collection creation battles. Though I love the idea - so let's not get lost, because I'll go get lost with you - pack up the truck, grab the dog, leave the war, join the circus. I'll do it. So let's not.

I've already created 16,000 cores in whatever, under 10 seconds, I don't know, just this morning at like 2 am. More like 30 x something lower and fewer collections, but that's just the area I was in. So the feat is not the meat.

The meat is when I do the feat, for me and my pleasure, oh man, I love it, I had a great day today even though I hardly slept in two.

The meat comes when I see a bunch of obvious ways to juice even more - and those obvious moves spill all kinds of non obvious beans.

markrmiller commented 3 years ago

Let's be honest about one interesting thing that came out of this whole endeavor - there is no amount of collections or cores that can be created in any time that would impress almost anyone. That I have learned. And being in that game for myself, I have nothing but curious perspective on it. There is this dual "you don't seem to want to create any collections or cores for us" and "there really isn't a number and timeframe you could create them in that would be very relatively impressive" that is just fascinating. Much like the forward and back that has to happen for me to get this work out. I am fascinated by odd contradictions.

markrmiller commented 3 years ago

Found this beauty from 6 am. Let me beat this fake virus and I'll be back to collection to create.

https://user-images.githubusercontent.com/448788/113486442-6cf64f80-9478-11eb-8981-5899821def0f.mp4

markrmiller commented 3 years ago

I'm telling you, without a soundtrack, you can't beat this thing.

And I'm gonna shatter it.

"You don't want no beef, boy Know I run the streets, boy Better follow me towards Downtown What you see is what you get girl Don't ever forget girl Ain't seen nothing yet until you're Downtown"

dsmiley commented 3 years ago

Fast collection creation is cool but personally, I'm most interested in test reliability & test speed. Of course I know by now that performance has been your way of achieving those things but you don't have anything to prove to me. Maybe you feel you need this contest to prove to yourself that your endeavor has been worth it?

markrmiller commented 3 years ago

https://user-images.githubusercontent.com/448788/113519850-7d7bf800-9554-11eb-9e10-dcbeffed2dbe.mp4

markrmiller commented 3 years ago

Yeah, collection creation has been a difficult sell.

I'm not sure if it is missing the forest for the trees or just too much side chaff.

The issue is, everyone thinks that collection creation is about them.

I don't have that bias. I don't care how fast you can create a collection. I don't care unless it's too slow. And human wise that can be pretty slow.

So to me, collection creation performance in a big distributed system with all kinds of things needing major support and attention ... lol. I don't give a f*@c% about collection creation. That's part of why I spent 5 minutes in a hackathon throwing up the first really terrible and hacked out collection creation api years in.

And maybe that is the problem. After being direct so many times, I'll dance around the answers now. I can't inject the answer directly, good lord I've tried. But even then, it seems I am not adding much to the answer.

To me the answer is in the fact that I don't care about collection creation, and yet this is where I invested so much to get back to.

Meh. On with the contest!

markrmiller commented 3 years ago

And I'm not calling you out about collection creation @dsmiley. It's me and the collection creation contest vs the world.

Even Ishan, focused on collection creation, not long ago said "Don't worry about collection creation performance and scale". Trying to limit the load right? He and Noble are pretty invested in collection creation, "relax on that Mark, focus on the rest, we are going to be putting plenty into that."

So you know, the people I've talked to and worked with the closest are also a little bit like, Mark has a weird collection creation fetish at this point as well :)

I'm sorry, you choose your hammer, I'll choose mine, can't spend my time selling hammers while I use em.

markrmiller commented 3 years ago

The contest takes a small detour. But I will record the final moments tonight.

If it was a real contest, it would surely be over. So easy to cheat and make it actually about collection creation and the numbers I can coax out instead of what it is.

This is the kind of cheating that is only cheating yourself though.

markrmiller commented 3 years ago

https://user-images.githubusercontent.com/448788/114079342-b5927c00-986f-11eb-84a0-b3619c4b8f8c.mp4