rbart / buddhabrot

Renderer for (anti-) “buddhabrot” Mandelbrot fractals.

Render approach? #2

Open danhouldsworth opened 5 years ago

danhouldsworth commented 5 years ago

Hi - I just saw that you might still be working on this project, so I wanted to reach out and ask some questions, if I may, to better understand it.

The example images are remarkably smooth. Whenever I've tried Buddhabrot experiments my images are noisier, with the sampled points cloud-like rather than the fantastic orbits you are producing.

Are you plotting the orbit line, or smoothing out the points?

Many thanks!

danhouldsworth commented 5 years ago

E.g. here, showing a few million of the points that a sample of orbits from within the set have travelled through, I can see the looped structure, albeit cloud-like and disconnected: image

Alternatively, looking only at the higher cycle lengths yields sharper 'features', but no longer the large looped traces from your demo: image

I would love to understand this more if you get the chance to explain - thanks!

rbart commented 5 years ago

Hello! I’d be more than happy to help!

Are you using your own code to generate these images? If so I’d be happy to take a look, perhaps I could point out specifically where you could make changes to achieve a similar result.

According to Wikipedia (https://en.wikipedia.org/wiki/Buddhabrot#Rendering_method), the standard Buddhabrot is basically a grayscale formed by incrementing pixels as they are visited while iterating a sample of initial points - but only for initial points whose iteration does eventually escape.

What I have here doesn’t actually adhere to that definition quite exactly. It still increments pixels as they are visited during iteration, but it doesn’t care whether points escape or not. Instead, each time the Mandelbrot function is iterated on a given point, the code checks whether the iteration has fallen into a repeating cycle (code here), and if so, stops iterating that point and moves on to the next one (code here). This is what prevents the centers of the orbits from filling in, leaving only their outlines.
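In rough JS pseudocode, the loop looks something like this (an illustrative sketch, not the actual Scala; the exact-value cycle check is per the above, but the grid mapping and names are made up):

```javascript
// Iterate z -> z^2 + c, incrementing a histogram pixel at each visited point.
// Stop as soon as an exactly-repeated z is seen (a cycle) or the point escapes.
function traceOrbit(cRe, cIm, hist, size, maxIter) {
  const seen = new Set();            // exact float values visited so far
  let zRe = 0, zIm = 0;
  for (let i = 0; i < maxIter; i++) {
    const re = zRe * zRe - zIm * zIm + cRe;
    const im = 2 * zRe * zIm + cIm;
    zRe = re; zIm = im;
    if (zRe * zRe + zIm * zIm > 4) return;   // escaped - stop tracing
    const key = `${zRe},${zIm}`;
    if (seen.has(key)) return;               // cycle detected - orbit outline is done
    seen.add(key);
    // map the complex point onto the pixel grid ([-2,2] square assumed) and bump it
    const px = Math.floor((zRe + 2) / 4 * size);
    const py = Math.floor((zIm + 2) / 4 * size);
    if (px >= 0 && px < size && py >= 0 && py < size) hist[py * size + px]++;
  }
}
```

Because cycling points stop as soon as they repeat, only the orbit's outline gets incremented, never its interior.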

As for the smoothness, I did a couple of things to achieve this:

danhouldsworth commented 5 years ago

Thanks! This is really interesting.

So the approach I'm taking is admittedly somewhat unusual, but motivated by wanting to separate the compute exercise from the exploration / render exercise (and also partly to learn about pushing a modern AWS database to its limits in terms of size, throughput, indexing, etc.).

I've chosen to represent the grid as Int16 × Int16 in size, which gives -32,768 to +32,767, mapping to -3.2768 to +3.2767 - a fairly good approximation of our Mandelbrot region of interest without too much wastage or mapping complexity. I do this for three reasons:

1. We now have a definitive grid or raster, so we can have a sense of whether it is 'complete' or not - i.e. we won't be picking start points 'C' other than these.
2. Using this int mapping we get an even spread of fidelity, vs floats which have more precision around 0.
3. The ~4bn pixel grid is big enough to have some crop & zoom ability during render, while not too big to be uncomputable / unstorable.
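As a sketch, the mapping is just a fixed scale factor (1e-4 assumed here, matching the -3.2768..+3.2767 range described above):

```javascript
// Int16 grid <-> complex-plane mapping: one Int16 step = 0.0001 on the axis.
const SCALE = 1e-4;
const toGrid  = (x) => Math.round(x / SCALE);  // plane coordinate -> Int16 cell
const toPlane = (i) => i * SCALE;              // Int16 cell -> plane coordinate
```

This keeps the step size uniform across the whole region, unlike floats, whose representable values bunch up around 0.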

Next, taking these start points, I iterate the Mandelbrot as normal: 'Zn+1 = Zn^2 + C', keeping track of each Zn and (taking a leaf out of your project!) tracking whether Zn repeats itself. (Note: all these calcs are done at 64-bit float precision, but finally converted back into Int16 for storage in the database.) The three conditions for exiting the iteration of each sample are:

1. A repeating cycle has been detected (definitely within the M-set).
2. Mod(Zn) > 4 (definitely outside the M-set).
3. Iteration count > 100,000 (tightly bound, but cannot say whether in / out of the M-set).

All of the points that Zn passes through are then stored in the DB alongside their respective C, the n count of that point, and the final n count for that entire orbit.
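The three exit conditions might be sketched like this (illustrative names; I use exact-value cycle checks here for simplicity, whereas the real thing compares after converting down):

```javascript
// Classify one sample C against the three exit conditions above.
function classify(cRe, cIm, maxDwell = 100000) {
  const seen = new Set();
  let zRe = cRe, zIm = cIm, n = 0;   // orbit starts at Z1 = C
  const orbit = [];
  while (n < maxDwell) {
    orbit.push([zRe, zIm]);
    const key = `${zRe},${zIm}`;
    if (seen.has(key)) return { verdict: 'inside', n, orbit };    // 1) cycle
    seen.add(key);
    const re = zRe * zRe - zIm * zIm + cRe;
    const im = 2 * zRe * zIm + cIm;
    zRe = re; zIm = im; n++;
    if (Math.hypot(zRe, zIm) > 4) return { verdict: 'outside', n, orbit };  // 2) escaped
  }
  return { verdict: 'unknown', n, orbit };                        // 3) maxDwell hit
}
```

Every point in `orbit` then goes to the DB with its C, its own n, and the orbit's final n.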

When this is complete (it's still running!) we will have a definitive computed database of all points (albeit at this grid resolution, math precision and maxdwell setting)

Sounds exhausting I know(!) but I will then have the ability to pull out extracts from regions of interest in the argand plane, highlight dwell bandings of interest, or even render the important map (regions of the M-set that passed through this region of the B-set). And of course aesthetically, I then have the freedom to explore which combinations render well together without having to recompute.

All that said - while some of the test renders look promising (in that they look roughly familiar vs Mset / Bset / AntiBset images) - I don't seem to be getting that phenomenal 'orbit' trace line from your project. I'm wondering if I'm working at a false precision? i.e. 64-bit float maths may lose precision when iterating Z=Z^2+C millions of times, in which case I'm showing artefacts rather than real fractal features? At what maxdwell did you bail out? And did you choose it for reasons of precision or compute time?

In terms of the render, are you hiding the paths from low dwell counts? (In mine these are often the cause of the cloudiness - albeit over time, exposure normalisation seems to hide it naturally.)

When you say you're showing both points that escape and points that don't - are the escaping ones included within your render? I guess they won't have fallen into a repeating cycle.

In theory - the information to render your images should lie somewhere within this dataset, and so I'm treating it as the ultimate test of whether I've made a mistake or not!!

Many thanks! (If I get the DB populated & working I'm happy to give you access so you can have a play...)

danhouldsworth commented 5 years ago

Here's a render showing all orbits that had repeated cycles (in MSet) of length between 10 and 15000 : image

Clearly some interesting details / shapes, but I'm starting to notice that mine doesn't show any of the child-Mandelbrot shapes which we see in yours, and in classic MSet deep zooms, so I do think my 64-bit float maths on deep dwells is losing precision.

danhouldsworth commented 5 years ago

Looking outside the Mset - here are some example reads from the dataset of orbits that escaped (so Buddhabrot set) : 100-1000 iterations to escape: image 1000-10000 iterations to escape: image 10000-100000 iterations to escape: image

(Note: the incomplete areas are because only ~30% of points have been calculated to date, and they are being done in raster order rather than Monte Carlo.)

danhouldsworth commented 5 years ago

Just for kicks (and probably the most inefficient calculation of the Mset ever!) here is a pull of just the C values against how many records we have for each (ie dwell count): image

So the basic calc & storage appears to yield correct features (although this doesn't validate that my orbits hold precision - only that the orbit count is in line with pictures seen on the web)

danhouldsworth commented 5 years ago

And if we just show the C points that fell into repeating cycles (within the Mset): image Here I can see the child-Mandelbrots: image

rbart commented 5 years ago

I see, so these images are all produced from data that is already in your database? That’s a great idea. I would love to play around with the DB once you’re ready.

The fact that the Mandelbrot and Buddhabrot renders look correct would seem to suggest that what you are doing is mostly (if not completely) correct computationally.

In my experience, 64-bit floats never seemed to be a problem, at least not at these zoom levels. I believe I used a similar maximum number of iterations (max dwell?) although I can’t recall for sure. It might have been 1,000,000.

I didn’t do anything to hide the low dwell counts - the exposure normalization does seem to address it as you mentioned.

When you are checking for cycles, are you comparing Zn (as a float) to prior Zn’s (as floats) or, are you considering it a cycle if Zn is visiting an Int16 “bucket” that it has visited before? If the latter, then that might cause problems. I waited for the Zn’s to fall into an exact enough cycle that the floats repeated past values exactly (code here)

You did mention that the processing is still running, maybe that could be to blame?

danhouldsworth commented 5 years ago

The calc phase has now completed - albeit at 64-bit math / Int16 grid. For a brief moment I got excited as I pulled these images out:

Orbit length 1000-100000: image Orbit length 100-100000: image Orbit length 1000-100000 again, but this time over exposed: image

And then I realised I hadn't updated which DB table to point at, so in fact these are pulling from the initial raster / categorisation table. Which is very strange - it suggests that at least the basic structure of the Anti-buddha can be seen without tracing the orbits at all, just by plotting the C points that we know generate cyclical orbits. (I'll investigate this further to try and clarify.)

danhouldsworth commented 5 years ago

Now pulling from the DB table for the orbit path tracing (Orbit lengths 1000 to 100000): image

This is looking really promising! Still a lot of cloudiness vs the smooth loops in yours, but this has given me confidence in the approach at least.

> When you are checking for cycles, are you comparing Zn (as a float) to prior Zn’s (as floats) or, are you considering it a cycle if Zn is visiting an Int16 “bucket” that it has visited before? If the latter, then that might cause problems. I waited for the Zn’s to fall into an exact enough cycle that the floats repeated past values exactly (code here)

I'm converting to Int16 to make the checks easier, but you're right - this is probably a source of numerical artefacts. I'll try and rework it - my approach here was clunky anyway. In your code (I'm not fluent in Scala, so sorry if I've misunderstood) - do you only check the last 100 iterations for a cycle? https://github.com/rbart/buddhabrot/blob/e2fbb183add2c4b1be6820153dd92ff7d1247d2f/src/main/scala/main/BuddhabrotCLI.scala#L25 https://github.com/rbart/buddhabrot/blob/e2fbb183add2c4b1be6820153dd92ff7d1247d2f/src/main/scala/compute/iterate/CycleDetectingIterator.scala#L15 How does that work if it took, say, 10,000 iterations before repeating? Would it detect & render it?

As an aside, I'm running the calc side in Javascript (nodeJS) and C (which gives me the option to improve beyond 64-bit float precision), and then rendering to 16-bit Greyscale PNG, so I'm having to compress the dynamic range of the buckets to 0 to 65,535. While I've tried linear / log / sqrt, I'm not sure I'm doing this particularly well or aesthetically..

danhouldsworth commented 5 years ago

Interestingly, the cloudiness reduces by including more of the lower cycle counts - which was not what I was expecting. Here are orbits of cycle length 100-10,000: image image

(There are some gaps as I limited the DB query to 40 then 80 million results)

Clearly seeing some of the looping coming through which is encouraging.

danhouldsworth commented 5 years ago

I may need to rethink the data structure also - as the index alone is over 2.5TB in size....

rbart commented 5 years ago

Wow, it looks like you've pretty much got it. I feel like I'm already seeing details that aren't present in mine.

Yes, I think you may be detecting cycles that are longer than I did. The "100" maxCycle was just a default value; when I produced the larger image, I think I used something like 10,000. But your 100,000 may well be larger than what I was detecting.

I did something a little unusual to compress the dynamic range, although it is not particularly sophisticated either. Code is here. Basically, this is like taking all the counts from the Int16 grid as one big list, sorting them by count, then filtering to only distinct values (so each distinct count appears once in the list). Then, I just linearly map each distinct count to a point in the final greyscale range.
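If I were to sketch that in JS (a guess at the shape, not the actual Scala - names are mine), it would be roughly:

```javascript
// Distinct-count equalization: rank each distinct count value, then map
// ranks linearly onto the greyscale range (0..65535 for 16-bit output).
function equalize(counts, greyMax = 65535) {
  const distinct = [...new Set(counts)].sort((a, b) => a - b);
  const rank = new Map(distinct.map((c, i) => [c, i]));  // count -> rank
  const denom = Math.max(distinct.length - 1, 1);
  return counts.map(c => Math.round(rank.get(c) / denom * greyMax));
}
```

The effect is that a single huge outlier bucket no longer crushes everything else toward black - each distinct brightness level gets an equal slice of the greyscale range regardless of how far apart the raw counts are.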

I'm sure once you get the compute part dialed in perfectly, your database will allow you to experiment with all kinds of different approaches - you could pull different cycle lengths separately, superimpose them, color them differently, etc. Great approach. This is making me want to take the whole thing up again too, haha.

danhouldsworth commented 5 years ago

So I've taken a bit of a detour, and set about categorising the M-Set points at a variety of float precisions to rule out artefacts.

While it was a useful experiment in compiling a native C module to be invoked from nodeJS, it's highlighted some other issues - even in C, an unoptimised Mandelbrot algorithm is slow, even on a compute-optimised high-end AWS EC2 box (c5n.xlarge). Using the <quadmath.h> and <complex.h> libraries I set about calculating the M-Set for the full grid (35,000 x 35,000) at 32, 64, 80 and 128-bit float precision.

The (unoptimised) algorithm can be written in a single line so even running all precisions sequentially is concise:

    do {zN_32 = zN_32  * zN_32  + z0_32;}  while (++N_32  < maxDwell && cabsf(zN_32)  < 2.0f);
    do {zN_64 = zN_64  * zN_64  + z0_64;}  while (++N_64  < maxDwell && cabs(zN_64)   < 2.0 );
    do {zN_80 = zN_80  * zN_80  + z0_80;}  while (++N_80  < maxDwell && cabsl(zN_80)  < 2.0l);
    do {zN_128= zN_128 * zN_128 + z0_128;} while (++N_128 < maxDwell && cabsq(zN_128) < 2.0q);

Here's the disappointment - 48 hours later and it's only got through the tiny rectangular region from Z0 = -2.25 + 0.0i to Z0 = 2.25 + 0.1i. So <5%, and this isn't even tracing orbits for cycles.

What I can see already, though, is that of the circa 37 million Z0 samples iterated so far, fewer than 22,000 points showed differences between 80-bit and 64-bit floats, and only 9,000 points showed a difference between 128-bit and 80-bit floats. So an expensive way to arrive at the conclusion you already had(!): 64-bit float loses no precision for >99.9% of the area.

And of course native 64-bit floats allow for optimisations that are many orders of magnitude faster, even in nodeJS vs C (this was a surprise to me - so I guess it was worth learning the lesson the hard way).

danhouldsworth commented 5 years ago

Which brings me to my next (hopefully last!) question... how to detect repeating cycles in the path without falling foul of the issues surrounding floating point comparisons (https://bitbashing.io/comparing-floats.html)?

I'm unfamiliar with Scala's HashMap, but I'm assuming it does a direct test for float equality?

I think you rightly pointed out that my current algorithm of testing after the conversion to Int16 is faulty. I'm wondering if doing the maths at 64-bit, then casting to 32-bit float prior to testing for path cycling, will be the happy medium. That still has a grid precision of 28 million x 28 million, which is roughly 1000x more precise than the final storage grid of 35k x 35k, and it (should) avoid the floating point comparison issues of comparing at 64-bit.
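In nodeJS that cast could be as simple as this sketch (assuming Math.fround is an acceptable stand-in for the 64-to-32-bit cast):

```javascript
// Build a cycle-detection key from the 32-bit-rounded components of Z,
// so two Zn's that agree to float32 precision compare equal.
const cycleKey = (zRe, zIm) => `${Math.fround(zRe)},${Math.fround(zIm)}`;
```

Two orbits that have converged to within float32 spacing then produce identical keys, sidestepping exact 64-bit equality while staying far finer than the Int16 storage grid.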

rbart commented 5 years ago

Yes - the Hashmap was testing directly for 64 bit equality. In my experience, the floats would actually converge so that they would eventually exactly repeat values all the way to the last "decimal point". But, I never proved this. I'd be really curious to see a counter example, however, where a point that should fall into a cycle fails to do so under floating point arithmetic. (points on the bleeding edge of the M set notwithstanding)

I think the approach you describe sounds reasonable - cast / round down to 32 bit precision and then check for cycles.

Looking forward to more images!

danhouldsworth commented 5 years ago

Detour number 2...!!

I was getting disappointed in my compute speed, in that I wasn't able to accelerate by using a larger machine. After a little experimentation with worker threads and SharedArrayBuffers, I'm ready to move a little quicker(!) I also did some comparisons of the various EC2 / AWS Lambda cores vs my own 12-core Mac Pro. Surprisingly (to me), all the cores are pretty much in line with each other - so without an efficient parallel threaded approach, AWS's largest instance was no faster than my laptop. More surprising is that Javascript (nodeJS) is in line (within 10-20%) with the equivalent routine written in C and compiled with -O3 or even -Ofast, producing identical results (tested by computing the 'area' of the Mandelbrot) for a given set of start conditions.

So - all that out of the way, I'm now ready to use all 128 cores of the AWS x1d.32xLarge in the knowledge that I'm being reasonably efficient in utilising them!

Below is an early test tracing all paths to 10,000 iterations on a 10,000x10,000 grid (340 seconds on a 12-core Mac - albeit burned into a PNG rather than stored in the DB...): image

danhouldsworth commented 5 years ago

Time scales linearly - so here, 100,000 iterations on the same 10k grid took 3,350 seconds on 12 cores: image

danhouldsworth commented 5 years ago

I'll double check these aren't artefacts : image

danhouldsworth commented 5 years ago

10K grid to 1 million iterations in an hour on 128 cores! image image

danhouldsworth commented 5 years ago

20K grid to 1 million iterations - 4.5 hours on 128 cores: image

danhouldsworth commented 5 years ago

Woohoo! Looking so close! 5000x5000 grid to 2000 iterations (7 mins on 8 cores). Compute at 64-bit, casting down to 32-bit for the cycle detection comparison. Remembering the path up to the full maxDwell=2000 length: image

danhouldsworth commented 5 years ago

While I compute a like-for-like to be sure - it looks like nodeJS Buffer / ArrayBuffer / TypedArray memory addressing is actually quite slow, so the overall routine takes an order of magnitude longer with cycle detection.

rbart commented 5 years ago

Looks amazing!!

How are you doing cycle detection? If, each time you iterate, you're going back and comparing the new point one at a time to all of the previous points in the iteration, that will be very slow.

You may be doing something like this already, but a faster way is to use a HashMap approach, where you can quickly query the set of points you've visited during iteration - in constant time, independent of the cycle length or iteration number. https://github.com/rbart/buddhabrot/blob/master/src/main/scala/compute/iterate/CycleDetectingIterator.scala#L15-L16

danhouldsworth commented 5 years ago

That was indeed what I was doing!! Doh! The 10,000x10,000 grid to 10,000 iterations took 2.2 hrs (24x longer than earlier). I'll try and implement the HashMap. Meanwhile - the final results are looking very promising! image

danhouldsworth commented 5 years ago

Same image (10k x 10k to 10k iterations) but with a pseudo-hashmap (converting the Z floats into long ints and using them as a unique lookup key). Took 1.6 hrs on 8 cores, so reasonably quicker than before, and it should scale more efficiently for the larger iteration counts. image

danhouldsworth commented 5 years ago

Starting to see some nice detail in there that I recognise from your images! image

danhouldsworth commented 5 years ago

Hmmm - I'm struggling to scale the hashmap at the higher iteration counts, as the hashmap itself, with 1 million keys at this precision, grows too large for the garbage collector.

rbart commented 5 years ago

The hashmap shouldn't need to contain any more than the most recent N points where N is the largest cycle you're trying to detect.

That's why I used two hashmaps: as you add each new point to the first (main) hashmap, you also need to remove the point you inserted N=maxCycle iterations ago.

So, I also created a second hashmap of (iteration number) => point to quickly look up the point that needs to be removed from the first hashmap.
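In JS terms the scheme might look like this (a sketch of the idea described above, not a port of the Scala):

```javascript
// Bounded cycle detection with two maps: pointToTime answers "seen this point
// within the last maxCycle iterations?", and timeToPoint lets us evict the
// entry that has aged out - so neither map grows beyond maxCycle entries.
class CycleDetector {
  constructor(maxCycle) {
    this.maxCycle = maxCycle;
    this.pointToTime = new Map();  // point key -> iteration it was seen
    this.timeToPoint = new Map();  // iteration -> point key
    this.time = 0;
  }
  // returns true if `key` repeats a point seen within the last maxCycle steps
  visit(key) {
    if (this.pointToTime.has(key)) return true;
    this.pointToTime.set(key, this.time);
    this.timeToPoint.set(this.time, key);
    const delTime = this.time - this.maxCycle;
    if (delTime >= 0) {                          // evict the aged-out entry
      const old = this.timeToPoint.get(delTime);
      this.timeToPoint.delete(delTime);
      if (old !== key) this.pointToTime.delete(old);
    }
    this.time++;
    return false;
  }
}
```

Memory stays O(maxCycle) per orbit, however many iterations you run.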


rbart commented 5 years ago

Here's the scala code that's doing that: https://github.com/rbart/buddhabrot/blob/master/src/main/scala/compute/iterate/CycleDetectingIterator.scala#L46-L50

    val delTime = time - maxCycle
    if (delTime >= 0) {
      val deleted = timeToPoint.remove(delTime)
      if (!next.equals(deleted)) pointToTime.remove(deleted)
    }

This prevents either hashmap from growing without bound

danhouldsworth commented 5 years ago

It's interesting that if you exclude the cardioid & period-2 bulb, it starts to look more like the traditional Buddhabrot. (Of course it speeds things up too while I'm experimenting - this is 20k x 20k to 100,000 iterations in 2.5 hrs on 48 cores.) I'm now capturing into an UnsignedInt32 binary file, so I can then separately try appropriate renders - this one is 60,000x over-saturated(!) in a 16-bit Greyscale PNG. image

rbart commented 5 years ago

That looks great!!

What are you using to get 48 cores of parallelism?

I think this approach you're taking of storing the data in a database has tons of potential... you can quickly pull out each cycle length and show what it looks like individually, or color them all differently, or maybe even stack them for some sort of 3d effect.

danhouldsworth commented 5 years ago

Detour number 3 (or 4.. I've lost count!!)

I've rewritten the whole routine in C. While my earlier comparison experiments showed JS roughly in line with C performance, they didn't include large memory reads/writes. Despite a lot of nodeJS documentation referring to how optimised and fast Buffers are, I've found them really slow compared to C / malloc / pointers.

10k x 10k grid to 10k maxDwell - resulting in 7.5 billion cycle points - 602 seconds on a single core: image

Now I just need to parallelise across multiple cores*, and then pass back to nodeJS for convenient database storage.

[* The way I've been accessing machines with multiple cores is via AWS EC2. The largest is the x1.32xlarge with 128 cores, but with memory being such a bottleneck it's not really any faster than their cheaper m5 / c5 range. And to be honest, they're not really that much faster than my 6-year-old 12-core Mac....]

rbart commented 5 years ago

Wow!! That's only 10 minutes on a single core? That's fantastic.

If you did want to parallelize it even more, it wouldn't be that hard to turn this into a Spark job and run it on AWS EMR with as many cores as you want (or can afford)

danhouldsworth commented 5 years ago

I'm currently running it on a 20k x 20k grid to 10 million iterations - and a nice side effect of the simplistic C routine using a nested fork() is that each core saves a binary grid to disk on completion (and, because of the very simple way I divided up the task, some cores finish early). So even while the rest are continuing, I can take a view of the sub-components. Below is a 20k x 20k image under construction - midway from starting at Zy = -1.5 towards 0.0 (much slower inside the cardioid): image image

danhouldsworth commented 5 years ago

Even at this super-high resolution & iteration count, though, the orbits around 0,0 are 'muddy' compared to the distinct trace lines in your image. I'm struggling to understand this, as I was hoping the high resolution would bring them out, which it doesn't seem to do.

danhouldsworth commented 5 years ago

So the incomplete image above in binary grid form (stored as Uint32) has its brightest pixel 'bucket' at 9 million. I take the square root to flatten the dynamic range to fit within a Grey16 PNG, but otherwise it's unfiltered. I thought I understood your histogram equalization - but then I can't see why that wouldn't flatten rather than bring out the fine highlights? I'll read the code again.
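For reference, the sqrt tone-map I'm using is just this (sketch; the Grey16 ceiling and scaling to the brightest bucket are the only choices made):

```javascript
// Sqrt tone-map: scale sqrt(count) so the brightest bucket lands on the
// top of the Grey16 range (0..65535).
function sqrtMap(counts, greyMax = 65535) {
  const maxC = Math.max(...counts);
  const s = greyMax / Math.sqrt(maxC);
  return counts.map(c => Math.round(Math.sqrt(c) * s));
}
```

Unlike the distinct-count equalization, this preserves the relative spacing of the counts, so a handful of very hot buckets can still push all the mid-range detail into a narrow band.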

rbart commented 2 years ago

@danhouldsworth how far did you end up getting on your project with this? Any images or code you might be able to share? I plan to come back to this myself someday.

danhouldsworth commented 2 years ago

Hey! Thanks for following up. Tbh I've done little more, as with Covid / lockdown my day job became all-consuming over the last few years. A nice problem to have, but I still would like to come back to this project. I've still got some approach ideas I'd like to explore to enable deeper-view calculations, and I've also done some thinking around the motivation for the project. Keen to share & collaborate on this - I still draw inspiration from your original image! 🤩
