Another fun thought would be to create a lot of input data + plots (numerically, as output, use a 0 or 1 for pixels), and then use machine/deep learning to learn how to output the plot 😂 (actually the tricky part here is variable-sized input)
EDIT: that issue is solved by the approach below
How to compress 1000 data pairs (X, y) into 100 data pairs (X2, y2)
ChatGPT:
There are several ways to compress a dataset of 1000 data pairs (X, y) into 100 data pairs (X2, y2). Some methods include:
So yeah, for large data you could likely just sample it and plot that instead!
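For example, something along these lines (just a sketch; it assumes uniplot's plot() accepts xs/ys keyword arguments, and the exact parameter names may differ):

```python
# Sketch: evenly subsample a large series down to a fixed budget before plotting.
# Assumes uniplot's plot() accepts xs/ys keyword arguments.
import numpy as np
from uniplot import plot

def plot_subsampled(xs, ys, max_points=10_000):
    xs, ys = np.asarray(xs), np.asarray(ys)
    if len(xs) > max_points:
        # Evenly spaced indices roughly preserve the overall shape of the series.
        idx = np.linspace(0, len(xs) - 1, max_points).astype(int)
        xs, ys = xs[idx], ys[idx]
    plot(xs=xs, ys=ys)

ys = np.sin(np.linspace(0, 20, 10_000_000))
plot_subsampled(np.arange(len(ys)), ys)
```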
Hi @kootenpv, thanks for the suggestions & ideas! I'm afraid that, as best I can tell, none of the methods you listed are guaranteed to produce a pixel-identical plot.
I did some quick tests with and without lines, and got the same result as you, an O(n) linear relation:
Sample size versus plotting time, without lines, log-log
┌────────────────────────────────────────────────────────────┐
│ ▞│
│ ▞ │
│ ▗▘ │ 10^-0.4 s
│ ▗▘ │
│ ▗▘ │
│ ▌ │
│ ▞ │
│ ▞ │ 10^-0.9 s
│ ▞ │
│ ▟ │
│ ▗▞▘ │
│ ▄▘ │
│ ▗▀ │ 10^-1.4 s
│ ▗▞▘ │
│▄▄▄▄▖ ▄▘ │
│ ▝▀▀▀▀▀▀▀▀▀▀▀▀▀▀▄▄▖ ▗▄▄▄▄▀▀▀▀ │
│ ▝▀▚▄▄▄▄▄▄▀▀▀▀▘ │
└────────────────────────────────────────────────────────────┘
1 100 10^4 10^6
Sample size versus plotting time, with lines, log-log
┌────────────────────────────────────────────────────────────┐
│ ▗▀│
│ ▞▘ │
│ ▄▀ │ 10 s
│ ▗▞ │
│ ▞▘ │
│ ▗▀ │
│ ▗▞▘ │
│ ▄▘ │
│ ▄▀ │ 1 s
│ ▗▞ │
│ ▗▞▘ │
│ ▞▘ │
│ ▄▀ │
│ ▄▀ │ 0.1 s
│ ▗▄▀ │
│▖ ▄▄ ▗▄▞▀▘ │
│▝▀▀▀▀▚▄▄▄▄▄▞▀▀▀▀ ▀▀▀▀▄▄▄▄▄▞▀▘ │
└────────────────────────────────────────────────────────────┘
1 100 10^4 10^6
There is probably a lot of speedup possible even if we accept O(n), since, as you correctly point out, the dimensionality of the output (our pixel matrix) is quite limited. I think it would be worthwhile to record a flame graph for large numbers of pixels; there should be easy speedups possible. Also, plotting with lines could be made much faster, in particular I'm thinking of this line here: https://github.com/olavolav/uniplot/blob/ec4ba27a495791f4839066687786f84b82bf78fb/uniplot/pixel_matrix.py#L105
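A quick first profile, before producing a proper flame graph, could look something like this (just a sketch; it assumes plot() takes a lines=True flag):

```python
# Rough profiling sketch using the stdlib cProfile (not a flame graph, but a
# quick way to spot hot spots in line plotting). Assumes plot() takes lines=True.
import cProfile
import pstats
import numpy as np
from uniplot import plot

ys = np.cumsum(np.random.randn(1_000_000))  # large-ish series to stress line drawing

with cProfile.Profile() as pr:
    plot(ys, lines=True)

pstats.Stats(pr).sort_stats("cumulative").print_stats(15)
```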
Would you perhaps like to have a go at any of these improvements yourself? Can't say when I will be able to find the time, though I am curious as you are what sort of speedups are possible.
I'm very swamped with work, but I do think it's an interesting problem.
Practically speaking, if my data were too slow to plot, I would probably subsample it myself 😅
I also don't have time to get familiar with the codebase right now, maybe at a later stage!
@kootenpv Sure, let me know when you want to take a look! In the meantime I've added scripts/scaling_benchmark.py to keep track of this.
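The general idea is to time plotting over growing sample sizes, something like this (a simplified sketch, not the actual contents of scripts/scaling_benchmark.py):

```python
# Simplified sketch of a scaling benchmark: time uniplot over growing sample
# sizes to track the O(n) behaviour. Not the actual scripts/scaling_benchmark.py.
import time
import numpy as np
from uniplot import plot

for n in (10**3, 10**4, 10**5, 10**6):
    ys = np.random.randn(n)
    start = time.perf_counter()
    plot(ys, lines=False)
    elapsed = time.perf_counter() - start
    print(f"n = {n:>9,}: {elapsed:.3f} s")
```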
Plus this conversation made me include logarithmic plotting, so thanks 😄
Closing this issue for the moment.
Nice project!
So basically I have noticed that plotting time scales linearly:
range(10_000_000) took 10x as long to plot as range(1_000_000).
If you are aware of the maximum resolution, then you could imagine compressing the data without ending up with a different plot.
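For example, something like this (just a sketch; the pixel grid size below is a placeholder, not uniplot's actual resolution):

```python
# Sketch: compress points down to at most one per pixel cell, assuming the axis
# bounds and pixel grid size are known up front. For a plot without lines the set
# of occupied pixels (and hence the drawing) stays the same; with lines the result
# also depends on which points are kept, so there it is only an approximation.
# The width/height values are placeholders, not uniplot's actual defaults.
import numpy as np

def compress_to_pixels(xs, ys, width=120, height=40, bounds=None):
    xs, ys = np.asarray(xs), np.asarray(ys)
    x_min, x_max, y_min, y_max = bounds or (xs.min(), xs.max(), ys.min(), ys.max())
    cols = np.clip(((xs - x_min) / (x_max - x_min) * width).astype(int), 0, width - 1)
    rows = np.clip(((ys - y_min) / (y_max - y_min) * height).astype(int), 0, height - 1)
    # Keep the first point that falls into each (col, row) cell.
    _, keep = np.unique(cols * height + rows, return_index=True)
    keep = np.sort(keep)
    return xs[keep], ys[keep]
```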
One way (it's probably possible to come up with something much better) would be to fit, say, 1000 linear regressions on the sorted data of 10_000_000 points and use those fits to compress the 10_000_000 rows into 1_000_000 (or fewer), whatever is needed; basically the sign of each fitted slope could encode "0 = stay equal, - = decrease, + = increase".
I was able to fit 1000 regressions in 300 ms, whereas the naive plotting went from 1 s to 10 s, so a huge speed-up would be possible if you are able to compress the data.
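The piecewise-regression idea could look roughly like this (a sketch; the chunk count and points per segment are arbitrary, and the output approximates the original plot rather than reproducing it pixel for pixel):

```python
# Sketch of the piecewise-regression idea: sort by x, split into chunks, fit one
# degree-1 polynomial per chunk, and emit a few points per fitted segment.
import numpy as np

def compress_piecewise(xs, ys, n_chunks=1000, points_per_chunk=2):
    order = np.argsort(xs)
    xs, ys = np.asarray(xs)[order], np.asarray(ys)[order]
    out_x, out_y = [], []
    for x_chunk, y_chunk in zip(np.array_split(xs, n_chunks), np.array_split(ys, n_chunks)):
        # Fit y ~ slope * x + intercept on this chunk of the sorted data.
        slope, intercept = np.polyfit(x_chunk, y_chunk, deg=1)
        grid = np.linspace(x_chunk[0], x_chunk[-1], points_per_chunk)
        out_x.append(grid)
        out_y.append(slope * grid + intercept)
    return np.concatenate(out_x), np.concatenate(out_y)
```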