Closed seamusdemora closed 3 years ago
I don't know anything about your problem domain, but it seems as if you're attempting to generate a graph and a naive sampling approach won't work because it might throw away extreme points which are important – you want to reduce the number of vertices while maintaining the overall "shape" of the dataset. For now, I wouldn't worry too much about self-intersection, which is a specific problem related to geometries which I don't think applies here.
The easiest thing is to try the library, and see how the results look – while RDP and VW are mostly used in cartographic applications, they are fundamentally a way of removing vertices from a line while maintaining its overall shape and the order of coordinates.
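To make "removing vertices while maintaining shape" concrete, here is a rough pure-Python sketch of the RDP idea. It is not the library's own implementation, and the names `rdp` and `perpendicular_distance` are made up for illustration. It keeps the two endpoints, then repeatedly keeps the point farthest from the current chord whenever that distance exceeds epsilon.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # start and end coincide: fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length

def rdp(points, epsilon):
    """Iterative Ramer-Douglas-Peucker; kept points stay in original order."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        max_dist, index = 0.0, None
        for i in range(first + 1, last):
            d = perpendicular_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        if index is not None and max_dist > epsilon:
            keep[index] = True  # this point carries "shape", so keep it
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, k in zip(points, keep) if k]

# A flat trace with one spike: the spike survives, the tiny wiggles do not.
trace = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0), (5, 0.01), (6, 0)]
thinned = rdp(trace, epsilon=0.1)
```

Note that epsilon is measured in the coordinates' own units, so with times near 1e-9 and voltages near 1 it may be worth rescaling one axis before simplifying.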
I'm not much at Python - I've done a few simple things...
My dataset is a CSV-formatted file w/ data pairs as follows: 4.17E-07, 5.11E+00, where:
time = 0.417 microseconds
Voltage = 5.11 volts
There are approximately 10 million time/voltage pairs at a spacing of 0.2 nanoseconds.
Do you have any suggestions as to how I might proceed to put my dataset in a format suitable for simplification?
Can you post a sample of the CSV? Just the first 100 lines or similar.
I should explain this: -9.6E-12,1.9979E+00.
-1.46096E-08,8.671E-01
-1.44096E-08,8.729E-01
-1.42096E-08,9.197E-01
-1.40096E-08,9.308E-01
-1.38096E-08,8.976E-01
-1.36096E-08,8.787E-01
-1.34096E-08,9.840E-01
-1.32096E-08,9.950E-01
-1.30096E-08,9.197E-01
-1.28096E-08,9.213E-01
-1.26096E-08,9.745E-01
-1.24096E-08,1.0156E+00
-1.22096E-08,9.571E-01
-1.20096E-08,9.719E-01
-1.18096E-08,1.0119E+00
-1.16096E-08,9.729E-01
-1.14096E-08,9.592E-01
-1.12096E-08,9.482E-01
-1.10096E-08,8.734E-01
-1.08096E-08,8.860E-01
-1.06096E-08,1.0187E+00
-1.04096E-08,1.0008E+00
-1.02096E-08,9.987E-01
-1.00096E-08,8.945E-01
-9.8096E-09,9.229E-01
-9.6096E-09,1.0134E+00
-9.4096E-09,8.803E-01
-9.2096E-09,1.0735E+00
-9.0096E-09,1.0987E+00
-8.8096E-09,9.150E-01
-8.6096E-09,9.877E-01
-8.4096E-09,1.0629E+00
-8.2096E-09,1.0540E+00
-8.0096E-09,1.0550E+00
-7.8096E-09,1.0266E+00
-7.6096E-09,1.0561E+00
-7.4096E-09,1.0893E+00
-7.2096E-09,1.1119E+00
-7.0096E-09,1.1472E+00
-6.8096E-09,1.1008E+00
-6.6096E-09,1.1419E+00
-6.4096E-09,1.2151E+00
-6.2096E-09,1.3267E+00
-6.0096E-09,1.2983E+00
-5.8096E-09,1.2956E+00
-5.6096E-09,1.3546E+00
-5.4096E-09,1.4178E+00
-5.2096E-09,1.4504E+00
-5.0096E-09,1.3878E+00
-4.8096E-09,1.4009E+00
-4.6096E-09,1.4257E+00
-4.4096E-09,1.4767E+00
-4.2096E-09,1.5999E+00
-4.0096E-09,1.6210E+00
-3.8096E-09,1.5957E+00
-3.6096E-09,1.6331E+00
-3.4096E-09,1.5499E+00
-3.2096E-09,1.6505E+00
-3.0096E-09,1.6984E+00
-2.8096E-09,1.7542E+00
-2.6096E-09,1.7063E+00
-2.4096E-09,1.6910E+00
-2.2096E-09,1.7494E+00
-2.0096E-09,1.6789E+00
-1.8096E-09,1.8063E+00
-1.6096E-09,1.7831E+00
-1.4096E-09,1.7242E+00
-1.2096E-09,1.8521E+00
-1.0096E-09,1.8279E+00
-8.096E-10,1.7995E+00
-6.096E-10,1.8953E+00
-4.096E-10,1.9074E+00
-2.096E-10,1.9353E+00
-9.6E-12,1.9979E+00 *SCOPE TRIGGERED HERE
1.904E-10,2.1501E+00
3.904E-10,2.1827E+00
5.904E-10,2.1106E+00
7.904E-10,2.2311E+00
9.904E-10,2.2569E+00
1.1904E-09,2.3006E+00
1.3904E-09,2.3270E+00
1.5904E-09,2.2948E+00
1.7904E-09,2.4338E+00
1.9904E-09,2.3359E+00
2.1904E-09,2.3349E+00
2.3904E-09,2.4944E+00
2.5904E-09,2.3770E+00
2.7904E-09,2.3980E+00
2.9904E-09,2.4912E+00
3.1904E-09,2.4380E+00
3.3904E-09,2.5202E+00
3.5904E-09,2.5812E+00
3.7904E-09,2.5128E+00
3.9904E-09,2.5212E+00
4.1904E-09,2.4933E+00
4.3904E-09,2.4396E+00
4.5904E-09,2.4507E+00
4.7904E-09,2.3933E+00
4.9904E-09,2.3485E+00
5.1904E-09,2.3496E+00
5.3904E-09,2.4107E+00
As I say, I don't know or understand anything about your problem domain – Simplification as a library just takes in lists or 1d arrays of coordinates, which in this case look like [(time, voltage), (time, voltage), … ], and removes any (time, voltage) pairs which fall below the epsilon tolerance. If you don't know how to get your CSV into a data structure like the above, Stack Overflow might be a good place to ask how to accomplish that. Good luck!
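For what it's worth, getting the CSV into that [time, voltage] structure can be done with the standard library alone. The sketch below is only one way to do it: `load_pairs` is a hypothetical helper name, and it assumes every data row looks like the sample posted above, including the possibility of a trailing note such as "*SCOPE TRIGGERED HERE".

```python
import csv
import io

def load_pairs(lines):
    """Parse 'time,voltage' rows into a list of [time, voltage] floats."""
    coords = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            t = float(row[0])
            # Keep only the first token so a trailing note such as
            # "*SCOPE TRIGGERED HERE" does not break the conversion.
            v = float(row[1].split()[0])
        except (ValueError, IndexError):
            continue  # skip header rows or other non-numeric lines
        coords.append([t, v])
    return coords

# Three rows copied from the sample in this thread.
sample = io.StringIO(
    "-2.096E-10,1.9353E+00\n"
    "-9.6E-12,1.9979E+00 *SCOPE TRIGGERED HERE\n"
    "1.904E-10,2.1501E+00\n"
)
coords = load_pairs(sample)
```

For a real capture, something like `with open("capture.csv", newline="") as f: coords = load_pairs(f)` should work, where `capture.csv` is a placeholder filename.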
Does "closed" mean you no longer wish to help? If not, I have more questions re your previous comment: if I gave simplification a 277MB "list" - would it process that?
The coords variable in the "Usage" section of the README shows how your data should look. As the example notes, it can be a list of lists, with each sublist containing coordinates (or [time, voltage] pairs in your case), or a NumPy array.
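On the size question, a back-of-the-envelope estimate may help decide between a list of lists and a packed array. This is only a sketch: the pair count is the approximate figure quoted in this thread, and the per-object sizes reported by `sys.getsizeof` vary between Python versions and platforms.

```python
import sys
from array import array

PAIRS = 10_000_000                      # approximate count quoted in this thread
BYTES_PER_FLOAT = array("d").itemsize   # size of a C double, normally 8 bytes

# Packed storage: two doubles per (time, voltage) pair.
packed_bytes = PAIRS * 2 * BYTES_PER_FLOAT

# List of lists: each pair costs one small list plus two float objects,
# plus a reference to that list in the outer list (overhead not counted).
pair_bytes = sys.getsizeof([0.0, 0.0]) + 2 * sys.getsizeof(0.0)
list_bytes = PAIRS * pair_bytes

print(f"packed doubles: {packed_bytes / 1e6:.0f} MB")
print(f"list of lists : roughly {list_bytes / 1e6:.0f} MB")
```

The packed form comes to about 160 MB for 10 million pairs, and the list-of-lists form is several times larger, so a NumPy array is probably the more comfortable choice for a file this size.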
I need to model a phenomenon known as contact bounce to understand its effect on the performance of the circuit in which the switch is used. I have taken measurements on contact bounce with an oscilloscope to get some representative data. My oscilloscope allows me to save the measurement data in a CSV file; each data point is a Time, Voltage pair.
The oscilloscope is a high speed digital scope - which only means that it stores many data points while capturing the waveform. I have found that these CSV files are in the range of 100 MB - far too large for my SPICE analysis software to "digest", and most of these data points are irrelevant to the result.
Consequently I need to "thin" my data set to something on the order of 1% of its current size - perhaps even more. I'm posting here because I'd like to verify these algorithms are appropriate for my data set. It seems like a good fit, but I've read that RDP "does not always preserve the property of non-self-intersection". And the mention of lines and cartographic applications for these algorithms leads me to wonder if they are well suited to thinning a function.
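On the self-intersection worry: both algorithms only remove vertices and keep the survivors in their original order, so if the time column is strictly increasing before simplification it is still strictly increasing afterwards, and a polyline whose x values strictly increase cannot cross itself. A small check along those lines, where `is_strictly_increasing` is a hypothetical helper and the slice is just a stand-in for whatever simplifier is used:

```python
def is_strictly_increasing(coords):
    """True if the time (first) column strictly increases row to row."""
    return all(a[0] < b[0] for a, b in zip(coords, coords[1:]))

# Five rows taken from the sample in this thread.
original = [
    [-4.096e-10, 1.9074],
    [-2.096e-10, 1.9353],
    [-9.6e-12, 1.9979],
    [1.904e-10, 2.1501],
    [3.904e-10, 2.1827],
]

# Stand-in for a simplifier's output: any subset kept in original order.
simplified = original[::2]

still_a_function = is_strictly_increasing(simplified)
```

Running the same check on the real input and output would confirm that the thinned data is still a single-valued function of time.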
Any advice or recommendations are appreciated.