urschrei / simplification

Very fast Python line simplification using either the RDP or Visvalingam-Whyatt algorithm implemented in Rust

Question re Applicability to a Voltage vs Time data set #14

Closed seamusdemora closed 3 years ago

seamusdemora commented 3 years ago

I need to model a phenomenon known as contact bounce to understand its effect on the performance of the circuit in which the switch is used. I have taken measurements on contact bounce with an oscilloscope to get some representative data. My oscilloscope allows me to save the measurement data in a CSV file; each data point is a Time, Voltage pair.

The oscilloscope is a high speed digital scope - which only means that it stores many data points while capturing the waveform. I have found that these CSV files are in the range of 100 MB - far too large for my SPICE analysis software to "digest", and most of these data points are irrelevant to the result.

Consequently I need to "thin" my data set to something on the order of 1% of its current size - perhaps even more. I'm posting here because I'd like to verify these algorithms are appropriate for my data set. It seems like a good fit, but I've read that RDP "does not always preserve the property of non-self-intersection". And the mention of lines and cartographic applications for these algorithms leads me to wonder if they are well suited to thinning a function.

Any advice or recommendations are appreciated.

urschrei commented 3 years ago

I don't know anything about your problem domain, but it seems as if you're attempting to generate a graph and a naive sampling approach won't work because it might throw away extreme points which are important – you want to reduce the number of vertices while maintaining the overall "shape" of the dataset. For now, I wouldn't worry too much about self-intersection, which is a specific problem related to geometries which I don't think applies here.

The easiest thing is to try the library, and see how the results look – while RDP and VW are mostly used in cartographic applications, they are fundamentally a way of removing vertices from a line while maintaining its overall shape and the order of coordinates.
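To illustrate the point about keeping extreme points, here is a minimal pure-Python sketch of RDP. It is not the library's Rust implementation, and the `rdp` and `perpendicular_distance` names and the toy `trace` data are made up for this example. The recursion is only suitable for small inputs, not a 10-million-point capture.

```python
import math

def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length

def rdp(points, epsilon):
    """Recursive Ramer-Douglas-Peucker; keeps endpoints and input order."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Everything in between is within tolerance, so drop it.
        return [points[0], points[-1]]
    # Otherwise keep the farthest point and simplify each side separately.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right

# Toy (time, voltage) trace: small noise plus one spike at t = 3.
trace = [(0, 0.0), (1, 0.01), (2, 0.0), (3, 5.0), (4, 0.0), (5, 0.01), (6, 0.0)]

simplified = rdp(trace, 0.1)   # keeps the spike, drops the 0.01 V noise
decimated = trace[::2]         # naive "every second sample" misses the spike
```

In this toy case the simplified trace still contains the spike at `(3, 5.0)`, while the naive decimation loses it. Note that the distance here mixes the units of both axes, so with time in seconds and voltage in volts the choice of epsilon (and possibly a rescaling of the time axis) would need some experimentation.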

seamusdemora commented 3 years ago

I'm not much at Python - I've done a few simple things...

My dataset is a CSV-formatted file with data pairs such as 4.17E-07, 5.11E+00, where time = 0.417 microseconds and voltage = 5.11 volts.

There are approximately 10 million time/voltage pairs at a spacing of 0.2 nanoseconds.

Do you have any suggestions as to how I might proceed to put my dataset in a format suitable for simplification?

urschrei commented 3 years ago

Can you post a sample of the CSV? Just the first 100 lines or similar.

seamusdemora commented 3 years ago

I should explain this:

-1.46096E-08,8.671E-01
-1.44096E-08,8.729E-01
-1.42096E-08,9.197E-01
-1.40096E-08,9.308E-01
-1.38096E-08,8.976E-01
-1.36096E-08,8.787E-01
-1.34096E-08,9.840E-01
-1.32096E-08,9.950E-01
-1.30096E-08,9.197E-01
-1.28096E-08,9.213E-01
-1.26096E-08,9.745E-01
-1.24096E-08,1.0156E+00
-1.22096E-08,9.571E-01
-1.20096E-08,9.719E-01
-1.18096E-08,1.0119E+00
-1.16096E-08,9.729E-01
-1.14096E-08,9.592E-01
-1.12096E-08,9.482E-01
-1.10096E-08,8.734E-01
-1.08096E-08,8.860E-01
-1.06096E-08,1.0187E+00
-1.04096E-08,1.0008E+00
-1.02096E-08,9.987E-01
-1.00096E-08,8.945E-01
-9.8096E-09,9.229E-01
-9.6096E-09,1.0134E+00
-9.4096E-09,8.803E-01
-9.2096E-09,1.0735E+00
-9.0096E-09,1.0987E+00
-8.8096E-09,9.150E-01
-8.6096E-09,9.877E-01
-8.4096E-09,1.0629E+00
-8.2096E-09,1.0540E+00
-8.0096E-09,1.0550E+00
-7.8096E-09,1.0266E+00
-7.6096E-09,1.0561E+00
-7.4096E-09,1.0893E+00
-7.2096E-09,1.1119E+00
-7.0096E-09,1.1472E+00
-6.8096E-09,1.1008E+00
-6.6096E-09,1.1419E+00
-6.4096E-09,1.2151E+00
-6.2096E-09,1.3267E+00
-6.0096E-09,1.2983E+00
-5.8096E-09,1.2956E+00
-5.6096E-09,1.3546E+00
-5.4096E-09,1.4178E+00
-5.2096E-09,1.4504E+00
-5.0096E-09,1.3878E+00
-4.8096E-09,1.4009E+00
-4.6096E-09,1.4257E+00
-4.4096E-09,1.4767E+00
-4.2096E-09,1.5999E+00
-4.0096E-09,1.6210E+00
-3.8096E-09,1.5957E+00
-3.6096E-09,1.6331E+00
-3.4096E-09,1.5499E+00
-3.2096E-09,1.6505E+00
-3.0096E-09,1.6984E+00
-2.8096E-09,1.7542E+00
-2.6096E-09,1.7063E+00
-2.4096E-09,1.6910E+00
-2.2096E-09,1.7494E+00
-2.0096E-09,1.6789E+00
-1.8096E-09,1.8063E+00
-1.6096E-09,1.7831E+00
-1.4096E-09,1.7242E+00
-1.2096E-09,1.8521E+00
-1.0096E-09,1.8279E+00
-8.096E-10,1.7995E+00
-6.096E-10,1.8953E+00
-4.096E-10,1.9074E+00
-2.096E-10,1.9353E+00
-9.6E-12,1.9979E+00 *SCOPE TRIGGERED HERE
1.904E-10,2.1501E+00
3.904E-10,2.1827E+00
5.904E-10,2.1106E+00
7.904E-10,2.2311E+00
9.904E-10,2.2569E+00
1.1904E-09,2.3006E+00
1.3904E-09,2.3270E+00
1.5904E-09,2.2948E+00
1.7904E-09,2.4338E+00
1.9904E-09,2.3359E+00
2.1904E-09,2.3349E+00
2.3904E-09,2.4944E+00
2.5904E-09,2.3770E+00
2.7904E-09,2.3980E+00
2.9904E-09,2.4912E+00
3.1904E-09,2.4380E+00
3.3904E-09,2.5202E+00
3.5904E-09,2.5812E+00
3.7904E-09,2.5128E+00
3.9904E-09,2.5212E+00
4.1904E-09,2.4933E+00
4.3904E-09,2.4396E+00
4.5904E-09,2.4507E+00
4.7904E-09,2.3933E+00
4.9904E-09,2.3485E+00
5.1904E-09,2.3496E+00
5.3904E-09,2.4107E+00

urschrei commented 3 years ago

As I say, I don't know or understand anything about your problem domain – Simplification as a library just takes in lists or 1d arrays of coordinates, which in this case looks like [(time, voltage), (time, voltage), … ] and removes any time, voltage pairs which fall below the epsilon. If you don't know how to get your CSV into a data structure like the above, Stack Overflow might be a good place to ask how to accomplish that. Good luck!
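For readers with the same question, one way to get such a CSV into the [[time, voltage], ...] shape described above is sketched here using only the standard library. The `load_pairs` helper is a hypothetical name, and the assumption that annotations such as "*SCOPE TRIGGERED HERE" may trail a voltage value is based only on the sample posted in this thread.

```python
import csv
import io

def load_pairs(lines):
    """Parse 'time,voltage' rows into a list of [time, voltage] floats."""
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            # split()[0] drops any annotation that follows the voltage value
            pairs.append([float(row[0]), float(row[1].split()[0])])
        except (ValueError, IndexError):
            continue  # skip header rows or other non-numeric text
    return pairs

# Three rows taken from the sample above, including the annotated one.
sample = io.StringIO(
    "-2.096E-10,1.9353E+00\n"
    "-9.6E-12,1.9979E+00 *SCOPE TRIGGERED HERE\n"
    "1.904E-10,2.1501E+00\n"
)
coords = load_pairs(sample)
```

For a real capture, an open file object can be passed instead of the `StringIO` sample, for example `with open("capture.csv", newline="") as f: coords = load_pairs(f)`, where `capture.csv` is a placeholder filename. The resulting `coords` list should then be in the shape the README's "Usage" section shows for `simplify_coords(coords, epsilon)` from `simplification.cutil`, with the epsilon value left to experimentation.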

seamusdemora commented 3 years ago

Does "closed" mean you no longer wish to help? If not, I have more questions re your previous comment:

  1. Do you have a small sample data set I could look at to get the format right?
  2. If I handed simplification a 277MB "list" - would it process that?

urschrei commented 3 years ago

  1. The coords variable in the "Usage" section of the README shows how your data should look. As the example notes, it can be a list of lists, with each sublist containing coordinates (or [time, voltage] pairs in your case), or a NumPy array.
  2. The library isn't limited by input size. The only limitation is the amount of working memory your Python process can use. In general, it should be able to deal with input of that size, but it really depends on your system.
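
As a rough back-of-the-envelope check on the memory point, the sketch below estimates the raw size of the roughly 10 million pairs mentioned earlier in the thread. It assumes each value is stored as a 64-bit float in one contiguous array (as a NumPy float64 array would be); the variable names are illustrative.

```python
import struct

n_points = 10_000_000                      # approximate pair count from the thread
bytes_per_value = struct.calcsize("d")     # size of one C double (64-bit float)
array_bytes = n_points * 2 * bytes_per_value   # two values per (time, voltage) pair
array_mb = array_bytes / 1e6

print(f"about {array_mb:.0f} MB as a contiguous float64 array")
```

A plain Python list of lists holding the same data would typically need several times more, since every float and every sublist is a separate Python object, so loading into a NumPy array is likely the more economical option for input of this size.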