peterstangl / svg2data

A Python module for reading data from a plot provided as SVG file.
GNU General Public License v2.0

Cannot find grid on the plot I'm trying to read #4

Open bedaro opened 3 years ago

bedaro commented 3 years ago

Hi,

I'm trying to extract data from this PDF containing several hundred plots. https://fortress.wa.gov/ecy/publications/documents/0803037appc.pdf

For example, I'm starting with the second plot on page 14 for Alki East Chlorophyll-a. If all I do is save this page as an SVG using Inkscape 1.0, I get this:

svg2data/svg2data.py in get_axes(lines, width, height)
    730                 cleaned_axes[i].append(axes[i][j])
    731     axes = cleaned_axes
--> 732     axes_min = np.array([axes[0][0]['min'][0],axes[1][0]['min'][1]])
    733     axes_max = np.array([axes[0][0]['max'][0],axes[1][0]['max'][1]])
    734     new_lines = []

IndexError: list index out of range

Next, I tried to make the job easier by deleting everything from the page except the plot I want, but I got the same error. Then I used Inkscape's Resize Page to Selection (the resulting SVG is attached, with the extension changed to please GitHub): [0803037appc_p14.txt](https://github.com/peterstangl/svg2data/files/5046091/0803037appc_p14.txt). This produces:

svg2data/svg2data.py in __init__(self, filename, test, debug)
    104         and debug != 'get_axes'
    105         and debug != 'connect_graphs'):
--> 106             grids = calibrate_grid(axes,phrases,width,height)
    107         elif debug == 'calibrate_grid':
    108             self.debug = {'axes':axes,

svg2data/svg2data.py in calibrate_grid(axes, phrases, width, height)
   1033                         axis_scaling = 'linear'
   1034                 else:
-> 1035                     raise Exception('no grid found!')
   1036                 grids_calibr[axis_type]['type']=axis_scaling
   1037                 grids_calibr[axis_type]['grid']=grid_calibr

Exception: no grid found!

Any ideas why it can't find the plot? Is a grid required?

peterstangl commented 3 years ago

Hi @bedaro, in this case "grid" only means that the code can find ticks and their corresponding values on the x and y axes. I see two things the code might not be able to handle:

1. The tick labels on the x-axis are rotated.
2. The tick labels on the x-axis are dates rather than numerical values.

To work around these problems in the SVG file, one could remove the dates on the x-axis and replace them with (unrotated) numerical values. To solve them in the code, the functions that look for the tick values would have to be modified to also find rotated text. In addition, the strings describing the dates would have to be converted to numerical values.
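
As a rough illustration of the two steps described above, the sketch below lists the text elements of an SVG together with any rotation found in their own `transform` attribute, and converts a date string to a number. It is not part of svg2data: the helper names `rotation_degrees`, `find_text_labels` and `date_to_number` are hypothetical, the date format `%m/%d/%Y` is only an assumption about how the tick labels are written, and transforms inherited from parent groups are ignored.

```python
import math
import re
import xml.etree.ElementTree as ET
from datetime import datetime

SVG_NS = "{http://www.w3.org/2000/svg}"


def rotation_degrees(transform):
    """Return the rotation (in degrees) encoded in an SVG transform string.

    Only rotate(...) and matrix(...) on the element itself are handled;
    anything else is treated as unrotated.
    """
    if not transform:
        return 0.0
    match = re.search(r"rotate\(\s*([-+0-9.eE]+)", transform)
    if match:
        return float(match.group(1))
    match = re.search(r"matrix\(([^)]*)\)", transform)
    if match:
        # For matrix(a, b, c, d, e, f) the rotation angle is atan2(b, a).
        a, b = [float(v) for v in re.split(r"[\s,]+", match.group(1).strip())[:2]]
        return math.degrees(math.atan2(b, a))
    return 0.0


def find_text_labels(svg_source):
    """Return (text, angle) pairs for every <text> element in the SVG string."""
    root = ET.fromstring(svg_source)
    labels = []
    for elem in root.iter(SVG_NS + "text"):
        text = "".join(elem.itertext()).strip()
        labels.append((text, rotation_degrees(elem.get("transform"))))
    return labels


def date_to_number(label, fmt="%m/%d/%Y"):
    """Convert a date string to days since 1970-01-01 (fmt is an assumption)."""
    delta = datetime.strptime(label, fmt) - datetime(1970, 1, 1)
    return delta.total_seconds() / 86400.0


# Made-up sample with one rotated date label and one unrotated numeric label.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text transform="rotate(-90 10 20)"><tspan>01/02/1970</tspan></text>'
    '<text transform="matrix(0,-1,1,0,5,5)">02/15/2006</text>'
    '<text>0.5</text>'
    '</svg>'
)

for text, angle in find_text_labels(SAMPLE_SVG):
    print(text, angle)
```

Running something like this on the attached SVG would show which labels carry a rotation and how the dates are actually formatted, which is the information needed either to edit the file by hand or to adapt the tick-detection functions.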