Fixed problems with Elements. Added elements_by_layerdat

tom-varga commented 9 years ago

Fixed Elements Class documentation. Added support for layerdat. Attempted to fix problem with str. Although not quite sure what the original intent was, so feel free to improve upon my solution. Fixed problem when .add() to an empty Elements, the layer and datapath values aren't set.

hohlraum commented 9 years ago

I don't think there's a need for a separate print method, especially since for an Elements x, x.print() gives different results from print(x). Better instead to use __str___ and __repr__. Consider the differences between the two discussed in http://stackoverflow.com/questions/1436703/difference-between-str-and-repr-in-python. In general the former is meant to be readable (it's what is returned by print when available) and the latter is meant to be unambiguous (in principle you ought to be able to recreate the object based on it's __repr__ string, although in practice this is rarely true). Note that for a collection like Elements __str__ should return the contents' __repr__.

We're using .shape a lot. Perhaps we should cache it, rather than recalculating continually.

A trivial concern: is layerdat the best term for the (layer, datatype) tuple? To me it implies special data (perhaps physical info) about the layer. Consider perhaps laydat or laytype? I don't know what's best.

tom-varga commented 9 years ago

While debugging the new Elements boolean functions, I found myself constantly wanting to see the gory details about Element and Elements' contents. The original Boundary and Elements str didn't even display the points. If you don't mind my adding points data to both str outputs, then there's no need for the print() function.

I'm really liking the idea of keeping things consistent between Boundary/Path and Elements as much as possible. If I have a list of of various types of things that I want to, say translate similarly, then it's nice not to have to worry about what type of object each one is.

I'm fine with laydat. I'll change to that.

hohlraum commented 9 years ago

Yes, totally agree that the interface for Boundary, Path and Elements should be as much as possible identical. That has always been the goal.

Skip to the bottom from here. It's mostly my stream of conciousness ramblings. I leave it for the record.

I don't agree that adding the points list to repr or str is a good idea. Purists would say that a list of the object points should go in repr because that's what's needed to reproduce the object. But plenty of other packages forgo this: e.g. matplotlib lines '<matplotlib.lines.Line2D object at 0x106982d10>'. Furthermore in our case we often do things like >>> element.rotate(45). This rotation acts in place and there's no need for a left hand assignment expression, but the rotation returns itself to allow chaining of further operations. So in the interpreter this line will print element.__repr__(). That's fine if the representation is something short. But it's a pain if you're working on a 1000 point Boundary and it prints the whole point list.

Note that shapely does the opposite of the Python convention:

>>> repr(x.shape)
'<shapely.geometry.polygon.PolygonAdapter object at 0x10659ecd0>'

>>> str(x.shape)
'POLYGON ((0 1, 0 0, 1 0, 5 5, 0 1))'

This is a better arrangement because of the chained operations problem. You only get the points list when you ask for it.

I'd say if you want to see the points just use print(element.points) which gives a nicely formatted numpy array. I suppose this brings me back to your (reasonable) desire to keep Path/Boundary the same as Elements. There is not such thing as Elements.points so you'd like to add a convenience method print() to do the job. To maintain consistency you now have to add the same method to Path and Boundary too.

To summarize: 1) KISS: no points list in repr or str. To see points for Path/Boundary use print e.points. To see them for an Elements list use for e in Elements: print e.points. Elements and Path/Boundary are not the same things, the correspondence can only be taken so far.

2a) Points list in __repr__, short __str__, str of Elements returns repr of contents -- the Python convention. This gets ugly really quickly because it prints potentially massive points lists every time an object is entered in the interpreter.

2b) Points list in __str__, short __repr__, str of Elements returns str of contents -- the shapely convention. You only get a list of points if you specifically ask for it by calling print(object).

3) Introduce a new method (not called print, that's too confusing) that handles printing in Elements. Add superfluous complementary methods to Path/Boundary.

I guess my vote's for #2b.

>>> z
Elements(layer: 1, datatype: 0, len: 2)
 Boundary(layer: 1, datatype: 0, vertices: 5)
 Boundary(layer: 1, datatype: 0, vertices: 4)

>>> print(repr(z))
Elements(layer: 1, datatype: 0, len: 2)
 Boundary(layer: 1, datatype: 0, vertices: 5)
 Boundary(layer: 1, datatype: 0, vertices: 4)

>>> print(z)
Elements(layer: 1, datatype: 0, len: 2)
 Boundary(layer: 1, datatype: 0, vertices: 5)
[[ 1.          0.        ]
 [ 0.          0.        ]
 [ 0.          1.        ]
 [ 0.12195122  1.097561  ]
 [ 1.          0.        ]]
 Boundary(layer: 1, datatype: 0, vertices: 4)
[[ 1.          1.79999995]
 [ 5.          5.        ]
 [ 1.87804878  1.097561  ]
 [ 1.          1.79999995]]

I've added the changes to branch pr26. Let me know if it does or does not fit your needs.

tom-varga commented 9 years ago

I think I'm OK with everything you said. I thought it would be a good idea to add points to repr because in my case, I generally work with rectilinear polygons with fewer than a dozen points. So, as I'm using boolean operations to make changes to sets of polygons, it's nice to see all the details of every polygon as succinctly as possible. Again, I'm fine if it's not repre that does this. Before seeing your last comment, my latest print() output for Elements looked like:

Elements([ Boundary(laydat=(1, 0), points=[[-2.5, -1.0], [-2.5, 1.0], [-1.0, 1.0], [-1.0, -1.0], [-2.5, -1.0]]), Boundary(laydat=(1, 0), points=[[1.0, 1.0], [-1.0, 1.0], [-1.0, 2.5], [1.0, 2.5], [1.0, 1.0]]), Boundary(laydat=(1, 0), points=[[1.0, 1.0], [2.5, 1.0], [2.5, -1.0], [1.0, -1.0], [1.0, 1.0]]), Boundary(laydat=(1, 0), points=[[-1.0, -1.0], [1.0, -1.0], [1.0, -2.5], [-1.0, -2.5], [-1.0, -1.0]]) ])

What's your preference as a method name instead of print()?

tom-varga commented 9 years ago

More thoughts on str vs repr ...

According to the link you gave, the goal of repr is to be unambiguous and str is to be readable

I guess you prefer the opposite and I suppose that's your prerogative. But whichever you choose to be more verbose, I would think it should print out all the details for the element. For example, a Path should also print out width and pathtype attributes. A Text element even has more useful attributes.

I see why you like 'nicely formatted numpy array' outputs for points, but it does end up looking ugly in a string format.

p = Path(laydat=(1, 0), points=[[-2.5, -1.0], [-2.5, 1.0]]) p Path(layer: 1, datatype: 0, vertices: 2) str(p) 'Path(layer: 1, datatype: 0, vertices: 2)\n[[-2.5 -1. ]\n [-2.5 1. ]]' repr(p) 'Path(layer: 1, datatype: 0, vertices: 2)' print(p) Path(layer: 1, datatype: 0, vertices: 2) [[-2.5 -1. ] [-2.5 1. ]] "p is {}".format(p) 'p is Path(layer: 1, datatype: 0, vertices: 2)\n[[-2.5 -1. ]\n [-2.5 1. ]]'

Anyway, I'll leave things as they are. But, if you'd prefer and allow another method like print() for really detailed and complete output, I'd really like to add it.

Again, can you or anybody suggest a better method name than print()? details()? describe()? desc()?

hohlraum commented 9 years ago

Hi,

Yes, for the reasons outlined above, I think (like shapely) that inverting the conventional roles of repr and str is the right for us.

Adding more detail to the repr so that it includes width, pathtype etc is fine by me. Just keep it compact, I'd like the repr string to be only one line. That may mean that the text field of Text objects must be abridged.

The str output format you propose above looks good.

I don't see the need for another "print"-like method. What other more "complete output" is available beyond what the new str gives? Is it only the formatting you object to?

Note for your last example the subtle difference between how the interpreter displays >>> "p is {}".format(p) and >>> print "p is {}".format(p). If you want pretty output use the latter.

Please keep in mind that I (and others) work with geometries having hundreds of vertices with float values with many decimal places. the consequences are that: a) There's very little value in seeing the points list since I can't interpret the geometry from a list of numbers -- using show is much more valuable. b) the points list takes up enormous amounts of screen space, so I definitely do not want points in repr since that will clutter the interpreter output.

tom-varga commented 9 years ago

OK, let's put this discussion to bed and leave it as you wish. In my industry, the average gds polygon contains very few points. So, I'll just write some wrappers to display elements in a way that best fits my needs.

hohlraum commented 9 years ago

I've refined the str output to follow the format you showed above. I also added more output info for Path and Text objects.

Merged in ba616b7b31.

hohlraum / gdsCAD

Fixed problems with Elements. Added elements_by_layerdat #26