judgej / Proj4

Experiments in a Proj4 reboot
MIT License
3 stars 0 forks source link

Definition of a point. #10

Open SamMousa opened 8 years ago

SamMousa commented 8 years ago

I think you mentioned this is something to be discussed.

What is the minimum requirement for defining a point?

There are obviously many ways to represent a point, I think the library needs to have a basic representation. I think most stuff uses WGS84 for that.

We should always be able to express a point in any coordinate / projection system as a Point in our basic representation.

So then a Point has coordinates in multiple (if not all) projections. A Point represents a place, a place can have many different coordinates in many different projections.

Should we separate Point and Coordinate?

A projection would then just be used to get a different set of coordinates, a coordinate is always connected to a projection, but a Point is always the same. If you project a Point then the Point does not change, its coordinates do.

judgej commented 8 years ago

Okay, I have three types of point identified:

A projected point is usually meaningless without that additional metadata that is in the projection config. For example, there are 60 UTM zones, and both a north and south hemisphere. For any single x,y UTM point, there are 120 places on earth that it could be pointing to - knowing the hemisphere takes it down to 60, then the zone takes it down to exactly one.

Now, if converting say a GB Ordnance Survey coordinate to a UTM point, for that to make sense, we need to calculate both the x.y UTM point that represents, AND the zone and hemisphere - one flag planted in London will convert to these four pieces of information to define the UTM point. However, the zone and hemisphere is in the projection, while the x,y is in the point, in Proj4. That always felt wrong. I don't know whether there is some fundamental reason for this, or whether it just made coding Proj4 back in 1978 just that little bit easier and it kind of stuck.

Does that make sense? I've never seen this particular point raised anywhere, so I think perhaps I am just going mad...?

judgej commented 8 years ago

There are also much more complicated points that are not based on a simple ellipsoid, but also include hundreds or thousands of local mappings, like the earth is made of a series of squares, each with their own defined offsets from an approximate ellipsoid. I'm not touching that with a barge pole, but I guess room needs to be left to slip it in if someone does. I guess that would be a variant of a projection. All the projections we are dealing with so far have a mathematical formula to map between the WGS84 geodetic point (the forward and reverse methods). The more complex projections would not use a simple mathematical formula, but would pull in lookup data and have huge tables of mappings behind them.

SamMousa commented 8 years ago

Well a point is just a place right? No matter how I reference it, it is somewhere, not something. For example if I say "the center of the earth" (how that is defined is another discussion) you know what I mean. In some coordinate systems I might give it the coordinates (0,0,0).

Basically coordinates have a coordinate system just like numbers have a base. But a Point and a Projected Point are redundant in my opinion; a Point represents a place and since there can be infinite projections there can be infinite number of coordinates for that place.

So maybe we should distinguish a Point from its coordinates?

judgej commented 8 years ago

I think the +nodefs option is a way of telling Proj4 NOT to use these lookup tables.

SamMousa commented 8 years ago

Yes, probably because in most cases it is okay to approximate these table values.

judgej commented 8 years ago

By treating all points as the same, Proj4 has got to where it is now - x and y could mean anything - degrees, metres, zones, etc. It means you can do totally invalid conversions on points, because the point has no metadata to indicate what it actually represents. This is one thing I am trying to solve. In theory every projected point would be a class of their own, but that may be a step too far.

SamMousa commented 8 years ago

Take for example the current PointInterface, it has functions: toGeocentric(). This suggests that a point can be changed into a set of geocentric coordinates. But instead a point can be referenced using a specific coordinate system which can either be geodetic or geocentric, but there could be other properties as well, all these properties combined define a projection (correct me if I'm wrong all this stuff is quite new for me).

So then the logical "API" would be:

$point = new Point($c1, $c2, $c3);

Where $c1, $c2, $c3, are coordinates in our reference / default projection (this is a library default nothing less nothing more, it could use lat / long or whatever makes it easiest for most users).

Then a projection can get coordinates for a point like this:

$projection = new UpsideDownProjection();
$coordinates = $projection->getCoordinates($point);

Where coordinates is an unordered list of key value pair depending on the projection, optionally implemented as a value object (since there are only a few types of coordinates).

SamMousa commented 8 years ago

As a developer you should never do anything with X or Y if you don't know what they mean. There would be several ways to create a point, we could even remove the default constructor to reduce any confusion:

$projection = new SomeProjection();
$point = $projection->createPoint($x, $y); // Where $x and $y make sense in the context of the projection.

The projection object can then use the given parameters and convert them the base system and there only needs to be 1 type of point.

A point since it is a place, does not need to be accompanied with its projection; only coordinates need that. I'm not suggesting to treat all coordinates the same; I'm suggesting we define each point as its coordinates in 1 projection. If you want to get coordinates for another projection all you need to do is get an instance of that other projection and request the coordinates by passing in the point.

SamMousa commented 8 years ago

Added advantage of that approach is that you can even force the proper parameter usage since those ->createPoint() functions don't all have to be the same.

judgej commented 8 years ago

Each conversion has rounding errors, so generally you want to do conversions as little as possible. This means keeping a point in its source form, or as close as you can, until you know what you are going to be doing with it. This means that I don't believe there is any "base format" for a point that would not end up introducing inaccuracies that can be avoided.

SamMousa commented 8 years ago

That's interesting; let me think on that! On Mar 29, 2016 5:46 PM, "Jason Judge" notifications@github.com wrote:

Each conversion has rounding errors, so generally you want to do conversions as little as possible. This means keeping a point in its source form, or as close as you can, until you know what you are going to be doing with it. This means that I don't believe there is any "base format" for a point that would not end up introducing inaccuracies that can be avoided.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/judgej/Proj4/issues/10#issuecomment-202968321

judgej commented 8 years ago

The diagram I have will make it clearer, showing what conversions are available, and what are not, which involve little rounding errors and which involve a lot.

judgej commented 8 years ago

Here is my sketch:

2016-03-30 00 46 53

So what it shows is:

judgej commented 8 years ago

Now, there is something I've not got quite right on that diagram, or maybe is just not clear. A projected point can have a datum and ellipsoid of its own. For example, the OSGB national grid uses the Airy ellipsoid (not WGS84). When doing the forword and inverse shifts to a geodetic datum, I have a feeling that stays as the Airy ellipsoid, so would need shifting to WGS84 if you wanted to use it on, say, a Google map. I'm not sure if the projection is what has knowledge about the protected datum and knows how to do the shift - I'm going to have to revisit this diagram again (draw it last year so a bit rusty).

The main thing to know here is that shifting from any one point to another (each of any coordinate type or projection) will involve shifting via a small set of WGS84 stages where standard formulas can be applied. Which of the conversions and shifts are needed, will depend entirely on what the starting and ending points are. In the end, this project is all about the conversions - shove a point in one end and get another point (at the same location) out the other end. A worked example would be good.

judgej commented 8 years ago

Here is the spacial reference page for OSGB: http://spatialreference.org/ref/epsg/osgb-1936-british-national-grid/ Note there are many formats including Proj4, OGC WKT - they all represent roughly the same information though.

The OSGB point for Ben Nevis (the highest peak in Scotland) is NN166712 in the OSGB National Grid projection.

This will convert to a geodetic coordinate (lat/long) using the OSGB datum and ellipsoid. So we have lat/long, but it represents a different point than Google would place it on a map.

So a OSGB to WGS84 conversion is needed, which involves a datum shift and ellipsoid conversion. That happens in part by converting to geocentric coordinates to do the datum shift, then back to geodetic coordinates.

Now, if you are working out distances between two OSGB points, then all that conversion is totally unnecessary - there are formulas for calculating that kind of thing directly in the projected space.

Each projection is designed to make something easy. For some marine projections, it means you can draw a circle on a map and it shows all points equidistant from the centre. For some projections you can measure the distance in a straight line between two points and it will be an exact distance around the curvature of the earth. Other projections make calculating areas easy. They all have their purposes, many historic from when we did not all have computers and sat-navs, but all very much relevant in many industries still today.

SamMousa commented 8 years ago

Can you explain what the term datum shift means? Is it just applying a transformation matrix? On Mar 30, 2016 2:44 AM, "Jason Judge" notifications@github.com wrote:

Here is the spacial reference page for OSGB: http://spatialreference.org/ref/epsg/osgb-1936-british-national-grid/ Note there are many formats including Proj4, OGC WKT - they all represent roughly the same information though.

The OSGB point for Ben Nevis (the highest peak in Scotland) is NN166712 in the OSGB National Grid projection.

This will convert to a geodetic coordinate (lat/long) using the OSGB datum and ellipsoid. So we have lat/long, but it represents a different point than Google would place it on a map.

So a OSGB to WGS84 conversion is needed, which involves a datum shift and ellipsoid conversion. That happens in part by converting to geocentric coordinates to do the datum shift, then back to geodetic coordinates.

Now, if you are working out distances between two OSGB points, then all that conversion is totally unnecessary - there are formulas for calculating that kind of thing directly in the projected space.

Each projection is designed to make something easy. For some marine projections, it means you can draw a straight line on a map and it exactly follows a curve on the earth. For some projections you can measure the distance in a straight line between two points and it will be an exact distance around the curvature of the earth. Other projections make calculating areas easy. They all have their purposes, many historic from when we did not all have computers and sat-navs, but all very much relevant in many industries still today.

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/judgej/Proj4/issues/10#issuecomment-203171918

SamMousa commented 8 years ago

According to wikipedia, a geodetic datum defines the position using "just" a transformation matrix to some other coordinate system. If that's true then why do you always need to go via geocentric?

judgej commented 8 years ago

The ellipsoid approximates the surface of the earth. There are many ellipsoids used by different people that approximate different relevant regions of the earth better for them. Our OSGB Airy ellipsoid covers the shape of the earth around the UK nicely. When I get a lat/lon for a postcode from Ordnance Survey data, that lat/lon is NOT what can be plugged into Google Maps - it must be shifted first, otherwise it will point to the wrong place. Where I am it can be 500 metres out - 1/4 mile. The WGS84 ellipsoid is a compromise that it not the best anywhere, but is good enough for everyone to use as a reference (hence GPS and Google Maps use it).

The datum defines where the centre of the ellipsoid is (in relation to the WGS84 ellipsoid) and how that ellipsoid is positioned (rotations in three dimensions - imagine holding two baseballs and rotating one of them in some random direction - you can measure the angle that ball has rotated in each of the three dimensions in relation to the other).

Each geographic reference system (using ellipsoids) has a datum. To switch from one system to another you need to shift the point according to the difference in datums. So going from OSGB to WGS84 that point may shift north 101m, west 23m and up in elevation by 14.5m - it is the same point on the earth, but represented by different values.

You can only reasonably do this shift for geocentric (x,y,z) coordinates, as it is relatively simple trigonometry with relatively small rounding errors. So a geodetic (lat,lon,height) point is converted to a geocentric point to do this shift, then transformed back again. The resulting point will have different lat/lon/height, and also a different datum, and possibly a different ellipsoid.

judgej commented 8 years ago

This is why lat/lon values alone mean nothing without knowing what datum they refer to. We can often assume it is WGS84, but never rely on that assumption. Lat/lon always has the context of a datum, even if it is not explicitly stated.

SamMousa commented 8 years ago

Yes I see, what I dislike is the current difference between WGS84 and the other datums; they are essentially all of the same type and it shouldn't matter which I use as a default..

If I define a new datum B that is the same as some type A only (for example) translated on the X axis by +10, then I should be able to just supply a transformation to type A and it should work.

Then instead of require each datum to provide us with a translation to WGS84 we can just require that it provides a conversion that brings us "closer" to WGS84 and as long as are able to find a path to WGS84 we will be able to do conversions. If we do this properly then we could even automatically find the shortest paths between 2 datums and use that for conversion (and we could even create metrics about the accuracy of other paths)

judgej commented 8 years ago

Just on this:

A point since it is a place, does not need to be accompanied with its projection; only coordinates need that.

That's a problem I have with implementations I have seen so far. A UTM point consists of four peices of information: x, y, zone, hemisphere. At the moment, points define the x and y, while the projection defines the zone and hemisphere. So a UTM point without its projection parameters is meaningless (more accurately, ambiguous).

Similarly, a geodetic (lan/lon) point can be converted to a UTM projection point BUT that conversion also needs to set additional projection parameters (zone and hemisphere) to be an unambiguous point.

So to my mind, a projected point generally has an x/y (or north/south) pair, but also has additional arbitrary properties (depending on what type of projection it is) that fully define that point. I feel it is weird having those additional parameters stuck in the projection, UNLESS the projection is injected into the point and can be modified by the point when doing conversions. It feels to me that Proj4 simply has this approach wrong, and the way it is used probably works around this failing, but isn't very OO.

SamMousa commented 8 years ago

Yes that is become more clear to me during development. Note that I still don't see a difference between a point and a projected point; both are projected...

Currently my Coordinate class is immutable. There is no conversion implemented inside it. A geodetic point can derive X, Y and Z, and A geocentric point can derive Lat, Lon and Height; both of these derivations use the Ellipsoid (currently a datum might be passed in that has a getellipsoid (since for future conversions we do need the full datum).

This allows a Coordinate to be used no matter what type of coordinate it is; for actual conversions between projections you of course need to know both the source and target datum.

judgej commented 8 years ago

A "Point" has a very specific meaning in GIS systems. It is a point on a flat map. That is what a projected point is - a point in 3D space projected onto a flat map. So I guess a point is always a projected point.