rgerum / cameratransform

cameratransform is a Python package which can be used to fit camera transformations and apply them to project points from camera space to world space and back.
http://cameratransform.readthedocs.org/
MIT License

Project Maturity & Getting Started Question #12

Closed davesargrad closed 2 years ago

davesargrad commented 3 years ago

Hi. This project seems to have great documentation. I see that it is tagged with a 1.1 release. However, I don't see many issues or much activity on the project.

Is the software mature? Is it something that I should pick up and experiment with, or is it better for me to look elsewhere for a library to take me from camera into geodetic coordinates?

If the author (@rgerum) still supports this, then I am very excited to use this. I simply love the effort that I see here.

Thanks, Dave

rgerum commented 3 years ago

Hi Dave, the project is quite mature and is still used in research projects. There just hasn't been a need for any new features recently. I still support the project, and if problems arise I might be able to help. I am happy to see more users for the package :-)

davesargrad commented 3 years ago

@rgerum

Thanks very much Richard. I will absolutely use it.

I appreciate the effort that has gone into this project.

davesargrad commented 3 years ago

Hi Richard @rgerum

I'm finally about to integrate. Do you, by any chance, provide the ability to install via conda? We use conda-forge as our repository for all of our python dependencies.

I did find this clickpoints link. It talks about conda installation. However, it seems to suggest a specialized repository (rgerum). I've been consistent with using only conda-forge to minimize dependency nightmares. I don't see a similar reference to conda here.

davesargrad commented 3 years ago

Actually it does look like clickpoints requires both conda-forge and rgerum.

I've found that using multiple repositories like this will often create headaches.


rgerum commented 3 years ago

Well, I haven't worked with conda-forge yet. As it became easier to install with pip, I don't see much of a point in using conda if pip can also supply wheels. Therefore, also for clickpoints, the newer versions are mostly just on pip.

Ah, the conda-forge channel was needed for some dependency of clickpoints.

davesargrad commented 3 years ago

@rgerum Ok.. not a problem at all, and thanks for the fast response. It should be easy to layer packages into our conda env, using pip.

davesargrad commented 3 years ago

Was easy:

As described here, the only magic was to install pip into the conda virtual env and use that:

[screenshots: pip install output, the installed package, and the package reflected in the conda environment]

davesargrad commented 3 years ago

@rgerum

Now to implement my cameratransform hello-world... After that, my guess is that within just an hour or two we should have the base capability in place.

What is the minimal code that I need for a "hello-world" where I have a fixed camera position, and a coordinate (or a rectangle) in the camera field of view?

I'm assuming that the fixed camera position would be in latitude, longitude, and height above sea level. I'm also assuming that I'll provide an azimuth angle (relative to true, or magnetic, north) and an elevation angle relative to the plane out to the horizon.

Lastly I'm assuming that I'll provide a field of view in horizontal degrees.

rgerum commented 3 years ago

I am glad that the installation went well! For the hello world, you can have a look at the documentation: https://cameratransform.readthedocs.io/en/latest/getting_started.html The first 4 code parts show you how to set up the camera and how to transform from space to image or image to space (given an additional constraint, e.g. that the point has to be on the floor).
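A minimal version of those steps looks roughly like this (an untested sketch; the intrinsics and orientation are the placeholder values from the Getting Started page, and the GPS position and pixel coordinate are arbitrary):

    import cameratransform as ct

    # intrinsic parameters: focal length (mm), sensor size (mm) and image size (px)
    cam = ct.Camera(ct.RectilinearProjection(focallength_mm=6.2,
                                             sensor=(6.17, 4.55),
                                             image=(3264, 2448)),
                    # extrinsic parameters: height above the ground plane and viewing direction
                    ct.SpatialOrientation(elevation_m=34.027,
                                          tilt_deg=83.307926,   # 90 = looking at the horizon
                                          roll_deg=-1.916219,
                                          heading_deg=0))       # compass direction of the view

    # fixed camera position in geographic coordinates (lat, lon)
    cam.setGPSpos(52.52, 13.405)

    # project a pixel into the flat space coordinate system,
    # assuming the point lies on the ground plane (Z=0)
    print(cam.spaceFromImage([[1800, 1900]]))

    # the same pixel expressed as latitude/longitude
    print(cam.gpsFromImage([[1800, 1900]]))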

davesargrad commented 3 years ago

@rgerum

I'll implement that. I had seen this before. I wasn't quite sure how the following were defined:

cam.elevation_m = 34.027
cam.tilt_deg = 83.307926
cam.roll_deg = -1.916219

If I am interpreting this correctly, I'd set "tilt_deg" to 90 if the camera sits on the ground (or on a building) and points out across the surface, at the horizon. If it's on the roof of a building looking slightly down at the surface, then perhaps "tilt_deg" would be about 89.

In other words, tilt_deg is relative to the plane stretching out to the horizon.

Am I interpreting tilt_deg properly? If so, I'll see if I can articulate a similar interpretation of the other params.

rgerum commented 3 years ago

Yes, you interpreted this correctly. You can find a more detailed description here: https://cameratransform.readthedocs.io/en/latest/spatial.html Also, in the publication there is a nice figure (see Fig. 1). I thought I had it in the documentation as well, but apparently not. https://reader.elsevier.com/reader/sd/pii/S2352711019302018?token=CC2FE6E692212C9A9269CEE814E3DAB087091FE412C04053C9B02000E4FF3D5076478E295A3BE3383B38BDA7BAFE0DDD&originRegion=us-east-1&originCreation=20210824171747

davesargrad commented 3 years ago

Ty Sir!! Will keep you abreast of our progress using the package that you've put so much work into. :)

davesargrad commented 3 years ago

@rgerum

The hello world is working. However, I'd like to see if I can set some known values. The values returned when calling spaceFromImage don't seem to match the values in the documentation (though I am setting the input parameters in exactly the fashion described in the link you provided).

Is there a page that shows a few test scenarios that I can implement to verify that I get the appropriate values?


Assumptions:

rgerum commented 3 years ago

The space coordinate system is just assumed to be a flat Euclidean system. For sufficiently small environments, a flat plane is a reasonable assumption.

davesargrad commented 3 years ago

What is sufficiently small? My application will use cameras that are roughly 10 to 500 or 1000 yards from the objects of interest.

rgerum commented 3 years ago

Well, that is sufficiently small.

davesargrad commented 3 years ago

@rgerum I figured. :)

But what is your recommendation on a maximum distance from target, given that you are assuming a "flat earth" relative to the camera position?

rgerum commented 3 years ago

We have used it in our applications at distances of up to about 8 km. But at this distance the uncertainty of the pixels is normally already bigger than the error introduced by the flat-earth assumption. So unless you are matching cameras at lots of different locations, there should not be any measurable errors from this assumption.
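(For a sense of scale: the drop of the surface below the local tangent plane over a distance d is roughly d^2 / (2 * R_earth), so at 8 km that is about 8000^2 / (2 * 6.371e6) ≈ 5 m.)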

davesargrad commented 3 years ago

Got it. That makes a lot of sense.

davesargrad commented 3 years ago

Good afternoon Richard @rgerum. Could you please tell me how to set the direction that the camera is pointing?

The documentation suggests that the parameter to use is heading_deg. Yet unlike tilt_deg and roll_deg (within Getting Started), I cannot set it as follows:

cam.elevation_m = 34.027
cam.tilt_deg = 83.307926
cam.roll_deg = -1.916219
cam.heading_deg = 90 # This does not work. 

Our cameras (in our launch application) are in fixed locations, with fixed orientations. Each camera will point in a given direction (e.g. north, east, south, west, or another point on the compass). The cameras won't move or rotate.

davesargrad commented 3 years ago

Btw, if your suggestion is that we find the latitude/longitude of landmarks visible within the image and then use the fitting methods, we can do that.

However, I would like to know how to set heading_deg in the case where we know it already. In that case we would not need to specify the position of visible landmarks.

davesargrad commented 3 years ago

Ah.. I looked at this and missed it, but clearly this is how to set heading_deg:

    self.cam = ct.Camera(ct.RectilinearProjection(focallength_mm=f,
                                                  sensor=self.sensor_size,
                                                  image=self.image_size),
                         ct.SpatialOrientation(elevation_m=10,
                                               tilt_deg=45, roll_deg=0, heading_deg=0))
rgerum commented 3 years ago

Hmm strange, cam.heading_deg = 90 should also work. I will look into this.

davesargrad commented 3 years ago

> Hmm strange, cam.heading_deg = 90 should also work. I will look into this.

Ya.. I looked at the code thinking the same. But it doesn't seem to work. If this truly is a bug, then I'm glad I can help isolate such things!

rgerum commented 3 years ago

Can you give an example where you run into problems? The following code snippet reacts to a change of heading_deg:

import cameratransform as ct

# initialize the camera
cam = ct.Camera(ct.RectilinearProjection(focallength_mm=6.2, sensor=(6.17, 4.55), image=(3264, 2448)),
               ct.SpatialOrientation(elevation_m=10, tilt_deg=45))

print(cam.imageFromSpace([[3.17, 8, 0]]))
cam.heading_deg = 90
print(cam.imageFromSpace([[3.17, 8, 0]]))
ralphflat commented 3 years ago

@rgerum , after experimenting with fitting (see https://github.com/rgerum/cameratransform/issues/13) - both manually and using the fitting - I am now wondering if I am doing something wrong in the software. @davesargrad and I are working together with your software; I am more focused on its configuration. I have decided, for the moment, to fix my focal length and sensor size to something reasonable and not too far off from actual.


Using Google maps, I have identified where the camera is located to get initial Lat / Lon and I have estimated the height of the camera (in meters), as well as other orientation parameters. I have initialized the camera as follows:

    spatial_orientation = ct.SpatialOrientation(elevation_m=39.01, tilt_deg=60, roll_deg=0, heading_deg=330)
    self.cam = ct.Camera(ct.RectilinearProjection(focallength_mm=self.f,
                                                  sensor=self.sensor_size,
                                                  image_width_px=self.image_size[0],
                                                  image_height_px=self.image_size[1]),
                         spatial_orientation)
    self.cam.setGPSpos(53.63151, 10.00479)

Again with Google maps, I have identified a fixed object on the airfield surface to get a Lat / Lon, as well as a distance from the camera. From a frame of the video, I got an X, Y of the object. I then used the software library to get a Lat / Lon for that point:

cam.spaceFromGPS(np.array([53.63246, 10.00232, 16.15]))

results: -163.98, 106.37, 161.5

Likewise converted XY point to LLA:

cam.gpsFromSpace(np.array([431, image_size[1] - 284]))

results: 53.63814, 10.00132, 39.01

As I was writing this, I found that I had a couple of problems in my code - I was using the wrong methods to convert between GPS and image data and I forgot to compensate for AMSL information. My results are much better now. So, I now feel that I am understanding how to use the libraries better. I will continue testing with other points to see if my implementation is working.

rgerum commented 3 years ago

So you are using the "space" to "gps" transform, which relates the flat Euclidean space to the geocoordinates. I think you might want to use gpsFromImage and imageFromGPS to transform between image pixels and geocoordinates.
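Roughly, with the camera and values from your snippet above (a sketch, not tested here; cam and image_size are the ones from your code):

    import numpy as np

    # image pixel -> latitude/longitude, assuming the point lies on the ground plane
    geo = cam.gpsFromImage(np.array([[431, image_size[1] - 284]]))

    # latitude/longitude (+ altitude in m) -> image pixel
    px = cam.imageFromGPS(np.array([[53.63246, 10.00232, 16.15]]))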

rgerum commented 3 years ago

I added a page to the documentation to elaborate more on the different coordinate systems: https://cameratransform.readthedocs.io/en/latest/coordinate_systems.html I hope this helps to make the coordinate systems more clear.

davesargrad commented 3 years ago

@rgerum That is simply awesome. Ty sir. It's a real pleasure to use this software and to get your help. We will be looking to demo some of our capability in the November timeframe, and your transformation library will be a key part of that demonstration. :)

davesargrad commented 3 years ago

@rgerum Hi Richard. What is your recommendation relative to dealing with the natural non-linearities in the conversion from camera pixel coordinates to latitude/longitude?

Specifically, the case where the field of view contains two hills, or other fixed objects, one at a distance of a mile or more behind the other. A pixel location on object 1 can then be a mile behind a pixel location on object 2, yet the pixel boundary between these two objects can be quite close together.

Another case of concern is a pixel location that is "in the sky" versus a pixel location that does correspond to a point on the surface of the planet.

Do landmarks help to solve this problem? Are there additional techniques that you can recommend to deal with such things?

Thanks in advance for your thoughts on such challenges.

rgerum commented 3 years ago

I am not quite sure what point you want to make. How to best use those locations for fitting/estimating the camera parameters? Or are you more concerned about projecting positions from the camera image back to space? Using a camera picture to measure things comes with quite some uncertainty, especially if you have objects quite distant from the camera. You can try to remove some of the uncertainties by using a camera with a larger zoom, or a setup with two cameras, etc. If that is not an option, some uncertainties are just inevitable.

ralphflat commented 3 years ago

@rgerum - I have continued to work with your library and am trying to get a better understanding of how to get position estimates from the package. You may recall, I do not have "good" numbers for the focal length and sensor size. For the moment, I have fixed these to reasonable values.

My expectation is that the XY would be about the same as lm_points_px. However, I get an RMS error (rmse = math.sqrt(np.square(np.subtract(lm_points_px, xy[:, :2])).mean())) of 273 - possibly this is due to the poor focal length and sensor size?

I then tried to add landmarks to fit better values:

    lm_points_space = self.cam.spaceFromGPS(lm_points_gps)
    self.cam.addLandmarkInformation(lm_points_px, lm_points_space, [3, 3, 5])
    self.cam.perform_fit([
        ct.FitParameter("elevation_m", lower=0, upper=90, value=39.01),
        ct.FitParameter("tilt_deg", lower=0, upper=90, value=60),
        ct.FitParameter("heading_deg", lower=270, upper=359, value=330)
    ], 10000)

This reduced the RMS to 187 (a good thing). I would have expected this to get closer to zero error. Do I need more landmark points? More iterations on the fitting? I could not see what the calculated values of the fitted parameters were - can you tell me where to find them?

rgerum commented 3 years ago

You can access the fitted parameters with cam.elevation_m, cam.tilt_deg, etc. You can also look at the trace plot with cam.plotTrace(); here you can see the distributions of your fitted parameters and, on the right side, the traces. During sampling, the traces should converge to the most probable value and then just oscillate around that value, with the oscillation amplitude being the uncertainty for that parameter. You can also visualize the output of your fitting with cam.plotFitInformation(), where you can supply an image. This will show you how closely the positions match the information that you provided.

I guess the wrong focal length can mess up your setup quite a lot: if your focal length is off by a substantial factor, all the distances will be off by the same factor.

But yes, an RMS of 187 pixels sounds quite far off.
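In code that is roughly (a sketch; im is a camera frame, as in your earlier snippet, and cam the fitted camera):

    import matplotlib.pyplot as plt

    # the fitted values are written back to the camera object
    print(cam.elevation_m, cam.tilt_deg, cam.heading_deg)

    # distributions (left) and traces (right) of the sampled parameters
    cam.plotTrace()
    plt.show()

    # provided landmark/horizon information vs. the positions reconstructed from the fit
    cam.plotFitInformation(im)
    plt.show()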

rgerum commented 3 years ago

I added the convenience function camera.printTraceSummary() that prints the mean and std of the fit parameters obtained by the metropolis sampling.

I also added the parameter focallength_px to simultaneously set the x and y focal length in pixels, to be used in fitting, e.g.: ct.FitParameter("focallength_px", lower=2000, upper=4000, value=3063, step=100)
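With that, the fit call from your example could look something like this (a sketch reusing your perform_fit call and your earlier bounds):

    self.cam.perform_fit([
        ct.FitParameter("elevation_m", lower=0, upper=90, value=39.01),
        ct.FitParameter("tilt_deg", lower=0, upper=90, value=60),
        ct.FitParameter("heading_deg", lower=270, upper=359, value=330),
        ct.FitParameter("focallength_px", lower=2000, upper=4000, value=3063, step=100),
    ], 10000)

    # mean and std of each fitted parameter from the metropolis sampling
    self.cam.printTraceSummary()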

ralphflat commented 3 years ago

@rgerum - thanks for your answers above. I did iterate over various focal lengths and sensor sizes in my code, and found that the combination I have, more or less, produces the lowest RMSE. This leads me to think something else is wrong in my setup. I used the plotFitInformation method you suggested and the points are not coming up where I expect (I assume this is plotting the XY as a circle and the altitude as the '+'). I am reevaluating this.

With respect to fitting, the results of the fitting do not seem to all obey the lower / upper constraints on the parameter value. Using the following in the fit parameters:

ct.FitParameter("elevation_m", lower=20, upper=50, value=39.01), ct.FitParameter("tilt_deg", lower=45, upper=90, value=60), ct.FitParameter("heading_deg", lower=270, upper=359, value=330)

I get the following results: elevation_m: 5.20244 (NOK), heading_deg: 337.4910 (OK), tilt_deg: 114.5446 (NOK). While the initial values are estimates, they are based on a combination of looking at the images and Google maps, so they are not too far off. However, two of the three computed values do not obey the lower / upper limits. The outputted plots don't show the first values of the two NOK parameters being near the initial value:

[trace plots of the fitted parameters]

I will try to incorporate the updates you made yesterday and see if that helps me at all. Thanks for those.

rgerum commented 3 years ago

What you could also do is look at plotFitInformation before running the fit, just setting the values manually. This way you might be able to find rough estimates for the values quicker. The plot shows the point you defined in the image and the point reconstructed from the 3D view, as o and x connected with a line. If the 3D point is missing, it is behind the camera and cannot be displayed.

It currently looks as though the algorithm finds a solution where maybe the camera is placed below the landmarks and looks up to the landmarks.

ralphflat commented 3 years ago

@rgerum - thanks once again for your answers. I have been trying to understand what could be going on with my implementation. I have, among other things, re-read this chain, as well as looked at my implementation. So, I have a series of questions:

rgerum commented 3 years ago
ralphflat commented 3 years ago

@rgerum - Thanks for yesterday's information. It was very helpful. Following the idea you had of plotting fit information, I plotted my calculated XY, as follows:

    self.cam.addHorizonInformation(np.array([ [XY pairs of horizon] ]), uncertainty=10)

    lm_points_px = np.array([ [XY pairs of landmarks] ])
    lm_points_gps = np.array([ [GPS points with altitude from AMSL (m)] ])
    points_space = self.cam.spaceFromGPS(lm_points_gps)
    self.cam.addLandmarkInformation(lm_points_px, points_space, [3, 3, 5])

    xy = self.cam.imageFromGPS(lm_points_gps)
    geo_pos = self.cam.gpsFromImage(lm_points_px, z=22)
    im = np.zeros((1022, 1650, 3))
    self.cam.plotFitInformation(im)
    x = np.ndarray.tolist(xy[:, 0])
    y = np.ndarray.tolist(xy[:, 1])
    plt.plot(x, y, 'rx')

This resulted in the following image (sorry about the black image). Blue is the horizon, orange is the landmarks, and red is the additional XY plot. The pluses line up pretty well with the horizon and landmarks. My expectation was that the conversion of lm_points_gps to xy would line up closely with lm_points_px. However, as you can see below, the plotted xy data is right on top of the GPS points (orange circles). So, is my expectation wrong? Should the xy points line up with the lm_points_px? Or is this a projection issue onto the 2D frame?

[plot: horizon (blue), landmarks (orange) and the re-projected xy points (red x) over a black frame]

rgerum commented 3 years ago

It looks like your horizon is not a straight line (the horizon in the image, the blue +). Are you using a GoPro-style camera or a camera with a strong fisheye lens? Then you should do a calibration for a lens correction before trying to use the camera.

That the red x and the orange o are on top of each other is expected, as there you once go gps -> space -> image and once gps -> image, which has to result in the same point.
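In other words, both routes necessarily give the same pixel (a sketch with one of your landmark points):

    import numpy as np

    lm_gps = np.array([[53.63246, 10.00232, 16.15]])

    # route 1: gps -> space -> image
    px_via_space = cam.imageFromSpace(cam.spaceFromGPS(lm_gps))

    # route 2: gps -> image directly
    px_direct = cam.imageFromGPS(lm_gps)

    # px_via_space and px_direct are the same point, so the red x falls on the orange o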

ralphflat commented 3 years ago

@rgerum

Thanks, this is really awesome and helps us understand some critical concepts.

For example, you are exactly right! Now that we are looking at the image, there is a distinct curvature in the horizon. It did not even dawn on us that this could be an issue.

Our first look at this particular camera was largely arbitrary. We recognized from the beginning that once we had a proper camera, we would be in a far better position with our results. Our initial look was to really understand how to use your camera software. You have now helped us over that hurdle.

Part of what we are now doing is specifying the required cameras. In fact, we hope to get one purchased and set up in an experimental fashion. We have learned that the following parameters are critical in our camera selection:

• Focal length
• Sensor size
• Lens distortion characteristics (avoid a fisheye lens, for example)

We think we now know how to use the cameratransform software properly. We are increasingly excited to use it. We hope to have a specified camera online in the near future. We would love your thoughts on camera specification. In a subsequent post, we will summarize our observability requirements that will impact camera selection.

Thanks very much for your diligence and responsiveness. We will continue to keep you in the loop with our progress and additional questions.

davesargrad commented 3 years ago

@rgerum

I would simply like to echo @ralphflat's sentiment. You have really helped us get to a point where we have a systems engineering perspective on how to select a proper camera and how to leverage your software.

Thanks so much for getting us to this point. We will hopefully take the next steps that Ralph has described in the next few weeks.

rgerum commented 3 years ago

I am happy that I could help you. And yes, I would avoid the use of a fisheye lens, as it makes position reconstruction increasingly error prone, even if corrected for the fisheye distortion. It is very well possible to correct for fisheye distortion to get a clean-looking image, but for position reconstruction small parameter changes can still result in quite a lot of positional change, especially at the border of a fisheye image.

For the focal length, you have to take your application into account. A high-zoom objective (small angular field of view) has better resolution for objects far away but a smaller field of view. To get the best of both worlds, we used a camera on a pan-and-tilt unit to collect high-resolution panoramic images. But this obviously adds complexity, as you have to operate the pan-and-tilt unit and stitch the images together into a panorama. CameraTransform also supports cylindrical projections, the most commonly used panoramic projection.
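Setting up a panoramic camera looks just like the rectilinear case, e.g. (a sketch; I am assuming CylindricalProjection takes the same constructor arguments as RectilinearProjection, and all numbers are placeholders):

    # a stitched panorama with a cylindrical projection instead of a rectilinear one
    cam_pano = ct.Camera(ct.CylindricalProjection(focallength_mm=14,
                                                  sensor=(17.3, 13.0),
                                                  image=(8000, 2000)),
                         ct.SpatialOrientation(elevation_m=10, tilt_deg=85, heading_deg=0))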

davesargrad commented 3 years ago

Hi Richard. @rgerum

Are the landmarks, object height information, and horizon information only used in the fitting process? Or are they always used in the basic conversion from image coordinates to GPS coordinates?

In the documentation they are only described in the context of fitting.

davesargrad commented 3 years ago

Hi Richard @rgerum. Another question that I have is about the instantiation of the Brown lens distortion. What value should be passed as the 4th argument (projection)? Do we pass in the same projection that is used to instantiate the Camera object itself (for example the RectilinearProjection)?

    spatial_orientation = ct.SpatialOrientation(elevation_m=vso["elevation"],
                                                tilt_deg=vso["tilt_deg"],
                                                roll_deg=vso["roll_deg"],
                                                heading_deg=vso["heading_deg"])

    rectilinear_projection = ct.RectilinearProjection(focallength_mm=c['focal_length'],
                                                      sensor=(vss['width'], vss['height']),
                                                      image=(vis['width'], vis['height']))

    # default to no distortion; switch to the Brown model if coefficients are configured
    lens_distortion = ct.NoDistortion()
    if "lens_distortion" in c:
        if "BROWN" in c["lens_distortion"]:
            k1 = c["lens_distortion"]["BROWN"].get("k1")
            k2 = c["lens_distortion"]["BROWN"].get("k2")
            k3 = c["lens_distortion"]["BROWN"].get("k3")

            lens_distortion = ct.BrownLensDistortion(k1, k2, k3)

    cam = ct.Camera(rectilinear_projection, spatial_orientation, lens_distortion)
rgerum commented 3 years ago

They are only used for the fitting process. The transforms just depend on the camera parameters.


davesargrad commented 3 years ago

> They are only used for the fitting process. The transforms just depend on the camera parameters.

I see. Ty.

davesargrad commented 3 years ago

Hi Richard @rgerum

How easy would it be to support the tangential (prism) component of the Brown model? It would seem that in cases where a lower-end camera is being used, misalignments in the lens components could lead to distortions that might benefit from inclusion of the tangential terms.

rgerum commented 3 years ago

About the projection: the parameter is currently unused. I should just remove it.

About the tangential components: I did not implement those because then the transform is not invertible. This means you could not transform from space directly to the distorted image.

davesargrad commented 3 years ago

@rgerum Ty.. On both points!