overeem11 / RAINLINK

Retrieval algorithm for rainfall mapping from microwave links in a cellular communication network.
16 stars 19 forks source link

projstring causes error in spTransform() #7

Closed micha-silver closed 1 year ago

micha-silver commented 3 years ago

It seems that with the new versions of proj4 (>=6) and gdal (>=3) there are more strict requirements for proj4 strings. In four files an Azimuthal Equidistant projection is defined as:

projstring <- paste("+proj=aeqd +a=6378.137 +b=6356.752 +R_A +lat_0=",YMiddle,
    " +lon_0=",XMiddle," +x_0=0 +y_0=0",sep="") 

This caused errors for me with R version 4.0.3 and these versions of GDAL and PROJ:

> library(rgdal)
rgdal: version: 1.5-18, (SVN revision 1082)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.1.4, released 2020/10/20
Path to GDAL shared files: /usr/share/gdal
GDAL binary built with GEOS: TRUE 
Loaded PROJ runtime: Rel. 7.2.0, November 1st, 2020, [PJ_VERSION: 720]

I edited the projstring in four files - Interpolation, ReadRainLocation, Topology and WetDryNearby..., as follows and then the various RAINLINK functions ran successfully:

projstring <- paste("+proj=aeqd +R_A +lat_0=", YMiddle,
                      " +lon_0=", XMiddle,
                      " +x_0=0 +y_0=0 +ellps=WGS84",sep="")

The "+ellips" phrase is now required, and then the "+a" and "+b" phrases are redundant (implicit in the ellipsoid)

I can suggest a PR for this small correction.

Regards, Micha

overeem11 commented 3 years ago

Dear Micha,

I also noticed that for new R versions and associated gdal/proj versions RAINLINK currently does not work. Many thanks for investigating a solution. When I replace the projstring in WetDryNearby... on a Fedora 32 Linux machine with R 4.0.3, GDAL 3.0.4 and proj 6.3.2, I get the following warning multiple times: In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition

The wet-dry classification does not work properly then. Perhaps this is due to the older versions of gdal and proj on the machine I used for testing. But please note that gdal is already >=3 and proj4 >=6. Do you have a clue?

Regards, Aart

micha-silver commented 3 years ago
body p { margin-bottom: 0cm; margin-top: 0pt; } 

Hi Aart

Sorry for such a long time to respond. 

I think I found what my problem was. Once I
    changed the CRS definition, the units became meters, but
    the Radius config setting is given in kilometers. The distance
    calculation took 15 as meters, and thus always zero nearby links
    within 15 meters of the tested link. So I was getting all NA for
    the Dry column in the WetDry data frame. 

I changed Radius in my config file to 15000, and
    now the WetDry function completes OK. Many values of Dry==1 and
    many Dry==0, and a few NA.

However I am still getting path averaged rain==0
    throughout from the RainRetrievalMinMaxRSL function. :-(
I'm using a dataset that Remko sent to me, over
    the Netherlands covering three months in 2012. More work to
    do...

Cheers, Micha

On 12/8/20 11:11 AM, overeem11 wrote:

  Dear Micha,
  I also noticed that for new R versions and associated gdal/proj
    versions RAINLINK currently does not work. Many thanks for
    investigating a solution. When I replace the projstring in
    WetDryNearby... on a Fedora 32 Linux machine with R 4.0.3, GDAL
    3.0.4 and proj 6.3.2, I get the following warning multiple
    times: In showSRID(uprojargs, format = "PROJ", multiline = "NO",
    ... :
    Discarded datum Unknown based on WGS84 ellipsoid in CRS
    definition
  The wet-dry classification does not work properly then. Perhaps
    this is due to the older versions of gdal and proj on the
    machine I used for testing. But please note that gdal is already
    >=3 and proj4 >=6. Do you have a clue?
  Regards, Aart
  —
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or unsubscribe.
  [

{ "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-740489842", "url": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-740489842", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

-- 

Micha Silver Arava Drainage Authority cell: +972-523-665918

overeem11 commented 3 years ago

Dear Micha,

I apologize for my late response. Using Linux Fedora 32, gdal 3.0.4, proj 6.3.2 & R 4.0.3, I tested your solution by setting the radius to 15000 (meters) in the Config.R file. Note that I tested it on the 2-day dataset from the Netherlands which is part of the RAINLINK package (Load it by data(Linkdata)). Then the wet-dry classification provides values. I also investigated the cumulative rainfall sums from the maps based on this period. The distribution of the ~2 day values and for instance the mean cumulative rainfall is quite different when I compare it to running RAINLINK with Linux Fedora 31, gdal 2.3.2, proj. 5.2.0 & R 3.6.3 (for instance, the average cumulative rainfall over the period increases from 3.964 mm to 4.973 mm). And I still get those error messages for the WetDryNearby, Interpolation & Topology functions: In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition

For function ReadRainLocation I get a similar message: In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition

The good news when comparing to Linux Fedora 31, gdal 2.3.2, proj. 5.2.0 & R 3.6.3: 1) Output figures of function Topology are virtually the same. 2) Number of lines with data (i.e. no NA values) is comparable for the files with path-average rainfall depths, which suggests that size of dataset is hardly or not affected by your solution (and is probably a result of different operating system or R version).

Conclusion: good news that wet-dry classification does not only provide NA values anymore, but it seems that somewhere in the process still something is going wrong. Perhaps the error messages give a clue.

As an aside: Note that I also found out that the current visualization functions in RAINLINK for plotting rainfall maps and/or link locations, seem not to work properly anymore, irrespective of the R 3.6.3 or R 4.0.3 versions mentioned above. After years of use, now the following Warning message is provided: Continuous limits supplied to discrete scale. Did you mean limits = factor(...) or scale_*_continuous()?

Since this is geared towards use for the Netherlands, and it's not that straightforward to use it in other regions, I tend to remove these visualization functions from the next version of RAINLINK.

Regards, Aart

micha-silver commented 3 years ago
body p { margin-bottom: 0cm; margin-top: 0pt; } 

Hello Aart:

On 1/14/21 2:07 PM, overeem11 wrote:

  Dear Micha,
  I apologize for my late response. Using Linux Fedora 32, gdal
    3.0.4, proj 6.3.2 & R 4.0.3, I tested your solution by
    setting the radius to 15000 (meters) in the Config.R file. Note
    that I tested it on the 2-day dataset from the Netherlands which
    is part of the RAINLINK package (Load it by data(Linkdata)).
    Then the wet-dry classification provides values. I also 

So, some progress...

  investigated the cumulative rainfall sums from the maps based
    on this period. The distribution of the ~2 day values and for
    instance the mean cumulative rainfall is quite different when I
    compare it to running RAINLINK with Linux Fedora 31, gdal 2.3.2,
    proj. 5.2.0 & R 3.6.3 (for instance, the average cumulative
    rainfall over the period increases from 3.964 mm to 4.973 mm).
    And I still get those error messages for the WetDryNearby,
    Interpolation & Topology functions:

On this issue I wanted to ask: would you consider
    to switch to a standard Cartesian CRS, rather than cook up a
    local CRS for each network of microwave antennas? For example,
    UTM. It's easy to find the UTM zone for any lon/lat (Xmiddle,
    Ymiddle) location with a function like:

UTMZoneFromLatLon =
    function(lat, lon) {
      # Get UTM zone for latitude, longitude pair  
      lat_part = (sign(lat)+1)/2 * 100
      lon_part = floor((180+lon)/6) + 1
      epsg_code = as.character(32700 - lat_part + lon_part)
      crs_string = sp::CRS(paste0("EPSG:", epsg_code))
      return(crs_string)
    }

Admittedly, UTM in **not** equidistant like the
    CRS that you designed into the WetDry function, but it should be
    close enough. I would guess that even at the edges of a UTM
    zone, the "error" in distance measures over a few kilometers
    would probably be just a few meters. (Maybe I'll do a test, and
    report back). And using a standard CRS that is already
    incorporated into proj would probably avoid all the warnings we
    are getting.

The authority for these questions would be Roger
    Bivand. If you agree, I'll gladly post a question on r-sig-geo,
    or directly to him privately, asking what would be the best
    generic way to do distance measurements that could be applied
    anywhere on the earth.

Additionally, if you are considering a next
    version, would it make sense to switch to using sf
    objects instead of sp for the link network? If you
    choose to go this way, then you can make use of built in sf
    functions to get all antennas within the Radius value using i.e.
    st_buffer() and st_intersects(), probably making the procedure a
    bit cleaner and more transparent.

    In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... :
    Discarded datum Unknown based on WGS84 ellipsoid in CRS
    definition
  For function ReadRainLocation I get a similar message:
    In showSRID(uprojargs, format = "PROJ", multiline = "NO",
    prefer_proj = prefer_proj) :
    Discarded datum Unknown based on WGS84 ellipsoid in CRS
    definition
  The good news when comparing to Linux Fedora 31, gdal 2.3.2,
    proj. 5.2.0 & R 3.6.3:

    Output figures of function Topology are virtually the same.
    Number of lines with data (i.e. no NA values) is comparable
      for the files with path-average rainfall depths, which
      suggests that size of dataset is hardly or not affected by
      your solution (and is probably a result of different operating
      system or R version).

  Conclusion: good news that wet-dry classification does not only
    provide NA values anymore, but it seems that somewhere in the
    process still something is going wrong. Perhaps the error
    messages give a clue.
  As an aside:
    Note that I also found out that the current visualization
    functions in RAINLINK for plotting rainfall maps and/or link
    locations, seem not to work properly anymore, irrespective of
    the R 3.6.3 or R 4.0.3 versions mentioned above. After years of
    use, now the following Warning message is provided:
    Continuous limits supplied to discrete scale.
    Did you mean limits = factor(...) or scale_*_continuous()?
  Since this is geared towards use for the Netherlands, and it's
    not that straightforward to use it in other regions, I tend to
    remove these visualization functions from the next version of
    RAINLINK.

I would fully agree that there is no need for
    those visualizations within RAINLINK. Today there are many R
    packages for plotting maps, as well a wide choice of external
    GIS software. 

Best regards, Micha

  Regards,
    Aart
  —
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or unsubscribe.
  [

{ "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-760155793", "url": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-760155793", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

-- 

Micha Silver Arava Drainage Authority cell: +972-523-665918

micha-silver commented 3 years ago
body p { margin-bottom: 0cm; margin-top: 0pt; } 

Hello Aart:

As promised, (and for my own better
    understanding) I did a rough check comparing distance
    measurements between an Azimuthal Equidistant projection and a
    UTM projection. See Attached Rmarkdown and pdf.

I chose to try in South Africa since that is a
    country with a large E-W extent. In fact it is wider than one
    UTM zone. So using UTM at the east and west extremes would
    certainly give wrong distances. When I added a "MW tower" 20km
    from Capetown (in AEQD), then converted to UTM, the distance to
    the tower was 138 m longer. 

Maybe this will add some perspective to the
    choice of CRS.
Again, it would be best to involve Roger Bivand.

Regards, Micha

On 1/14/21 2:07 PM, overeem11 wrote:

  Dear Micha,
  I apologize for my late response. Using Linux Fedora 32, gdal
    3.0.4, proj 6.3.2 & R 4.0.3, I tested your solution by
    setting the radius to 15000 (meters) in the Config.R file. Note
    that I tested it on the 2-day dataset from the Netherlands which
    is part of the RAINLINK package (Load it by data(Linkdata)).
    Then the wet-dry classification provides values. I also
    investigated the cumulative rainfall sums from the maps based on
    this period. The distribution of the ~2 day values and for
    instance the mean cumulative rainfall is quite different when I
    compare it to running RAINLINK with Linux Fedora 31, gdal 2.3.2,
    proj. 5.2.0 & R 3.6.3 (for instance, the average cumulative
    rainfall over the period increases from 3.964 mm to 4.973 mm).
    And I still get those error messages for the WetDryNearby,
    Interpolation & Topology functions:
    In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... :
    Discarded datum Unknown based on WGS84 ellipsoid in CRS
    definition
  For function ReadRainLocation I get a similar message:
    In showSRID(uprojargs, format = "PROJ", multiline = "NO",
    prefer_proj = prefer_proj) :
    Discarded datum Unknown based on WGS84 ellipsoid in CRS
    definition
  The good news when comparing to Linux Fedora 31, gdal 2.3.2,
    proj. 5.2.0 & R 3.6.3:

    Output figures of function Topology are virtually the same.
    Number of lines with data (i.e. no NA values) is comparable
      for the files with path-average rainfall depths, which
      suggests that size of dataset is hardly or not affected by
      your solution (and is probably a result of different operating
      system or R version).

  Conclusion: good news that wet-dry classification does not only
    provide NA values anymore, but it seems that somewhere in the
    process still something is going wrong. Perhaps the error
    messages give a clue.
  As an aside:
    Note that I also found out that the current visualization
    functions in RAINLINK for plotting rainfall maps and/or link
    locations, seem not to work properly anymore, irrespective of
    the R 3.6.3 or R 4.0.3 versions mentioned above. After years of
    use, now the following Warning message is provided:
    Continuous limits supplied to discrete scale.
    Did you mean limits = factor(...) or scale_*_continuous()?
  Since this is geared towards use for the Netherlands, and it's
    not that straightforward to use it in other regions, I tend to
    remove these visualization functions from the next version of
    RAINLINK.
  Regards,
    Aart
  —
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or unsubscribe.
  [

{ "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-760155793", "url": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-760155793", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

-- 

Micha Silver Arava Drainage Authority cell: +972-523-665918

overeem11 commented 3 years ago

Hello Micha,

Thanks for your suggestions.

1) Fine with me to try UTM. 2) Thanks for your offer to contact Roger Bivand. Please go ahead. 3) st_buffer() and st_intersects(): I'm afraid I don't really have time to investigate it, but if this would work faster and you have some updated code, I'm willing to replace it in the next version. 4) In my experience it can take quite some time to make code to plot high-quality background maps, but personally I nowadays use Python code which seems more flexible than the RAINLINK visualization code. Naturally, this also depends on your (GIS) experience. So yes, I'm planning to remove those visualization code from the next RAINLINK version. Then RAINLINK will focus on the rainfall retrieval and mapping, which is where it adds most value. 5) 138 m on a 20 km link is not that much and is probably negligible for our purpose of rainfall estimation, although ideally the approximation would be more accurate.

micha-silver commented 3 years ago

The RAINLINK package [1] derives rain rate from the attenuation of signal strength between microwave towers. Data typically comes from cellular network providers, and the location of the towers is typically given in longitude/latitude. Among the functions in RAINLINK, some require to get the attenuation for each microwave link (pair of towers), from those towers that are nearby, within, say 15 km. This means that the tower locations need to be transformed to an equidistant Cartesian CRS.

In the current version of RAINLINK, this was done as follows: An sp object was prepared from the tower locations, with CRS "EPSG:4326". The average latitude and longitude values were obtained as "YMiddle" and "XMiddle". Then a new, one-off azimuthal equidistant CRS was defined as:

Set projection string

projstring <- paste("+proj=aeqd +a=6378.137 +b=6356.752 +R_A +lat_0=", YMiddle, " +lon_0=", XMiddle," +x_0=0 +y_0=0",sep="")

This worked until the recent advances in proj and gdal. Now with proj

=6.x and gdal >=3.x the above definition causes an error:

Error in .spTransform_Polygon(input[[i]], to_args = to_args, from_args = from_args, : coordinate operation creation from WKT failed: generic error of unknown origin In addition: Warning messages: 1: In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) : Discarded ellps unknown in CRS definition: +proj=aeqd +a=6378.137 +b=6356.752 +R_A +lat_0=30.4 +lon_0=35.1 +x_0=0 +y_0=0 +type=crs 2: In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) : Discarded datum unknown in CRS definition

If I change the CRS definition as follows: projstring <- paste("+proj=aeqd +R_A +lat_0=", YMiddle, " +lon_0=", XMiddle, " +x_0=0 +y_0=0 +ellps=WGS84",sep="")

Then using proj 6.3.2 we get a warning: "In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition"

whereas on a newer setup, with "Loaded PROJ runtime: Rel. 7.2.1" the above definition is accepted silently.

Our question is what is the reliable way to define a local Cartesian CRS which would work anywhere in the world, and preferably be equidistant. Should we just find the local UTM zone and use that? UTM is not equidistant, but it is ubiquitous, and if the cellular network is not too broad, I assume that distances will be fairly accurate.

Best regards, Micha

[1] https://github.com/overeem11/RAINLINK

-- Micha Silver Ben Gurion Univ Sde-Boker Remote Sensing Lab cell: +972 (52) 3665918

overeem11 commented 3 years ago

Hello Micha,

I find this a clear description of the problem and of a possible solution. Would be good to forward this to an expert.

Best regards, Aart

micha-silver commented 3 years ago

On Wed, 20 Jan 2021, Micha Silver wrote:

The RAINLINK package [1] derives rain rate from the attenuation of signal strength between microwave towers. Data typically comes from cellular network providers, and the location of the towers is typically given in longitude/latitude. Among the functions in RAINLINK, some require to get the attenuation for each microwave link (pair of towers), from those towers that are nearby, within, say 15 km. This means that the tower locations need to be transformed to an equidistant Cartesian CRS.

In the current version of RAINLINK, this was done as follows: An sp object was prepared from the tower locations, with CRS "EPSG:4326". The average latitude and longitude values were obtained as "YMiddle" and "XMiddle". Then a new, one-off azimuthal equidistant CRS was defined as:

Set projection string

projstring <- paste("+proj=aeqd +a=6378.137 +b=6356.752 +R_A +lat_0=", YMiddle, " +lon_0=", XMiddle," +x_0=0 +y_0=0",sep="")

This worked until the recent advances in proj and gdal. Now with proj

=6.x and gdal >=3.x the above definition causes an error:

Error in .spTransform_Polygon(input[[i]], to_args = to_args, from_args = from_args, : coordinate operation creation from WKT failed: generic error of unknown origin

This is not a reproducible example. Please provide a reproducible example, specifying all of the software versions, especially PROJ and GDAL. The changes in PROJ are now entering the 4th year, so I'm a bit surprised that steps were not taken before to check for problems to your workflow as Proj4 strings shift from deprecated to defunct. The package is not on CRAN, so was not caught by our reverse dependency checks, so the responsibility for making sure things work rests on the maintainer, not us.

Please also convert the example to points, (you reach .spTransform_Polygon() here, so these are not points), and try proj from the command line. I think that your +a and +b also have units problems (km not m). The +R_A argument is not given as in current:

https://proj.org/operations/projections/aeqd.html

In addition: Warning messages: 1: In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) : Discarded ellps unknown in CRS definition: +proj=aeqd +a=6378.137 +b=6356.752 +R_A +lat_0=30.4 +lon_0=35.1 +x_0=0 +y_0=0 +type=crs 2: In showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj = prefer_proj) : Discarded datum unknown in CRS definition

If I change the CRS definition as follows: projstring <- paste("+proj=aeqd +R_A +lat_0=", YMiddle, " +lon_0=", XMiddle, " +x_0=0 +y_0=0 +ellps=WGS84",sep="")

Then using proj 6.3.2 we get a warning: "In showSRID(uprojargs, format = "PROJ", multiline = "NO", ... : Discarded datum Unknown based on WGS84 ellipsoid in CRS definition"

whereas on a newer setup, with "Loaded PROJ runtime: Rel. 7.2.1" the above definition is accepted silently.

Our question is what is the reliable way to define a local Cartesian CRS which would work anywhere in the world, and preferably be equidistant.

With a reproducible example that works with proj command line tools, one could ask for views on the PROJ list itself.

Hope this helps,

Roger

Should we just find the local UTM zone and use that? UTM is not equidistant, but it is ubiquitous, and if the cellular network is not too broad, I assume that distances will be fairly accurate.

Best regards, Micha

[1] https://github.com/overeem11/RAINLINK

-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand@nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

micha-silver commented 3 years ago
body p { margin-bottom: 0cm; margin-top: 0pt; } 

Hi Aart:

Roger has replied asking for a reprex.

I prepared a small subset of CML data (about 100
    rows from the file Remko sent to me) with a script to construct
    a point layer of the Start and End points of links (antenna
    towers). Then it transforms from WGS84 to aeqd. 

I have on my old home machine R 3.6 installed
    with proj (6.3.0) and gdal (3.0.4) versions similar to what you
    are running. And I do not get any errors or warnings. Also no
    warnings on my new Debian machine, proj 7.2.0 and gdal 3.2.1.

Before I respond to Roger, Can I ask you to try
    this script and let me know if you get any warnings or errors?

Regards,
Micha

On 1/20/21 11:03 AM, overeem11 wrote:

  Hello Micha,
  I find this a clear description of the problem and of a
    possible solution. Would be good to forward this to an expert.
  Best regards,
    Aart
  —
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or unsubscribe.
  [

{ "@context": "http://schema.org", "@type": "EmailMessage", "potentialAction": { "@type": "ViewAction", "target": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-763450673", "url": "https://github.com/overeem11/RAINLINK/issues/7#issuecomment-763450673", "name": "View Issue" }, "description": "View this Issue on GitHub", "publisher": { "@type": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

-- 

Micha Silver Arava Drainage Authority cell: +972-523-665918

overeem11 commented 3 years ago

Hello Micha,

I'm willing to try your script. Please send the necessary files to me (overeem@knmi.nl) or post them here.

Best regards, Aart