inbo / data-publication

🔓 Open biodiversity data publication by the INBO
https://ipt.inbo.be
Creative Commons Attribution 4.0 International
15 stars 7 forks source link

Derive positions for observations #140

Closed peterdesmet closed 1 year ago

peterdesmet commented 7 years ago

There are two ways in which observations in SAS are linked to positions:

  1. A direct one-to-one link with calculatedmeasurements: these calculated measurements are imported historical data, where original ship log files are not available. These make up the bulk of the data, but these won't grow with time.
  2. An indirect link via datetime, in which an observation falls between ship log positions (stored in measurements). This is the case for all new data, in which ship logs are uploaded and the link between observations and these logs are made at export time. This type of data will grow with time.

1. Direct link

Retrieving positions from the calculated measurements is easy:

SELECT
    obs.id,
    obs.starttime,
    obs.endtime,
    trip.id,
    trip.dateofsurvey,
    calc_meas.latitude,
    calc_meas.longitude
FROM observation AS obs
    LEFT JOIN surveyevent AS trip
        ON obs.surveyevent = trip.id
    LEFT JOIN calculatedmeasurement AS calc_meas
        ON obs.calculatedmeasurement = calc_meas.id
WHERE
    calc_meas.id IS NOT NULL

This results in 289,643 observations (on UAT).

2. Indirect link

This should get the positions for the 8,506 other observations (on UAT). The join between observation (obs) and measurement (meas) should be based on:

  1. Same trip: obs.surveyevent = meas.surveyevent
  2. Same date (but that should already be guaranteed by the trip, which never goes beyond a single date): surveyevent.dateofsurvey = meas.dateofmeasurement
  3. Same (!) starttime: obs.starttime = meas. timeofmeasurement
  4. Same (!) endtime: obs.endtime = meas.timeofmeasurement

This should result in 0, 1 or 2 measurements:

peterdesmet commented 7 years ago

Some documentation on how measurements are chosen for an observation:

peterdesmet commented 7 years ago

So, the query in pseudo code is:

for each observation
get surveyevent
get calculatedmeasurement
if
   lat/long in calculatedmeasurements NOT NULL => use those
else
   get measurement for starttime
   get measurement for endtime
   take average of measurement.lat/long => use those
Jo-Loos commented 6 years ago

query should sound like :

SELECT
    obs.id,
    obs.starttime,
    obs.endtime,
    trip.id,
    trip.dateofsurvey,
    calc_meas.latitude,
    calc_meas.longitude,
    meas.avglatitude,
    meas.avglongitude,
    Coalesce(calc_meas.latitude,meas.avglatitude) as Compl_latitude,
    Coalesce(calc_meas.longitude,meas.avglongitude) as Compl_longitude
FROM observation AS obs
    LEFT JOIN surveyevent AS trip
        ON obs.surveyevent = trip.id
    LEFT JOIN calculatedmeasurement AS calc_meas
        ON obs.calculatedmeasurement = calc_meas.id

    LEFT OUTER JOIN ( SELECT
                obs.id,
                AVG(meas.latitude) as avgLatitude,
                AVG(meas.longitude) as avgLongitude,
                Count(*) as NbrMeas
            FROM observation AS obs
                INNER JOIN surveyevent AS trip
                ON obs.surveyevent = trip.id
                INNER JOIN measurement meas ON meas.surveyevent = obs.surveyevent 
                        AND trip.dateofsurvey  = meas.dateofmeasurement 
                        AND (obs.starttime = meas.timeofmeasurement 
                        OR obs.endtime = meas.timeofmeasurement)
            WHERE 1=1
            AND obs.calculatedmeasurement IS NULL
            GROUP BY obs.id
            HAVING Count(*) <= 2
            ) meas ON meas.id = obs.id
WHERE  1=1
LIMIT 100

The added outer join selects per obs.id the average long. and lat. , while making sure that only 2 or less measurements are used (the inner join within the query assures there is at least 1 measurement ). The select statement is still a mess, all the calculated measurements and other measurements are shown, including the complemented version off them.

peterdesmet commented 1 year ago

Update on an old issue: SAS are now included in https://esas.ices.dk and published as Darwin Core to EMODNet/OBIS/GBIF via https://github.com/EMODnet/esas2obis.