inbo / etn

R package to access data from the European Tracking Network
https://inbo.github.io/etn/
MIT License

Handle ghost detections (at DB & pkg) #68

Open peterdesmet opened 6 years ago

peterdesmet commented 6 years ago

Here's a quick overview of the data that will be included in the dataset we will publish. @PieterjanVerhelst @IPauwels can you have a look if this makes sense? Let me know if you need more info.

| animal_project_name | animals.scientific_name | detections | individuals | stations | start_date | end_date |
| --- | --- | --- | --- | --- | --- | --- |
| 2011 Rivierprik | Lampetra fluviatilis | 114605 | 29 | 29 | 2011-12-14 | 2012-07-03 |
| 2012 Leopoldkanaal | Anguilla anguilla | 2215829 | 92 | 60 | 2012-07-04 | 2017-03-12 |
| 2014 Demer | Petromyzon marinus | 42 | 1 | 1 | 2015-05-06 | 2015-05-12 |
| 2014 Demer | Rutilus rutilus | 11030 | 2 | 9 | 2014-04-19 | 2014-06-28 |
| 2014 Demer | Silurus glanis | 86023 | 9 | 46 | 2014-04-25 | 2018-01-31 |
| 2014 Demer | Squalius cephalus | 139013 | 2 | 10 | 2014-04-30 | 2015-02-12 |
| 2015 Dijle | Anguilla anguilla | 41798 | 1 | 7 | 2015-05-01 | 2015-10-15 |
| 2015 Dijle | Cyprinus carpio | 4944 | 2 | 9 | 2015-04-23 | 2015-11-06 |
| 2015 Dijle | Platichthys flesus | 101488 | 8 | 28 | 2015-04-29 | 2016-04-08 |
| 2015 Dijle | Rutilus rutilus | 7870 | 4 | 9 | 2015-04-23 | 2015-09-14 |
| 2015 Dijle | Silurus glanis | 78331 | 11 | 25 | 2015-04-22 | 2017-09-16 |

PieterjanVerhelst commented 6 years ago

The project '2012 Leopoldkanaal' ended in January 2016 (i.e. the last receiver of that project was removed on 18/01/2016). Could you check which eel was detected till 2017? Was it 29920?

peterdesmet commented 6 years ago

@PieterjanVerhelst very few detections after January 1, 2016:

| datetime | transmitter | deployment_station_name |
| --- | --- | --- |
| 2016-05-05T21:55:57Z | A69-1601-29954 | S07 |
| 2017-02-18T02:10:39Z | A69-1601-29961 | s-Wetteren |
| 2017-02-20T14:38:25Z | A69-1601-29961 | s-Wetteren |
| 2017-03-12T00:43:14Z | A69-1601-29961 | s-Wetteren |
| 2017-03-12T02:31:25Z | A69-1601-29961 | s-Wetteren |

PieterjanVerhelst commented 6 years ago

Could you give the station name for those detections?

peterdesmet commented 6 years ago

Updated (also, the original overview now lists start, end rather than end, start).

PieterjanVerhelst commented 6 years ago

I just checked the battery end dates of those two tags and they dropped dead in February 2015. These 5 detections are likely ghost detections and can be removed from the dataset. However, this shows the need for a quality check on tag detections after the battery end date.

peterdesmet commented 6 years ago

Indeed. I've updated this issue title. Can you check this and remove those from the database itself? I think that's better than me removing them from my data dump.

Once done and checked, let me know if there were many: if not, I'll remove them from my dump. If yes, I'll ask for a new dump.

PieterjanVerhelst commented 6 years ago

I finally found time to get to this issue :-). I discussed this with the VLIZ team and they prefer that such data removals don't occur at the database level. @peterdesmet what do you think?

peterdesmet commented 6 years ago

Can they be flagged by the user as ghost detections, so these can be filtered upon?

PieterjanVerhelst commented 6 years ago

That should be possible. I'll check with them.

PieterjanVerhelst commented 6 years ago

It should be possible to add a boolean column (TRUE/FALSE). @jreubens @bwydoogh can this be implemented? If so, we only need a rule that defines ghost detections.

bwydoogh commented 6 years ago

> Can they be flagged by the user as ghost detections, so these can be filtered upon?

Yes, why not (if Jan agrees; I hope he is also watching this GitHub repo). What are the exact rules to set that flag to TRUE?

PieterjanVerhelst commented 6 years ago

We could consider a detection a ghost detection when the detection timestamp > battery end date and there is no recapture date.
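
As a rough illustration only, the rule could be sketched in Python like this. It is not part of etn or the ETN database; the function name `is_possible_ghost` and its parameters are hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_possible_ghost(
    detection_time: datetime,
    battery_end: datetime,
    recapture_time: Optional[datetime] = None,
) -> bool:
    """Flag a detection made after the tag's battery end date.

    Hypothetical helper: a tag with a recapture date is never flagged,
    since the rule only applies when there is no recapture date.
    """
    if recapture_time is not None:
        return False
    return detection_time > battery_end
```

The result is only a flag; whether flagged detections are kept or dropped stays up to the user.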

jreubens commented 6 years ago

It seems logical to me that this should be flagged as 'possible ghost detection'. However, it should be clear that it is the responsibility of the user/data owner to use or not use these detections. It is just a flag. Having said this, we should have clear rules for what we consider ghost detections. The rule mentioned by Pieterjan can be a start (it is a simple example). However, there should be more rules, and some might be quite complicated. We also need to take into account possible tag ID duplication, because several brands use the same set of ID codes.

I suggest that we start with the implementation of the rule of Pieterjan, but we should have a brainstorm on other possible rules as well.

PieterjanVerhelst commented 6 years ago

Notably, since the exact hour of tagging, and therefore the battery end date, is often unknown, I would add a buffer of at least one day, or even a month. So a detection is considered a ghost detection if it occurred > 1 month after the battery end date. For detections < 1 month after that date, the researcher should check whether a wrong tagging timestamp was given.
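
A minimal sketch of this buffered rule in Python, for illustration. It assumes "one month" can be approximated as 30 days, which the thread does not specify; the names `BUFFER` and `classify_detection` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "one month" is approximated as 30 days.
BUFFER = timedelta(days=30)


def classify_detection(detection_time: datetime, battery_end: datetime) -> str:
    """Return 'valid', 'check' or 'ghost' for a single detection."""
    if detection_time <= battery_end:
        return "valid"
    if detection_time <= battery_end + BUFFER:
        # Within the buffer: the researcher should verify the tagging timestamp.
        return "check"
    return "ghost"
```

Returning three labels rather than a boolean keeps the "needs a manual check" cases separate from the ones treated as ghost detections.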

stijnvanhoey commented 6 years ago

Any link with https://gitlab.oceantrack.org/GreatLakes/glatos#filtering-and-summarizing functions?

jreubens commented 6 years ago

FYI, there is also another package called VTrack with some functionalities: https://github.com/cran/VTrack

bwydoogh commented 5 years ago

@jreubens @PieterjanVerhelst @stijnvanhoey How do we proceed with this topic? Will you use an R package to filter ghost detections, or do we / I add a field to the detections table in the ETN DB? Note that out of (currently) 40 million detection records in total, 2,921,976 have detections.datetime > deployments.drop_dead_date (and 2,189,176 have detections.datetime > deployments.drop_dead_date + INTERVAL '1 month').

Other things to pay attention to:

stijnvanhoey commented 5 years ago

My proposal:

PieterjanVerhelst commented 5 years ago

I would suggest that a ghost detection is a detection whose time < activation time or time > drop dead date. Since tags have a programmed drop dead date according to the manufacturer (Vemco), I don't understand how it is possible that we have so many detections with detection time > drop dead date.
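
Reading the suggested rule as "before activation or after the drop dead date", a Python sketch could look like this. The function name `outside_tag_lifetime` and its parameters are hypothetical and do not correspond to actual ETN fields.

```python
from datetime import datetime


def outside_tag_lifetime(
    detection_time: datetime,
    activation_time: datetime,
    drop_dead_time: datetime,
) -> bool:
    """True when the detection falls before activation or after the drop dead date."""
    return detection_time < activation_time or detection_time > drop_dead_time
```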

jreubens commented 5 years ago

IMOS has written a nice piece of code on QCs, see 'https://github.com/aodn/aatams/tree/master/scripts/R/QC'. I suggest we keep this on hold for now. We should discuss with IMOS first.

The reason we have so many detections after deployment recovery is that we have receivers with a built-in tag that keeps on pinging until you disconnect the battery (which is not always done immediately). Anyway, we have to look at this in more detail.

PieterjanVerhelst commented 5 years ago

Is there still a need for a conversation & brainstorm to implement rules related to ghost detections?

peterdesmet commented 5 years ago

@PieterjanVerhelst I don't know. But I would like you to have a look at the original question, which is now at https://github.com/inbo/etn-occurrences/issues/7 😄

peterdesmet commented 4 years ago

@PieterjanVerhelst @jreubens @IPauwels this issue has been dormant. No need to read it all, the basic question is: do you want an automatic assessment (ideally by the database, otherwise by the package) of which detections are likely to be ghost detections?

PieterjanVerhelst commented 4 years ago

I would say yes, to some degree. That is, all 'detections' occurring after the battery of the transmitter dropped dead, I would consider ghost detections. Other forms of uncertainty are up to the researcher to decide what to do with it (include it in analysis or not). However, I think @jreubens may have some additional thoughts on this one ;-).

IPauwels commented 4 years ago

I will leave it up to Jan as well, but had this small thought: you are never sure about the actual battery end date of the transmitter, are you? So perhaps detections after the expected end date are still real detections. Or did I misunderstand your suggestion, PJ?

PieterjanVerhelst commented 4 years ago

Transmitters have an end date and on that particular date, the transmitter runs dead.

jreubens commented 4 years ago

Yes, this is needed! However, in my opinion this should be tackled at the DB level.

peterdesmet commented 4 years ago

@jreubens should I leave this issue open then? Or are you following this up on your side?

PietrH commented 2 weeks ago

@jreubens, @peterdesmet Can this be closed?