inbo / etn

R package to access data from the European Tracking Network
https://inbo.github.io/etn/
MIT License
6 stars 5 forks source link

my_animals #43

Closed PieterjanVerhelst closed 4 years ago

PieterjanVerhelst commented 6 years ago

Some bugs in the get_animals() functionality:

jreubens commented 6 years ago

ability to filter on owner_organisation would be nice

stijnvanhoey commented 6 years ago

With regard to adding additional filtering options in this issue as well as some other issues (e.g. #45, #46), make sure it is worthwhile to add the functionality. For get_detections, these filters provide the ability to limit the requested data and overcome the risk of getting too large tables and large useless data transfers.

However, for functions that will return just a relative small amount of data (even without the filter), we maybe should not overload the function arguments, but just provide documentation on how to filter the data after requesting the data, e.g. just one dplyr function call away to subset the owner organisation:

my_animals <- get_animals(my_con, animal_project = "2012_leopoldkanaal") 
my_animals_inbo <- my_animals %>% filter(owner_organisation == "INBO")

the key point is the table size requested (for detections typically large, for animals much smaller).

As such, in order to not end up with tons of arguments for each function that are (1) harder to maintain and (2) create overload for the functions; let's make sure to make this a bit more efficient as in 'would be nice':

@jreubens and @PieterjanVerhelst for those open issues with a would be nice, please do a revision of these requests and make sure the requirements above are valid for each of them. Thanks!

PieterjanVerhelst commented 6 years ago

I agree; comments added to the issues. The only function where additional filters can still be helpfull, is the get_detections() function. The more data that can be reduced on the data extraction level, the better. See examples at #47 .

JolienGoossens commented 6 years ago

The network_project argument in the get_animals() function doesn't seem to filter anything. For example:

my_animals <- get_animals(my_con, network_project = "dijle")
my_animals <- get_animals(my_con, network_project = "AZ0")
my_animals <- get_animals(my_con)

All of these render the same result.

stijnvanhoey commented 6 years ago

@PieterjanVerhelst

  1. the output columns (as derived from the database view) are named: length3_type and length4_type instead of length_type3 and length_type4

  2. The output only returns the requested animal project when I'm running the code:

> my_animals <- get_animals(con, animal_project = "2012_leopoldkanaal")
> my_animals %>% distinct(projectcode)
         projectcode
1 2012_leopoldkanaal
stijnvanhoey commented 6 years ago

@JolienGoossens I do have the same issue here

When I request all the network project names (with my account), I get the following list:

 [1] "cpod-lifewatch"         "lifewatch"              "Noordzeekanaal"         "pc4c"                  
 [5] "Deveron"                "MorayFirth"             "Aberdeen"               "Conon"                 
 [9] "Skye"                   "maas"                   "AZ0"                    "Jersey_Coastal"        
[13] "MBA_Whitsand"           "MBA_Massmo"             "MBA_Wavehub"            "dijle"                 
[17] "albert"                 "bovenschelde"           "leopold"                "bpns"                  
[21] "demer"                  "saeftinghe"             "DTU-Skjern"             "Limfjord"              
[25] "SVNL-FISH-WATCH"        "BECORV"                 "Inforbiomares"          "ARAISOLA01"            
[29] "ARAISOLA02"             "ARAISOLA03"             "SARTELARAM"             "SARTELZINGARO"         
[33] "SARTELTG"               "thornton"               "ws1"                    "ws2"                   
[37] "ws3"                    "zeeschelde"             "PTN-Silver-eel-Mondego"

so, I would also assume that dijle and AZ0 are valid inputs to select from. However, when using the projectcodes returned when requesting ALL animals, I do get another set of names:

> get_animals(con) %>% distinct(projectcode)
          projectcode
1         phd_reubens
2   2015_phd_verhelst
3             homarus
4           2015_fint
5           rangetest
6   2013_albertkanaal
7     2011_rivierprik
8  2012_leopoldkanaal
9          2014_demer
10         2015_dijle

This last list is actually coming from the animals view directly, so the available project names listed there. As such, I am missing something there or there is some inconsistency on the listing of the projects as included in the animals view.

@bwydoogh and @jreubens How do I have to interpret the projectcode column in the available vliz.animals_view view? Should I join myself with other tables to properly extract the projects or can you adapt the view?

PieterjanVerhelst commented 6 years ago

@stijnvanhoey indeed, the headers which also should be retrieved are length3 length3_type length3_unit length4 length4_type length4_unit Weight Weight_unit

Indeed, now only 2012_leopoldkanaal is given in the output

stijnvanhoey commented 6 years ago
"length"                                   "length_type"                               "length_units"                             
"length2"                                 "length2_type"                             "length2_units"                             
"length3"                                 "length3_type"                             "length3_units"              
"length4"                                 "length4_type"                             "length4_units"              
"weight"                                                                                       "weight_units" 

are all in the output. Remark: length3_units. @PieterjanVerhelst, if you have other questions regard this, please open a new issue, so we focus this issue on the projectcode column in the available vliz.animals_view view, cfr. comment above

bwydoogh commented 6 years ago

@bwydoogh and @jreubens How do I have to interpret the projectcode column in the available vliz.animals_view view? Should I join myself with other tables to properly extract the projects or can you adapt the view?

A project is uniquely identified by its projectcode, a field which is returned by the animals_view. So you can join with the projects_view based on that field. Notice that the animals_view has 2 project-related fields: the projectcode and the name. I believe the 2 are mixed here.

jreubens commented 6 years ago

Keep in mind that there are two different types of projects: network projects and animal projects. Networks are normally not under moratorium (as it is interesting to see where others have put receivers. Animals are typically put under moratorium (thus the list will be much more restricted)....

stijnvanhoey commented 6 years ago

So, when I check the code of the animals view, which is the table we use to get the data when calling get_animals:

   SELECT 
    ....
    projects.projectcode
   FROM vliz.animals
     JOIN vliz.animal_tag_release ON animal_tag_release.animal_fk = animals.id_pk
     JOIN vliz.tags ON tags.id_pk = animal_tag_release.tag_fk
     JOIN vliz.projects ON animals.project_fk = projects.id AND projects.type::text = 'animal'::text

we can only select on the animal projects by filtering on the projectcode available in the view. Hence, the network_project argument in the get_animals function does not work.

Actually, I would propose to exclude the ability to filter on network project for the function get_animals and only provide an animal_project based filter.

If really required, I think (@bwydoogh tell me if I'm wrong) we should have to check for all animals involved the following: which detections are done with these animals, with which receivers have these been detected during which deployment to be able to link it with the network_projects involved (as deployments are linked to network projects). As such, we could identify those animals involved with a specific network_project and filter on them.

Is this worthwhile and is this feasible query-wise?

bwydoogh commented 6 years ago

@stijnvanhoey

Is this worthwhile and is this feasible query-wise?

Yes, you are correct (query structure). If it is worth it: I leave it up to others to decide about this. But I would like to notice that this is planned to foresee within the ETN web application (@jreubens has asked for this some time ago; wants to have an overview of what was seen on a deployment).

stijnvanhoey commented 6 years ago

So, when implemented for the ETN, I assume this will be part of a view that the webapp will use as well, I assume?

bwydoogh commented 6 years ago

@stijnvanhoey Not necessarily (we could keep the logic within the ETN web application) but we could foresee this if it is a demand within this context of the R-package.

stijnvanhoey commented 6 years ago

Well, I can't optimize the query, but with the access rights that I have, it boils down to the following query in order to include the network_projects:

SELECT DISTINCT animals.*, 
          network_project.projectcode AS network_projectcode 
   FROM vliz.animals_view animals
    JOIN vliz.animal_tag_release AS animal_tag_release ON (animals.id_pk = animal_tag_release.animal_fk)
    JOIN vliz.tags AS tags ON (animal_tag_release.tag_fk = tags.id_pk)
    JOIN vliz.detections_view AS detections ON ((tags.tag_code_space)::text = (detections.transmitter)::text)
    JOIN vliz.deployments_view AS deployments ON (detections.deployment_fk = deployments.id_pk)
    JOIN vliz.projects AS network_project ON (deployments.project_fk = network_project.id)

which takes a while ;-). So I propose to foresee the availability in the database as a view when developing the front-end to have a win-win...

stijnvanhoey commented 6 years ago

maybe, a suboption is to pass one of the binned views, making it feasible to query... e.g. with bins_week_view

  SELECT DISTINCT animals.*, 
        network_project.projectcode AS network_projectcode 
  FROM vliz.animals_view animals
    JOIN vliz.animal_tag_release AS animal_tag_release ON (animals.id_pk = animal_tag_release.animal_fk)
    JOIN vliz.tags AS tags ON (animal_tag_release.tag_fk = tags.id_pk)
    JOIN vliz.bins_week_view AS detections ON ((tags.tag_code_space)::text = (detections.transmitter)::text)
    JOIN vliz.deployments_view AS deployments ON (detections.deployment_fk = deployments.id_pk)
    JOIN vliz.projects AS network_project ON (deployments.project_fk = network_project.id)
jreubens commented 5 years ago

@bwydoogh do we still need to do anything here? Apparently the issue is not yet fixed. @JolienGoossens mentioned she gets addition data when requesting here project.

bwydoogh commented 5 years ago

@jreubens Yes, as nothing was planned for development. See https://jira.vliz.be/browse/ETN-231.

damianooldoni commented 4 years ago

@PieterjanVerhelst , @jreubens : is this issue in some way also solved now that i solved #75?

PieterjanVerhelst commented 4 years ago

Indeed, issue solved. Thanks for the nice work! I let you close the issue :-)

damianooldoni commented 4 years ago

Thanks @PieterjanVerhelst!