Closed PieterjanVerhelst closed 4 years ago
ability to filter on owner_organisation would be nice
With regard to adding additional filtering options in this issue as well as some other issues (e.g. #45, #46), make sure it is worthwhile to add the functionality. For get_detections
, these filters provide the ability to limit the requested data and overcome the risk of getting too large tables and large useless data transfers.
However, for functions that will return just a relative small amount of data (even without the filter), we maybe should not overload the function arguments, but just provide documentation on how to filter the data after requesting the data, e.g. just one dplyr
function call away to subset the owner organisation:
my_animals <- get_animals(my_con, animal_project = "2012_leopoldkanaal")
my_animals_inbo <- my_animals %>% filter(owner_organisation == "INBO")
the key point is the table size requested (for detections
typically large, for animals much smaller).
As such, in order to not end up with tons of arguments for each function that are (1) harder to maintain and (2) create overload for the functions; let's make sure to make this a bit more efficient as in 'would be nice':
dplyr
after requesting the data. would be nice
is not enough ;-))@jreubens and @PieterjanVerhelst for those open issues with a would be nice, please do a revision of these requests and make sure the requirements above are valid for each of them. Thanks!
I agree; comments added to the issues. The only function where additional filters can still be helpfull, is the get_detections()
function. The more data that can be reduced on the data extraction level, the better. See examples at #47 .
The network_project
argument in the get_animals()
function doesn't seem to filter anything.
For example:
my_animals <- get_animals(my_con, network_project = "dijle")
my_animals <- get_animals(my_con, network_project = "AZ0")
my_animals <- get_animals(my_con)
All of these render the same result.
@PieterjanVerhelst
the output columns (as derived from the database view) are named: length3_type
and length4_type
instead of length_type3
and length_type4
The output only returns the requested animal project when I'm running the code:
> my_animals <- get_animals(con, animal_project = "2012_leopoldkanaal")
> my_animals %>% distinct(projectcode)
projectcode
1 2012_leopoldkanaal
@JolienGoossens I do have the same issue here
When I request all the network project names (with my account), I get the following list:
[1] "cpod-lifewatch" "lifewatch" "Noordzeekanaal" "pc4c"
[5] "Deveron" "MorayFirth" "Aberdeen" "Conon"
[9] "Skye" "maas" "AZ0" "Jersey_Coastal"
[13] "MBA_Whitsand" "MBA_Massmo" "MBA_Wavehub" "dijle"
[17] "albert" "bovenschelde" "leopold" "bpns"
[21] "demer" "saeftinghe" "DTU-Skjern" "Limfjord"
[25] "SVNL-FISH-WATCH" "BECORV" "Inforbiomares" "ARAISOLA01"
[29] "ARAISOLA02" "ARAISOLA03" "SARTELARAM" "SARTELZINGARO"
[33] "SARTELTG" "thornton" "ws1" "ws2"
[37] "ws3" "zeeschelde" "PTN-Silver-eel-Mondego"
so, I would also assume that dijle
and AZ0
are valid inputs to select from. However, when using the projectcodes returned when requesting ALL animals, I do get another set of names:
> get_animals(con) %>% distinct(projectcode)
projectcode
1 phd_reubens
2 2015_phd_verhelst
3 homarus
4 2015_fint
5 rangetest
6 2013_albertkanaal
7 2011_rivierprik
8 2012_leopoldkanaal
9 2014_demer
10 2015_dijle
This last list is actually coming from the animals view directly, so the available project names listed there. As such, I am missing something there or there is some inconsistency on the listing of the projects as included in the animals view.
@bwydoogh and @jreubens How do I have to interpret the projectcode
column in the available vliz.animals_view
view? Should I join myself with other tables to properly extract the projects or can you adapt the view?
@stijnvanhoey indeed, the headers which also should be retrieved are
length3
length3_type
length3_unit
length4
length4_type
length4_unit
Weight
Weight_unit
Indeed, now only 2012_leopoldkanaal
is given in the output
"length" "length_type" "length_units"
"length2" "length2_type" "length2_units"
"length3" "length3_type" "length3_units"
"length4" "length4_type" "length4_units"
"weight" "weight_units"
are all in the output. Remark: length3_units. @PieterjanVerhelst, if you have other questions regard this, please open a new issue, so we focus this issue on the projectcode
column in the available vliz.animals_view
view, cfr. comment above
@bwydoogh and @jreubens How do I have to interpret the projectcode column in the available vliz.animals_view view? Should I join myself with other tables to properly extract the projects or can you adapt the view?
A project is uniquely identified by its projectcode
, a field which is returned by the animals_view
. So you can join with the projects_view
based on that field. Notice that the animals_view
has 2 project-related fields: the projectcode
and the name
. I believe the 2 are mixed here.
Keep in mind that there are two different types of projects: network projects and animal projects. Networks are normally not under moratorium (as it is interesting to see where others have put receivers. Animals are typically put under moratorium (thus the list will be much more restricted)....
So, when I check the code of the animals view, which is the table we use to get the data when calling get_animals
:
SELECT
....
projects.projectcode
FROM vliz.animals
JOIN vliz.animal_tag_release ON animal_tag_release.animal_fk = animals.id_pk
JOIN vliz.tags ON tags.id_pk = animal_tag_release.tag_fk
JOIN vliz.projects ON animals.project_fk = projects.id AND projects.type::text = 'animal'::text
we can only select on the animal projects by filtering on the projectcode available in the view. Hence, the network_project argument in the get_animals
function does not work.
Actually, I would propose to exclude the ability to filter on network project for the function get_animals
and only provide an animal_project based filter.
If really required, I think (@bwydoogh tell me if I'm wrong) we should have to check for all animals involved the following: which detections are done with these animals, with which receivers have these been detected during which deployment to be able to link it with the network_projects involved (as deployments are linked to network projects). As such, we could identify those animals involved with a specific network_project and filter on them.
Is this worthwhile and is this feasible query-wise?
@stijnvanhoey
Is this worthwhile and is this feasible query-wise?
Yes, you are correct (query structure). If it is worth it: I leave it up to others to decide about this. But I would like to notice that this is planned to foresee within the ETN web application (@jreubens has asked for this some time ago; wants to have an overview of what was seen on a deployment).
So, when implemented for the ETN, I assume this will be part of a view that the webapp will use as well, I assume?
@stijnvanhoey Not necessarily (we could keep the logic within the ETN web application) but we could foresee this if it is a demand within this context of the R-package.
Well, I can't optimize the query, but with the access rights that I have, it boils down to the following query in order to include the network_projects:
SELECT DISTINCT animals.*,
network_project.projectcode AS network_projectcode
FROM vliz.animals_view animals
JOIN vliz.animal_tag_release AS animal_tag_release ON (animals.id_pk = animal_tag_release.animal_fk)
JOIN vliz.tags AS tags ON (animal_tag_release.tag_fk = tags.id_pk)
JOIN vliz.detections_view AS detections ON ((tags.tag_code_space)::text = (detections.transmitter)::text)
JOIN vliz.deployments_view AS deployments ON (detections.deployment_fk = deployments.id_pk)
JOIN vliz.projects AS network_project ON (deployments.project_fk = network_project.id)
which takes a while ;-). So I propose to foresee the availability in the database as a view when developing the front-end to have a win-win...
maybe, a suboption is to pass one of the binned views, making it feasible to query...
e.g. with bins_week_view
SELECT DISTINCT animals.*,
network_project.projectcode AS network_projectcode
FROM vliz.animals_view animals
JOIN vliz.animal_tag_release AS animal_tag_release ON (animals.id_pk = animal_tag_release.animal_fk)
JOIN vliz.tags AS tags ON (animal_tag_release.tag_fk = tags.id_pk)
JOIN vliz.bins_week_view AS detections ON ((tags.tag_code_space)::text = (detections.transmitter)::text)
JOIN vliz.deployments_view AS deployments ON (detections.deployment_fk = deployments.id_pk)
JOIN vliz.projects AS network_project ON (deployments.project_fk = network_project.id)
@bwydoogh do we still need to do anything here? Apparently the issue is not yet fixed. @JolienGoossens mentioned she gets addition data when requesting here project.
@jreubens Yes, as nothing was planned for development. See https://jira.vliz.be/browse/ETN-231.
@PieterjanVerhelst , @jreubens : is this issue in some way also solved now that i solved #75?
Indeed, issue solved. Thanks for the nice work! I let you close the issue :-)
Thanks @PieterjanVerhelst!
Some bugs in the
get_animals()
functionality:When I run the code
my_animals <- get_animals(my_con, animal_project = "2012_leopoldkanaal")
I get the animals for both the2012_leopoldkanaal
andPhD van der Knaap
projects. This also holds true when other animal projects are selected; apparentlyPhD van der Knaap
comes as a standard in the output.The function does not return information about
length_type3
andlength_type4
nor the actual weight of the fish.