inbo / etn

R package to access data from the European Tracking Network
https://inbo.github.io/etn/
MIT License

v2.3 beta release: access ETN data from your local computer! #318

Open PietrH opened 1 month ago

PietrH commented 1 month ago

Hello!

I'm happy to invite you to try out the newest experimental release of the etn R package. In summary, with this release you will be able to access the ETN data from your own computer without having to use the LifeWatch RStudio server. It would be great if you could try out some of your existing scripts, and check if the outputs are still in line with your expectations. Your feedback will be vital in making the transition for the larger user community as smooth as possible.

Two other major changes are:

- You no longer need to create a connection object with connect_to_etn(): you can call the data-access functions directly.
- Your credentials are requested interactively: you will be prompted for your username and password when you first need them in an R session.

This is still an experimental release, which means that you need to use a slightly different command to install it:

install.packages("remotes")
remotes::install_github("inbo/etn@v2.3-beta")

After this, you can check the version you have installed with packageVersion("etn") or by using the packages tab in RStudio:

[Screenshot of the RStudio Packages tab]

In this screenshot, I have etn version 2.3.0 installed and loaded.

You can find an overview of the changes in the NEWS, or (very detailed!) in the changelog.

If you encounter any problems, or have any questions or comments big or small, you can contact me directly via email or, if you prefer, create an issue directly on GitHub. Don't hesitate to make any remarks, even if you feel it'll only affect your specific workflow: nothing is too small to mention, and I'm here to help.

You can find this information, and an up-to-date list of questions and answers, on this webpage.

Thank you for helping me out!

Pieter


Questions and answers

I will update this post as new questions come in. If your question is not here, feel welcome to contact me directly or to leave a comment!

How can I install the beta release?

You can install the new release of the etn package alongside, or in place of, your current installation. If you want to start over from scratch, you can uninstall etn first by running remove.packages("etn") in the R console, although normally this should not be necessary. By default R will only allow you to install one version of a package at a time, and I think this is for the best.

To install the beta release directly from GitHub, you can run the following code:

install.packages("remotes")
remotes::install_github("inbo/etn@v2.3-beta")

After this, you can check the version you have installed with packageVersion("etn") or by using the packages tab in RStudio:

[Screenshot of the RStudio Packages tab]

In this screenshot, I have etn version 2.3.0 installed and loaded.

If you ever want to go back to the latest official release, you can run: remotes::install_github("inbo/etn"). Let me know if you need help installing the beta or another specific version.
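For example, something along these lines (check the repository's releases page for the exact tag names; v2.2.1 is shown here as an illustration):

# Go back to the latest official release
remotes::install_github("inbo/etn")

# Or install a specific tagged release, e.g. the previous one
remotes::install_github("inbo/etn@v2.2.1")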

How can I tell what version of etn I'm using?

You can check which packages are currently loaded in R by running sessionInfo(), or you can see it in RStudio in the Packages tab (next to Files, Plots, Help and Viewer), where you can search for etn. The packages with ticks ☑ are loaded. You might need to click the refresh button if you made any changes since you opened the tab.

[Screenshot of the RStudio Packages tab]

In this screenshot, I have etn version 2.3.0.9000 installed, but not loaded.
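As a quick reference, these are a few commands you can run in the console to check the same thing:

# Which version of etn is installed?
packageVersion("etn")

# Is etn currently loaded (attached) in this session?
"etn" %in% .packages()

# Full overview of R and the attached packages and their versions
sessionInfo()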

What has changed between 2.2.1 and this experimental release of 2.3?

You can find a summary of the changes here, but the gist of it is that you can now access the data from your local system and that you no longer need to use connection objects or connect_to_etn().
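As a rough before/after sketch, based on the example query from this thread (the exact argument names are assumptions on this page; check the function documentation if in doubt):

# v2.2.1: create a connection object first and pass it to every function
con <- connect_to_etn(username = "my_username", password = "my_password")
detections <- get_acoustic_detections(con, scientific_name = "Raja clavata")

# v2.3: call the functions directly; you are prompted for your credentials
# the first time you need them in an R session
detections <- get_acoustic_detections(scientific_name = "Raja clavata")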

Should I be getting identical results using the API (by default) or when using a local connection (api = FALSE)?

Ideally yes, but currently there are some known differences. This is due to changes in the R package since development of the API version of the code started. This will be fixed before the API is made public to the broad user community.

Even though I'm aware of this particular discrepancy, I still appreciate any reports of unexpected results like this.
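If you want to compare the two modes yourself, a sketch along these lines should do it (assuming api can be passed as an argument to the data-access functions, as in the question above):

# Same query via the API (default) and via a local database connection
via_api   <- etn::get_acoustic_detections(scientific_name = "Raja clavata")
via_local <- etn::get_acoustic_detections(scientific_name = "Raja clavata", api = FALSE)

# Report any differences between the two results
all.equal(via_api, via_local)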

Will this beta release be faster than 2.2.1 on the RStudio Server?

No: using a local database connection (as on the RStudio Server) will be faster than querying over the internet. Let me know if something doesn't work or takes too long using v2.3, and I'll do my very best to speed things along.

How can I provide feedback or report a bug or a problem?

This is an experimental release, and I'm very grateful for your help in testing the changes that we've made since v2.2.1. It would be helpful if you could try to run some of your existing scripts to see if the results are still in line with what you'd expect.

You can get my attention by:

- contacting me directly via email
- creating an issue on GitHub
- leaving a comment below this post

If you encounter any problems or have any questions or comments, please don't hesitate to make any remarks, even if you feel it'll only affect your specific workflow: nothing is too small to mention.

lottepohl commented 3 weeks ago

Hi Pieter!

Finally gave the v2.3 beta a try, and for me almost everything works as expected. This is a huge improvement for the data access, congrats!

Two things from my side:

  1. I tried querying >5mio rows of detection data, and that didn't work out. This is the error message:
     `Error: error writing to connection In call: save(file = ".RData", envir = sessionenv, list = ls(sessionenv, all.names = TRUE), compress = FALSE)`
     Querying those >5mio rows of detection data works on the RStudio server, where it takes 1min10s to run. This is my line of code:
     `detections_raja <- etn::get_acoustic_detections(con, scientific_name = "Raja clavata")`
  2. When I don't input the userid correctly (it happened to me that I was running several lines of code quickly after each other, and the next line was taken as input), is there a way to make the prompt for userid and pwd reappear? Or is it maybe also possible to store userid and pwd somewhere? (.Renviron does not work, which was expected.)

Anyway, I'm super impressed with how well the 'local' R package version runs. Can we already communicate this development to some colleagues, or should we wait until the official release?

Best, Lotte

PieterjanVerhelst commented 3 weeks ago

I would, for now, keep this version to the few people it has been shared with.

PietrH commented 3 weeks ago


Hi Lotte, thank you so much for having a look!

1. Big detection queries

We have something in the works to try to get bigger queries to work more smoothly. I'm waiting for some changes on the VLIZ IT side and I also still have some developments to do. This is all tracked in: #323, #325, #327, #329 and in pull request #328. The core of the problem is that the underlying database view needs optimisation, which I've been told is on the roadmap, but will not arrive for more than 6 months. So we'll try to improve on the package side as much as possible in the meantime.

I'm confident we can get larger queries to work; it might take a while for you to get your results (longer than with a local database connection on the RStudio Server), but the aspiration is to at least make them possible!
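In the meantime, a possible workaround (just a sketch, and it assumes get_acoustic_detections() accepts start_date and end_date arguments in your installed version; check its help page first) is to split a very large query into smaller chunks and combine the results:

# Query detections per year and bind the chunks together afterwards
years <- 2010:2023
chunks <- lapply(years, function(y) {
  etn::get_acoustic_detections(
    scientific_name = "Raja clavata",
    start_date = as.character(y),
    end_date   = as.character(y + 1)
  )
})
detections_raja <- do.call(rbind, chunks)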

2. Credentials

You are not the only one who has run into trouble here. This definitely needs changing!

How would you like it to work? Currently the package doesn't actually check whether the username or password are correct when you enter them, so I'd like to add that check and let you re-enter them when they are wrong. Would it also be useful to be able to change them afterwards? At the moment you need to enter credentials every time you restart your R session, so restarting R is one option that is currently available, and you can also manually overwrite them with Sys.setenv(userid = "your username", pwd = askpass::askpass()), which is not ideal.
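For reference, this is what that manual workaround looks like as a snippet:

# Overwrite the credentials stored for the current R session
# (askpass prompts for the password without printing it to the console)
Sys.setenv(userid = "your username")
Sys.setenv(pwd = askpass::askpass("ETN password"))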

Do you prefer using .Renviron, like {rgbif}? I was thinking of switching to {keyring} in the longer term and letting Windows/macOS/Linux deal with storing the passwords, just like the {movebank} package. This is very safe, but a bit more complex, which is why I went with the current simple system (which is much closer to how ETN has worked all along).
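To give an idea of what the {keyring} approach could look like (this is not implemented in etn; it's just a sketch of the idea):

# Store the ETN password once in the operating system's credential store ...
keyring::key_set(service = "etn", username = "your username")

# ... and retrieve it later without ever writing it down in a script
pwd <- keyring::key_get(service = "etn", username = "your username")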

I'm also keeping in mind that someone might want to set up some automations in the cloud with etn (dashboards, automatic publications via DwC mapping, etc.), and having the passwords passed either through the OS or via system parameters is very convenient in that case. But we'll cross that bridge when we get there.

3. Opening up the beta

The current beta is not yet completely the same as the production release of the package; I'd like to fix that before we do a wider release. I also believe entering and changing credentials needs to be streamlined. You can see what I'm planning for the next patch in this milestone: https://github.com/inbo/etn/milestone/8, but give me a shout if you feel I need to move something up in the planning.

Getting some sort of testing environment where I can make changes to the API without breaking everything for people using it will speed things up greatly. If I make a change now, it changes for everyone, regardless of what version of the beta they are on, so I need to be really careful when I'm working on the API side of things.