Open luukvdmeer opened 2 years ago
I think this is a reasonable suggestion but I'm not sure the benefits would outweigh the costs. The reason for storing the file in the local directory is that it can save time: if you repeatedly need to read in a dataset during different sessions it takes much longer to convert to .gpkg each time than to read in from the .gpkg file. The conversion process is one of the slowest parts. My guess is that more person hours and compute resources will be saved by keeping .gpkg files in the data directory than by changing the default settings so that the are written to the temporary. Could there be an option that could allow the .gpkg file to be saved elsewhere for people who want to?
Hi Luuk, and thanks for creating this issue.
I would expect oe_read() to just read my osm.pbf file into R, and the translation to gpkg being only an internal procedure that as a user I don't even have to know of. Therefore I would argue that by default the gpkg file should be automatically removed after reading, or stored in a temporary folder rather than in the same folder as my osm.pbf file.
I agree that oe_read
may remove the gpkg file after reading it (and hide the whole procedure to the users), but I don't understand which are the benefits of removing that file (a more intuitive/clean process? less disk on space? it might be preferable to leave no artefacts after reading in the data?). I think it might be pretty easy to implement that, but, as Robin said, I'm still not sure if the benefits outweigh the costs. Can you add slightly more details (no rush)?
(Another slightly OT question: do you think that, at the moment, the functions in this package are too verbose? Because I love verbose output with a lot of details, just to be a little bit more secure that nothing happens when converting the data and so on, but I never understand what is a "reasonable" level of detail)
Could there be an option that could allow the .gpkg file to be saved elsewhere for people who want to?
IMO that adds unnecessary complexity since, in that case, we should also check this other directory when running oe_*
and decide what to do in case of conflicts between the contents in the standard download directory and this new directory. If you want to save the data you just read in a "non-standard" location, you can run st_save(xxx, "path.gpkg")
. Do you agree?
Can you add slightly more details (no rush)?
Currently I have an osm.pbf
file in the inst
directory of an R package (in development..). I read this file with oe_read()
in a vignette to show examples, but in that way it "clutters" my inst
directory. (I guess the best solution here is probably to add the .gpkg
file to both .gitignore
and .Rbuildignore
. You are right that it might even be good to keep the file to save resources when rebuilding vignettes)
More important for me is when I am not working locally. For example when reading pre-processed osm.pbf
files (e.g. clipped or pre-filtered) from company servers I would not like to also write to the company servers and store the .gpkg
file there permanently. Of course I can manually remove it afterwards. Therefore I also don't think this is a critical issue but it is just not intuitive to me that when I call a read function it automatically also writes something. I also don't think that would be regarded as "best practice" (but not sure about that).
Another slightly OT question: do you think that, at the moment, the functions in this package are too verbose?
My personal opinion would be that as long as I can set quiet = TRUE
and this effectively supresses all messages, it does not matter for me how verbose the output is when quiet = FALSE
(as you say more info might only be beneficial).
Thanks, it sounds like reasonable use cases. I will try to implement this later today! Maybe with a new argument in oe_get
and oe_read
named remove_gpkg
? I think that the default value should be FALSE for the same reasons that Robin explained.
See #237
When I use
oe_read()
to read anosm.pbf
file I have stored on disk, it internally callsoe_vectortranslate()
to translate theosm.pbf
file to agpkg
file, before reading it. However, it stores this.gpkg
file on disk in the same folder as myosm.pbf
file and does not remove it afterwards. I would expectoe_read()
to just read myosm.pbf
file into R, and the translation togpkg
being only an internal procedure that as a user I don't even have to know of. Therefore I would argue that by default thegpkg
file should be automatically removed after reading, or stored in a temporary folder rather than in the same folder as myosm.pbf
file.(this is of course different when I explicitly call
oe_vectortranslate()
, in that case I expect thegpkg
file to be stored, because that is the goal of the function).