Closed e-kotov closed 2 months ago
Woohoo, thanks @e-kotov! That is an impressive schematic of function and LOT of commits. Promising. I will give this a proper review by Monday evening latest.
Quickfire comment seeing the workflow error message:
'::' or ':::' imports not declared from:
‘knitr’ ‘od’ ‘odjitter’ ‘osmactive’ ‘remotes’ ‘zonebuilder’
'library' or 'require' calls not declared from:
‘remotes’ ‘tidyverse’ ‘tmap’
If you add those to 'Suggests', GitHub actions should become happier.
If you add those to 'Suggests', GitHub actions should become happier.
not really) https://github.com/Robinlovelace/spanishoddata/actions/runs/10516439464/job/29138644071?pr=52
anyway, it passes locally
@Robinlovelace are we on for the scheduled Monday meeeting?
@Robinlovelace are we on for the scheduled Monday meeeting?
Yes I can make that.
Hi @e-kotov. Thanks for sending this. I wanted to give you my feedback today, but it will take longer. So, if it is ok, I'll review this, like Robin, at the latest on Monday eve. I'll focus on giving you my opinion on this good-looking virtual table explained in the vignette "Codebook and Cookbook for v1", and thinking of better functions' names.
Ok, I made some minor changes and bug fixes. Important note, there are now three new vignettes:
I also simplified the README and froze all code execution, at least for now. Also to save energy of GitHub actions. All plots and tables are currently hardcoded in README. Also hidden advanced details.
Picking up on this now, great to see tests passing!
Checked-out with:
gh pr checkout 52
@eugenividal Great and thorough review. I will address you comments soon. Quick question, have you actually installed the package from this branch and tried to use it as described?
@eugenividal
If you consider it relevant, I can translate these two documents (they are short), so that we can presented them just as our English translations.
On one hand, that would be great, as machine translation is of course not perfect, on the other, it does not seem like a priority to me at the moment. Also, consider that methodology docs for v2 data are MUCH longer, and we would want them to be translated too if you do that for v1. So... Consider this before you commit to translating)
I see a middle ground in translating the most important bits of methodologies and the ones that you think the automatic translation got very wrong and including them into our vignettes. I think the text in the vignettes is more important, because the vignettes will be pagers on the pkgdown website of the package and will be indexed better by search engines than any PDFs we attach.
@e-kotov there seems to be duplication between these two vignettes:
ls vignettes/*convert*
vignettes/converting-the-data-for-faster-analysis.qmd vignettes/download-and-convert-all-data.qmd
Also: not a fan of the filenames, how about download-convert
or just convert.qmd
: conversion implied download, right?
Suggestion: let's have a single codebook.qmd
doc.
@eugenividal Great and thorough review. I will address you comments soon. Quick question, have you actually installed the package from this branch and tried to use it as described?
@e-kotov, I am glad you found my comments useful. I just tried installing the package from this branch and using it. I found some issues:
1) I had issues with the instructions you gave in the first comment of this PR. See screenshot below:
2) I had some issues when running the code in the vignette. See the sequence of commands and messages below:
I hope that helps.
Suggestion: let's have a single
codebook.qmd
doc.
For v1 and v2 in one file? I would refrain from that. In the end, the two datasets are very different. I would separate them on purpose to make it clear they are NOT compatible (and the official methodology doc for v2 emphasises, that the two datasets should not be compared directly, e.g. one cannot directly compare trip counts from 2020-2021 (v1) and 2022+ (v2). Also, in v2 data there will be many more data sets that are not available in v1.
I just tried installing the package from this branch and using it. I found some issues
@eugenividal
Thanks for that. Before installing the package try installing quarto:
install.packages("quarto")
Overall, this is not a big problem, as one can always install without vignettes. When the package is on CRAN, this will not be an issue. This is only relevant for installs from GitHub.
Since you were not able to install the package because of the errors in the first screenshot, you are not using the latest version and therefore the code from the vignettes will not work.
@eugenividal you can install without vignettes for now:
remotes::install_github(
"Robinlovelace/spanishoddata@v1-full-support-based-on-district-level-data-only",
force = TRUE
)
Without vignettes the spod_codebook
will not work, but other functions should.
Thanks for that. Before installing the package try installing quarto:
@e-kotov After installing quarto and the spanishoddata package, I get the same error messages if I run each of the chunks separately. However, I can render v1-2020-2021-mitma-data-codebook.qmd
@eugenividal
I can render v1-2020-2021-mitma-data-codebook.qmd
Rendering works because no code is actually evaluated, the option that turns this off is in the header of the qmd file.
It seems like you still don't have the version from this particular branch.
If you run spod_get
(yes without the round bracets):
spod_get
You should be getting in console:
function (type = c("od", "origin-destination", "os",
"overnight_stays", "tpp", "trips_per_person"),
zones = c("districts", "dist", "distr",
"distritos", "municipalities", "muni",
"municip", "municipios"), dates = NULL, data_dir = spod_get_data_dir(),
quiet = FALSE, max_mem_gb = 3, max_n_cpu = parallelly::availableCores() -
1, max_download_size_gb = 1, duckdb_target = ":memory:",
temp_path = spod_get_temp_dir())
{
and the rest of the function code...
If your first lines of the output are different, you are not on the relevant package version.
In that case try:
remove.packages("spanishoddata")
remotes::install_github(
"Robinlovelace/spanishoddata@v1-full-support-based-on-district-level-data-only",
build_vignettes = FALSE,
force = TRUE
)
@e-kotov
On one hand, that would be great, as machine translation is of course not perfect, on the other, it does not seem like a priority to me at the moment. Also, consider that methodology docs for v2 data are MUCH longer, and we would want them to be translated too if you do that for v1. So... Consider this before you commit to translating)
Agree👌 Not a priority, we can consider this later on.
@e-kotov
You should be getting in console:
This is what I get if I run spod_get
In that case try:
Still the same error messages 😔
@eugenividal I don't know why, but for some reason you are getting the "old" version of the package from the main
branch, instead of getting the one at the v1-full-support-based-on-district-level-data-only
branch.
@e-kotov
@eugenividal I don't know why, but for some reason you are getting the "old" version of the package from the main branch, instead of getting the one at the v1-full-support-based-on-district-level-data-only branch.
Thanks for spotting the reason, I will try to find a way to fix it.
Thanks for spotting the reason, I will try to find a way to fix it.
Ideally,
remotes::install_github(
"Robinlovelace/spanishoddata@v1-full-support-based-on-district-level-data-only",
build_vignettes = FALSE,
force = TRUE
)
should be installing the v1-full-support-based-on-district-level-data-only
branch, hence the @v1-full-support-based-on-district-level-data-only
... But why it fails for you, I have no clue :(
Suggestion: let's have a single
codebook.qmd
doc.For v1 and v2 in one file? I would refrain from that. In the end, the two datasets are very different. I would separate them on purpose to make it clear they are NOT compatible (and the official methodology doc for v2 emphasises, that the two datasets should not be compared directly, e.g. one cannot directly compare trip counts from 2020-2021 (v1) and 2022+ (v2). Also, in v2 data there will be many more data sets that are not available in v1.
:+1:
Thanks for spotting the reason, I will try to find a way to fix it.
Ideally,
remotes::install_github( "Robinlovelace/spanishoddata@v1-full-support-based-on-district-level-data-only", build_vignettes = FALSE, force = TRUE )
should be installing the
v1-full-support-based-on-district-level-data-only
branch, hence the@v1-full-support-based-on-district-level-data-only
... But why it fails for you, I have no clue :(
You're missing Quarto, should be:
remotes::install_github(
"Robinlovelace/spanishoddata@v1-full-support-based-on-district-level-data-only",
build_vignettes = FALSE,
force = TRUE,
depedencies = TRUE
)
That will install the 'deps'.
@e-kotov good to go now assuming you're OK with #56. Non-controversial, and tested on V2 data for parquet and duckdb as you'll see. If you do want to change the pkg name let's do that as a separate PR, this one is already big!
@Robinlovelace reviewing and fixing a few typos as I go. Will be done soon. Regarding name change and separate PR - ok. Still thinking about it.
You're missing Quarto, should be:
Working!!!
@eugenividal just merged latest function name changes (not spod_convert and spod_connect, even if these are still controversial, at least they are short ;) ) So you may want to re-install the package once again using Robin's code with deps=true.
@Robinlovelace Ok, to me this PR seems mostly done. Thanks for improving and combining the vignettes and testing with v2 data. In a few hours I will implement @eugenividal comments on the wording of the codebook and merge this.
Great, let me know if you want me to postpone putting it out there but sounds like we're on-track for launch.
All on track to merge this by 15:00 UK time (16:00 in Germany I believe) @e-kotov ?
@Robinlovelace yes, cleaned up 99%. Ready. We don't have a logo)
Done. Let's see how the pkgdown website renders and looks live and perhaps will make a few tweaks.
Closes #3, #27, #33, #34, #35, #57. Also solves:
OK, now we are done. We have a working package,
a logo
and an opengraph card for social media
@Robinlovelace can you upload this https://github.com/Robinlovelace/spanishoddata/blob/main/man/figures/card.png?raw=true to Social Preview section of https://github.com/Robinlovelace/spanishoddata/settings ? I cannot access settings.
Sure!
@e-kotov The logo is super cool. I loved it! 🤩
Just a quick question: is that plot real? I meant does the arrow width represent the volume of trips? I'd have expected most trips in the triangle Madrid - Barcelona - Valencia
@eugenividal it is real. I would have spent the whole day to make it by hand) I will post the code a bit later, I want to find a good place to put it. Probably should not be in the package itself in the R folder, but in some other folder in the repo. These flows are aggregated and therefore what you see could be more of a visual artifact. There are not between cities, but between provinces.
@e-kotov thanks for your quick reply. That's interesting!
@eugenividal ah, also I did some flows aggregation as there's some bug in flowmapper that I did not have time to deal with
So the lines in the logo actually combine flows for just 10 nodes and some provinces are aggregated into single point... The image above is with 19 nodes (max allowed by the package)
@e-kotov There are 17 autonomous communities and two autonomous cities (Ceuta and Melilla) in Spain. Perfect for the max allowed in the package!
Great set of comments @eugenividal. PR welcome, look forward to seeing @e-kotov's comments, your suggestions seem good to me, thank you!
@eugenividal Thanks for a thorough review and feedback. Working on updated version in a new branch here: https://github.com/Robinlovelace/spanishoddata/blob/minor-fixes-ek/vignettes/convert.qmd . Work-in-progress, so no new review necessary. I will do a PR when I'm ready.
@e-kotov No problem. Great! I'll be working on other stuff until Tuesday. Yes, please, send a new PR when it is ready and let me know from Tuesday if you would like me to look at any other aspects.
ok, all done, full support for V1 data and much more cool stuff.
Important: to simplify a few things, the v1 data is now downloaded as csv.gz instead of txt.gz. So before testing, please nuke your cache folder or rename all txt.gz files to csv.gz files. txt.gz files should not get in the way as the functions look specifically for csv.gz files, but if you do not remove them, they will just take up storage space.
Please install with
If the above fails for you because of vignettes, do:
Here's an overview:
See the new vignette for testing all the functions.
@eugenividal I am very interested in your opinion on how well the concept of working with this virtual duckdb table that you get from
spod_get()
andspod_connect_to_converted_data()
is explained in the vignette https://github.com/Robinlovelace/spanishoddata/blob/v1-full-support-based-on-district-level-data-only/vignettes/v1-2020-2021-mitma-data-codebook.qmd . Also, suggestions for better function names, specifically the ones that are user-facing functions covered in the vignette.