Closed zdz2101 closed 7 months ago
Hi @rossfarrugia @zdz2101, @CDC-DNPAO,
I have a few questions 😀.
Regarding the metadata: I noticed that the structure of WHO metadata is different from CDC. In WHO, there is one dataset per sex. I would then keep this format and recreate the internal metadata as they are from the WHO website. Is it ok to proceed this way? I will propose a naming convention similar to Zelos.
Regarding the technical aspect: I noticed Zelos created the programs that build the dataset in inst folder. It means that the end user will have access to it when installing the package. Is it something that we want or should we store these programs in a data-raw folder which is not accessible to the end user but only to the developer?
Thanks for your help!
Piere - you're right about the difference between the WHO and CDC reference files. I've always hated the separate WHO files for each sex because it involved more downloading, but whatever you think is the best way to store these files is absolutely fine with me.
It's hard for me to think that most people will actually want to see these reference data files, but there might be some very unusual person who wants to. So if it's easy to make them available, it would be a good idea. Honestly, they are available in my packages, but this is mostly because I didn't know what I was doing when building a package. I feel as if I've learned R by reading StackOverflow answers!
David
On Wed, Mar 6, 2024 at 6:06 AM Pierre-Wallet @.***> wrote:
Hi @rossfarrugia https://github.com/rossfarrugia @zdz2101 https://github.com/zdz2101, @CDC-DNPAO https://github.com/CDC-DNPAO,
I have a few questions 😀.
Regarding the metadata: I noticed that the structure of WHO metadata is different from CDC. In WHO, there is one dataset per sex. I would then keep this format and recreate the internal metadata as they are from the WHO website. Is it ok to proceed this way? I will propose a naming convention similar to Zelos.
Regarding the technical aspect: I noticed Zelos created the programs that build the dataset in inst folder. It means that the end user will have access to it when installing the package. Is it something that we want or should we store these programs in a data-raw folder which is not accessible to the end user but only to the developer?
Thanks for your help!
— Reply to this email directly, view it on GitHub https://github.com/pharmaverse/admiralpeds/issues/17#issuecomment-1980621268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AY2XGPVOZJDG3SFYEQLHWETYW32ETAVCNFSM6AAAAABEAE7FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQGYZDCMRWHA . You are receiving this because you were mentioned.Message ID: @.***>
Piere - I've thought about this a little more, and there actually may be some (5%?) users who would like to see the reference data, so it would be a good idea to make the data available (unless it's a lot more work for you)
There's an example of this is the Sitar package: just load it and type the command 'who06' or 'cdc2000'.
I do have a question: were you planning to leave all the files separate, or were you going to combine them in long or wide format? 'who06' shows the reference data in wide format for each week (0.0192 years) of age. LMS are the reference parameters and the values are given for height, weight, BMI, head circumference, etc. 'cdc2000 gives the data at age 24.0, 24.5, 25.5, etc months of age.
Another thing to note is that for CDC reference data, an age value of 24.5 represents all data for kids between 24.0 and 24.999 months of age. cdcanthro linearly interpolates the LMS values for people who actually know the child's age in days. This interpolation doesn't make a big difference in the estimated z-scores and percentiles
david
david
david
david
On Wed, Mar 6, 2024 at 12:26 PM David Freedman @.***> wrote:
Piere - you're right about the difference between the WHO and CDC reference files. I've always hated the separate WHO files for each sex because it involved more downloading, but whatever you think is the best way to store these files is absolutely fine with me.
It's hard for me to think that most people will actually want to see these reference data files, but there might be some very unusual person who wants to. So if it's easy to make them available, it would be a good idea. Honestly, they are available in my packages, but this is mostly because I didn't know what I was doing when building a package. I feel as if I've learned R by reading StackOverflow answers!
David
On Wed, Mar 6, 2024 at 6:06 AM Pierre-Wallet @.***> wrote:
Hi @rossfarrugia https://github.com/rossfarrugia @zdz2101 https://github.com/zdz2101, @CDC-DNPAO https://github.com/CDC-DNPAO,
I have a few questions 😀.
Regarding the metadata: I noticed that the structure of WHO metadata is different from CDC. In WHO, there is one dataset per sex. I would then keep this format and recreate the internal metadata as they are from the WHO website. Is it ok to proceed this way? I will propose a naming convention similar to Zelos.
Regarding the technical aspect: I noticed Zelos created the programs that build the dataset in inst folder. It means that the end user will have access to it when installing the package. Is it something that we want or should we store these programs in a data-raw folder which is not accessible to the end user but only to the developer?
Thanks for your help!
— Reply to this email directly, view it on GitHub https://github.com/pharmaverse/admiralpeds/issues/17#issuecomment-1980621268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AY2XGPVOZJDG3SFYEQLHWETYW32ETAVCNFSM6AAAAABEAE7FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQGYZDCMRWHA . You are receiving this because you were mentioned.Message ID: @.***>
Hi David,
my point was to discuss the access to the program that builds the reference data, not the data itself. Of course the reference data will be available.
I am in favor of keeping all files separated to reflect exactly what is contained in the .xls files on the WHO site. It is easier for any end-user to compare the initial source to what we store in the package. Plus these reference dataset are the unique source of truth to start all the upcoming derivations. Combining them increases the risk of making a handling mistake and the overall maintenance. This is only my opinion and based on my current knowledge of this project.
Pierre
OK - whatever you think is best is fine with me
david
On Wed, Mar 6, 2024 at 5:36 PM Pierre-Wallet @.***> wrote:
Hi David,
my point was to discuss the access to the program that builds the reference data, not the data itself. Of course the reference data will be available.
I am in favor of keeping all files separated to reflect exactly what is contained in the .xls files on the WHO site. It is easier for any end-user to compare the initial source to what we store in the package. Plus these reference dataset are the unique source of truth to start all the upcoming derivations. Combining them increases the risk of making a handling mistake and the overall maintenance. This is only my opinion and based on my current knowledge of this project.
— Reply to this email directly, view it on GitHub https://github.com/pharmaverse/admiralpeds/issues/17#issuecomment-1981974163, or unsubscribe https://github.com/notifications/unsubscribe-auth/AY2XGPUAQ5F5GV3KZWBME7LYW6K7ZAVCNFSM6AAAAABEAE7FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBRHE3TIMJWGM . You are receiving this because you were mentioned.Message ID: @.***>
I am in favor of keeping all files separated to reflect exactly what is contained in the .xls files on the WHO site.
Makes sense to me Pierre! In the rare chance they ever change then it would be easier to maintain as you say.
Just adding a helpful package for completing this issue: datapasta
Thanks @ngiangre for this useful package! I knew it but it was the 1st time I used it, so thanks for this opportunity.
One thing that did not work well is the comma handling. Indeed the decimal separator is the comma in the source files I read. And {datapasta} did not handle it well. I had to do a "find and replace" to replace the commas with points within the whole file and then it worked well. Do you know by any chance if there is a more direct operation? Thanks, Pierre.
Glad that pkg was somewhat helpful! I have not used it myself - I just saw it circulating on fosstodon :)
If you find a better approach let us know! What might be hard is knowing when those files update. Maybe use the {fs} or {httr2} pkgs to see if the files updated (e.g. time stamp)? Might be helpful so that we can subsequently update the data in our package. If anything, maybe include a test for this - we would then have to update our pkg accordingly if those files were ever updated. Does this make sense?
Cheers, Nick Giangreco, PhD nickg.bio
On Thu, Mar 7, 2024 at 7:25 AM Pierre-Wallet @.***> wrote:
Just adding a helpful package for completing this issue: datapasta https://milesmcbain.github.io/datapasta/
Thanks @ngiangre https://github.com/ngiangre for this useful package! I knew it but it was the 1st time I used it, so thanks for this opportunity.
One thing that did not work well is the comma handling. Indeed the decimal separator is the comma in the source files I read. And {datapasta} did not handle it well. I had to do a "find and replace" to replace the commas with points in all the file and then it worked well. Do you know by any chance if there is a more direct operation? Thanks, Pierre.
— Reply to this email directly, view it on GitHub https://github.com/pharmaverse/admiralpeds/issues/17#issuecomment-1983402400, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB3TDSY23CQ5BOHMBBFBK5LYXBMDFAVCNFSM6AAAAABEAE7FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBTGQYDENBQGA . You are receiving this because you were mentioned.Message ID: @.***>
@zdz2101 , @rossfarrugia
having an issue with my PR that I do not really understand. The roxygens comments seem to be aligned with .Rd files. Any help is parreciated. Thanks in advance.
Below is the issue from the pipeline:
D man/WHO_bmi_for_age_boys.Rd D man/WHO_bmi_for_age_girls.Rd ?? man/who_bmi_for_age_boys.Rd ?? man/who_bmi_for_age_girls.Rd 🙈 Manuals are not up-to-date with roxygen comments! 🔀 The following differences were noted:
ℹ roxygen2 version that was used in this workflow: ‘7.3.1’ 🙏 Please ensure that the 'RoxygenNote' field in the DESCRIPTION file matches this version Error: Process completed with exit code 1.
@Pierre-Wallet just run devtools::document()
, commit and push the renaming of the .Rd files, the two you have pushed rn is capital WHO and the roxygenation turns it into lowercase
@zdz2101 , already done before , it does not change anything.
I updated the name of all the metadatasets and I do not understand why there is a specific issue with these 2 only.
weird, I pushed a commit for you, looks like its going through now, looks ready to review soon, just tag me once you think you're done
I do not really get what is happening because I did not do anything. And now everything is in green! THanks for your help.
possible that there is a latency between the push and the pipeline run?
I do not really get what is happening because I did not do anything. And now everything is in green! THanks for your help.
possible that there is a latency between the push and the pipeline run?
Pipelines generally will cancel out (if they were running) and immediately get reran after every push. The two rd files on github were still capitalized, so you might've had two uncommitted changes, I just pushed the renaming for you and we're all good now
As discussed today @Pierre-Wallet will also add the head circumference metadata as part of this PR.
Background Information
From discussion #13:
Recreate the data using
tribble()
for replicability as data source may change overtime and its easier to track than floating csv's Use snapshots for testthatDefinition of Done
See how #15 was done
tibble::tribble()
testthat::expect_snapshot()
so we can track changes in the feature