jlacko / RCzechia

A package providing Czech shapefiles - LAU & NUTS regions, municipalities, rivers etc. - in R friendly format for analysis & visualization
https://rczechia.jla-data.net

Data origin: JOSS Review #56

Closed nickbearman closed 1 year ago

nickbearman commented 1 year ago

Just adding things as I go through!

You have a lot of data in here. Do you have a comprehensive list of the data, its source and related information? I think this would be really useful. There is a nice summary in the readme, but you only mention the source for the admin areas.

A table, with name, source, last updated and a link to the source would be really nice. This would a) help people understand where the data come from and b) highlight the usefulness of the package more.

Thanks :-)

https://github.com/openjournals/joss-reviews/issues/5082

jlacko commented 1 year ago

Thanks for the comment, it is definitely a valid one - serving data is the chief purpose of the package.

The approach I took when writing it is that each function serves a {sf} flavored data frame, and the data source + cutoff date (more important for man-made objects such as admin areas than, say, rivers or terrain) are in the function documentation / the man pages.

Do you think it would be better practice to summarise it somewhere?

jlacko commented 1 year ago

I am using the @source field of the function documentation - e.g. https://github.com/jlacko/RCzechia/blob/master/man/volebni_okrsky.Rd#L7

nickbearman commented 1 year ago

Ah - I see. It makes sense for it to be in the help pages, that is really useful. I think it would be worth mentioning it in the readme and/or in one of the examples - if it isn't already.

I think a summary would be useful too. You could do this pragmatically I think. A list of name, title, source would be ideal. You could also include size, but I see this is more work as it's not stored programmatically there I think. Happy for other reviewers to comment too.

jlacko commented 1 year ago

I will give it a think. There should be a way to do it programmatically, as I would prefer not to have two manual updates that can get out of sync. Combining it with size makes perfect sense, as some datasets - and the terrain you mentioned in #58 is the worst culprit - are rather large. I have addressed it in the function docs, in a somewhat light-hearted manner (size is ... , so proceed with caution and patience). But it may not be immediately obvious, since you had to ask. Thank you for the point!
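
For readers wondering what such a programmatic summary could look like, here is a minimal sketch. It is not the maintainer's method, and it is written in Python purely for illustration. It assumes only that each man page is an `.Rd` text file carrying `\name{}`, `\title{}` and `\source{}` sections. The helper names `rd_section` and `summarise_man_pages` and the contents of `SAMPLE_RD` are made up for this example and are not taken from the package. Rd `%` comments are not handled.

```python
from pathlib import Path
from typing import Optional

# Hypothetical Rd fragment for illustration only; not copied from RCzechia.
SAMPLE_RD = r"""\name{example_layer}
\title{Example layer}
\source{Hypothetical provider \url{https://example.org/data}, as of 2023-01}
"""


def rd_section(rd_text: str, tag: str) -> Optional[str]:
    """Return the body of the first \\tag{...} section, or None if absent."""
    marker = "\\" + tag + "{"
    start = rd_text.find(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of the section
    body = []
    while i < len(rd_text):
        ch = rd_text[i]
        # Escaped characters such as \{ or \} do not change nesting depth.
        if ch == "\\" and i + 1 < len(rd_text) and rd_text[i + 1] in "{}%\\":
            body.append(rd_text[i:i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Collapse line breaks and repeated spaces into single spaces.
                return " ".join("".join(body).split())
        body.append(ch)
        i += 1
    return None  # unbalanced braces: treat the section as unreadable


def summarise_man_pages(man_dir: str) -> list:
    """Collect name, title and source from every .Rd file in a directory."""
    rows = []
    for rd_file in sorted(Path(man_dir).glob("*.Rd")):
        text = rd_file.read_text(encoding="utf-8")
        rows.append({tag: rd_section(text, tag)
                     for tag in ("name", "title", "source")})
    return rows


if __name__ == "__main__":
    for tag in ("name", "title", "source"):
        print(tag, "->", rd_section(SAMPLE_RD, tag))
```

Pointing `summarise_man_pages` at a local checkout's `man/` directory would give one row per documented object, so the summary is derived from the `\source{}` sections rather than maintained by hand in a second place. Download size is not recorded in these sections, so it would have to come from elsewhere.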

jlacko commented 1 year ago

I have updated an existing summary of datasets in the package documentation to include source with link (where applicable) and download size. It is in the intro section of the package manual, which is listed on CRAN in PDF format and distributed with the package as a set of markdown documents.

Unfortunately this document is not quite legible on GitHub directly, as the {tabular} field of a markdown file does not render in a user-friendly manner in https://github.com/jlacko/RCzechia/blob/master/man/intro.Rd

On the other hand, it felt more natural to expand an existing part of the package documentation than to create an entirely new file that would make sense only in a GitHub context and would have to be excluded from the build for CRAN distribution purposes due to standard checks.

nickbearman commented 1 year ago

Thanks @jlacko - this makes sense, and looks good