aurielfournier closed this issue 6 years ago.
@embruna, can you put your little flow chart of how the package works here? It might help inform how we think about renaming the functions.
These are messy; I didn't want to invest too much time until you all were on board with how to organize them.
I tried to do a flow chart for each of the three steps: data import, author disambiguation, and address parsing/georeferencing/visualization. At the end of each flow chart would be a list of the outputs from that section (e.g., CSV files, lists, data frames).
If I organize my thinking along the lines of these three big steps:
1) `read_references` is probably fine, though I might have used `import_references` had I been the original author.
2) Something like `clean_authors` or `check_authors` might be a better description of what is being done by `read_authors`. Or maybe `tidy_authors` (too trendy? lol).
3) I just realized I forgot to add `refine_authors` to the flow chart after the box for "satisfied with the results?". And "satisfied with the results?" should actually be "are you ready to upload your corrections?"...
4) ...with these changes, `refine_authors` is actually a decent descriptor of what you are doing, unless you think something else would be clearer: `merge_authors`? `edit_authors`?
I like the idea of renaming `read_references` to `import_references`.
Same for 2: I think clean or tidy could work. I haven't thought this through, but I don't think these functions will be "pipeable", so it might be good to avoid the word tidy, since I don't think they are going to play super well with the rest of the tidyverse.
For `refine_authors`, I wonder if we could do `disambiguate_authors` or something like that? It seems more informative to me than merge or edit. Thoughts?
`address_lat_long` could probably be better as well. Maybe `georeference_authors`?
I think you're right about avoiding tidy; that makes perfect sense. To try and make the others simpler, maybe truncate the longer words to avoid spelling mistakes? `disambig_authors` and `georef_authors`? Not sure if that makes sense for "references" as well, e.g., `import_refs`. Totally up to you.
Current workflow:
`read_references()`
`read_authors()`
`refine_authors()`
`address_lat_long()`
I don't have any issue with `read_references()`, as that actually reads in a file.
I think that `read_authors()` is confusing, though, because it's not reading anything into R; its only input is the object from `read_references()`.
I propose replacing `read_authors()` with `clean_authors()` [trying to avoid the word tidy].
`refine_authors()` is alright. I don't love it, but I think it's fine.
While `address_lat_long()` means something to me, I think `georef_authors()` might be more informative.
Though I wonder if it's really `georef_authors()` or if it should be `georef_records()`, since we're really georeferencing each record, not each individual author.
I just realized all the plotting functions are very broken, so I'm not going to suggest new names for them yet. Only working functions get new names. :/
`read_references`: agree, it's fine.
`read_authors`: `clean_authors` is good. It gets the point across and is shorter/easier to write than other options I thought of (e.g., `create_author_list`, `disambiguate_authors`, or `extract_authors`). "Clean" is also "tidy"-adjacent. Thumbs up.
`refine_authors`: agree it's fine. My only thought is that "refine" implies "refining the complete list of authors down to a smaller one actually used in the analysis based on some criteria", when what is actually being done is "merging corrections to the author list (if there are any)". Maybe something like `correct_author_groups`, `merge_manual_edits`, `merge_edits`, or `finalize_authors`? I like "finalize" because you still have to run this command to move on (yes?), so it's a cue to users not to forget to do it even if they don't have any corrections to make. But I don't have strong feelings about this over other possibilities.
`georef_authors`: love it. I had thought of `map_authors`, but now realize "map" is better for actually creating a map. `georef_authors` is both easy to write and precise.
I've been reading through the rOpenSci guidelines more:
Consider an `object_verb()` naming scheme for functions in your package that take a common data type or interact with a common API. `object` refers to the data/API and `verb` the primary action. This scheme helps avoid namespace conflicts with packages that may have similar verbs, and makes code readable and easy to auto-complete. For instance, in stringi, functions starting with `stri_` manipulate strings (`stri_join()`, `stri_sort()`), and in googlesheets functions starting with `gs_` are calls to the Google Sheets API (`gs_auth()`, `gs_user()`, `gs_download()`).
Given this guidance, I suggest:
`references_read()`
`authors_clean()`
`authors_refine()`
`authors_georef()`
Thoughts?
So that they are easier to tell apart, and probably shorter.