ropensci / refsplitr

R package for processing, organizing, and visualizing reference records downloaded from the Web of Science.
https://docs.ropensci.org/refsplitr

Need to create better function names #21

Closed. aurielfournier closed this issue 6 years ago.

aurielfournier commented 6 years ago

So that they are easier to tell apart, and probably shorter.

aurielfournier commented 6 years ago

@embruna can you put your little flow chart of how the package works here?

It might help inform how we think about renaming the functions.

embruna commented 6 years ago

These are messy; I didn't want to invest too much time until you all were on board with how to organize them.

sketch1.pdf sketch2.pdf

I tried to do a flow-chart for each of the three steps: data import, author disambiguation, and address parsing / georeferencing / visualization. At the end of each flow chart would be a list of the outputs from that section (e.g., csv files, lists, dataframes).

If I organize my thinking along the lines of these three big steps:

  1. read_references is probably fine, though I might have used import_references had I been the original author.

  2. Something like `clean_authors` or `check_authors` might be a better description of what is being done by `read_authors`. Or maybe `tidy_authors` (too trendy? lol).

  3. I just realized I forgot to add refine_authors to the flow chart after the box for "satisfied with the results?". And "satisfied with the results?" should actually be "are you ready to upload your corrections?"...

  4. ...with these changes, refine_authors is actually a decent descriptor of what you are doing, unless you think something else would be clearer: merge_authors? edit_authors?

aurielfournier commented 6 years ago

I like the idea of renaming read_references to import_references.

Same for 2: I think clean or tidy could work. I haven't thought this through, but I don't think these functions will be 'pipeable', so it might be good to avoid the word tidy, since I don't think they will play well with the rest of the tidyverse.

For refine_authors, I wonder if we could use disambiguate_authors or something like that? It seems more informative to me than merge or edit.

Thoughts?

address_lat_long could probably be better as well. Maybe georeference_authors?

embruna commented 6 years ago

I think you're right about avoiding tidy; it makes perfect sense. To try to make the others simpler, maybe truncate the longer words to avoid spelling mistakes: disambig_authors and georef_authors? Not sure if that makes sense for "references" as well, e.g., import_refs. Totally up to you.

aurielfournier commented 6 years ago

Current workflow:

read_references()
read_authors()
refine_authors()
address_lat_long()

I don't have any issue with read_references(), as that actually reads in a file.

I think that read_authors() is confusing, though, because it's not reading anything into R; its only input is the object from read_references().

I propose replacing read_authors() with clean_authors() (trying to avoid the word tidy).

refine_authors() is alright; I don't love it, but I think it's fine.

While address_lat_long() means something to me, I think georef_authors() might be more informative.

Though I wonder if it's really georef_authors() or if it should be georef_records(), since we're really georeferencing each record, not each individual author.

I just realized all the plotting functions are very broken, so I'm not going to suggest new names for them yet. Only working functions get new names. :/

embruna commented 6 years ago

read_references: agree, it's fine.

read_authors: clean_authors is good. It gets the point across and is shorter and easier to write than other options I thought of (e.g., create_author_list, disambiguate_authors, or extract_authors). "Clean" is also "Tidy"-adjacent. Thumbs up.

refine_authors: agree it's fine. My only thought is that "refine" implies "refining the complete list of authors down to a smaller one actually used in the analysis based on some criteria", when what is actually being done is "merging corrections to the author list (if there are any)". Maybe something like correct_author_groups, merge_manual_edits, merge_edits, or finalize_authors? I like "finalize" because you still have to run this command to move on (yes?), so it's a cue to users not to forget to do it even if they don't have any corrections to make. But I don't have strong feelings about this over other possibilities.

georef_authors: love it. I had thought of map_authors, but now realize "map" is better for actually creating a map. georef_authors is both easy to write and precise.

aurielfournier commented 6 years ago

I've been reading through the ROpenSci guidelines more.

Consider an object_verb() naming scheme for functions in your package that take a common data type or interact with a common API. object refers to the data/API and verb the primary action. This scheme helps avoid namespace conflicts with packages that may have similar verbs, and makes code readable and easy to auto-complete. For instance, in stringi, functions starting with stri_ manipulate strings (stri_join(), stri_sort()), and in googlesheets functions starting with gs_ are calls to the Google Sheets API (gs_auth(), gs_user(), gs_download()).

Given this guidance, I suggest

references_read()
authors_clean()
authors_refine()
authors_georef()

Thoughts?