ropensci / citecorp

Client for the Open Citations Corpus
https://docs.ropensci.org/citecorp

Fix error when not all identifiers exist #4

Closed Selbosh closed 5 years ago

Selbosh commented 5 years ago

The previous code assumed every article has all four of the pmcid, doi, pmid and paper columns and threw an error if any was missing. My fix relaxes this assumption and retrieves only those columns that exist in the data frame.
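To illustrate the failure mode described above, here is a minimal sketch in base R. The toy data frame, its values, and the `wanted` vector are assumptions made for illustration, not the package's actual code or the exact change in this pull request.

```r
# Toy data frame standing in for a query result; the values are made up.
df <- data.frame(
  paper = "p1",
  doi = "10.1000/example",
  stringsAsFactors = FALSE
)
wanted <- c("doi", "pmid", "pmcid", "paper")

# Strict selection fails because pmid and pmcid are absent;
# tryCatch captures the error message instead of stopping.
strict <- tryCatch(df[, wanted], error = function(e) conditionMessage(e))

# Relaxed selection keeps whichever wanted columns are present,
# in the order they already have in df.
relaxed <- df[, names(df) %in% wanted, drop = FALSE]
names(relaxed)
```

Note that this relaxed form preserves the data frame's own column order, so the result can differ from the order listed in `wanted`.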

See https://github.com/ropenscilabs/citecorp/issues/1#issuecomment-545814584

Selbosh commented 5 years ago

The check fails because the tests assume the column names are in a particular order.

Two solutions:

  1. change the test so that order doesn't matter
  2. use something like the following, which selects whichever of the columns exist and puts them in the order specified, rather than the order in which they appeared in the data frame:
df <- df[, na.omit(match(c("doi", "pmid", "pmcid", "paper"), names(df))), drop = FALSE]

I am not sure if the order of the columns is important for future plans, so I leave this decision to you.
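As a rough illustration of the second option, the sketch below applies the `match()` and `na.omit()` idiom to a toy data frame whose columns are out of order and which lacks `pmcid`. The data are invented, and `drop = FALSE` is an addition here so that the result stays a data frame even when only one column matches.

```r
# Toy data frame: columns out of order, pmcid missing (values are made up).
df <- data.frame(
  paper = "p1",
  pmid = "123",
  doi = "10.1000/example",
  stringsAsFactors = FALSE
)
wanted <- c("doi", "pmid", "pmcid", "paper")

# match() gives the position of each wanted name in df, NA where absent.
idx <- match(wanted, names(df))

# na.omit() drops the NA positions; the rest are already in wanted order.
ordered <- df[, na.omit(idx), drop = FALSE]
names(ordered)
```

Because the positions come back in the order of `wanted`, the surviving columns follow that order regardless of how they were arranged in `df`.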

sckott commented 5 years ago

thanks.

for the column order bit, i do want them in the same order for consistency for users to combine data downstream, etc. - with caveat that sometimes some columns won't exist, which is fine
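One way a user might combine results downstream when some columns are missing is sketched below. The helper `pad_columns()` is hypothetical and not part of citecorp; it assumes that filling absent identifier columns with `NA` is acceptable for the user's purposes.

```r
wanted <- c("doi", "pmid", "pmcid", "paper")

# Hypothetical helper: add any missing identifier columns as NA,
# then return the columns in one fixed order.
pad_columns <- function(df, cols = wanted) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Two made-up results with different sets of available columns.
a <- data.frame(doi = "10.1000/a", paper = "p1", stringsAsFactors = FALSE)
b <- data.frame(doi = "10.1000/b", pmid = "123", paper = "p2",
                stringsAsFactors = FALSE)

# With identical column names in identical order, rbind() works directly.
combined <- rbind(pad_columns(a), pad_columns(b))
```

A consistent column order in the package's output keeps this kind of step simple, since only the missing columns need handling.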