Fix error when not all identifiers exist

Selbosh commented 5 years ago

The previous code assumed that every article has all four identifier columns (pmcid, doi, pmid and paper) and threw an error if any of them was missing. My fix relaxes this assumption and retrieves whichever of these columns exist in the data frame.
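To illustrate the failure mode and the relaxed behaviour, here is a minimal sketch in base R. The data frame and its values are made up for illustration, and the use of `intersect()` is an assumption about one way to do this, not necessarily what the patch itself does.

```r
# Hypothetical result for an article that has no pmcid
df <- data.frame(
  paper = "https://example.org/paper/1",
  doi   = "10.1000/example",
  pmid  = "12345",
  other = "ignored",
  stringsAsFactors = FALSE
)

wanted <- c("pmcid", "doi", "pmid", "paper")

# Strict selection errors because "pmcid" is not a column of df;
# the error message is captured here instead of stopping the script
strict <- tryCatch(df[, wanted], error = function(e) conditionMessage(e))

# Relaxed selection: keep only the wanted columns that are present
present <- intersect(wanted, names(df))
relaxed <- df[, present, drop = FALSE]
```

With this input, `strict` holds an error message rather than a data frame, while `relaxed` keeps the three identifier columns that exist and drops `other`.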

See https://github.com/ropenscilabs/citecorp/issues/1#issuecomment-545814584

Selbosh commented 5 years ago

The check fails because the tests assume the columns appear in a particular order.

Two solutions:

- change the test so that order doesn't matter
- use something like the following, which selects whichever of the columns exist and puts them in the order specified, rather than the order in which they appeared in the data frame:

df <- df[, na.omit(match(c("doi", "pmid", "pmcid", "paper"), names(df))), drop = FALSE]

I am not sure if the order of the columns is important for future plans, so I leave this decision to you.
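As a rough check of the second option, the sketch below runs the `match()` approach on a made-up data frame whose columns arrive in a different order and lack pmcid. The example values are hypothetical. `drop = FALSE` is included so that the result stays a data frame even if only one column matches.

```r
# Hypothetical input: columns in arbitrary order, no pmcid
df <- data.frame(
  paper = "https://example.org/paper/1",
  pmid  = "12345",
  doi   = "10.1000/example",
  stringsAsFactors = FALSE
)

wanted <- c("doi", "pmid", "pmcid", "paper")

# Position of each wanted column in df; NA where the column is absent
idx <- match(wanted, names(df))          # 3 2 NA 1

# Drop the NAs and index by position, which also reorders the columns
out <- df[, na.omit(idx), drop = FALSE]
names(out)                               # "doi" "pmid" "paper"
```

The result follows the order given in `wanted` regardless of how the columns were ordered in the input, and missing columns are silently skipped.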

sckott commented 5 years ago

thanks.

for the column order bit, i do want them in the same order, so users can consistently combine data downstream, etc. - with the caveat that sometimes some columns won't exist, which is fine
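For anyone combining results downstream, one possible approach is sketched below in base R. `align_columns` is a hypothetical helper written for this comment, not part of citecorp, and the data are made up. It fills any missing identifier column with `NA` so that `rbind()` sees the same columns in the same order.

```r
wanted <- c("doi", "pmid", "pmcid", "paper")

# Hypothetical helper: add missing identifier columns as NA, then fix the order
align_columns <- function(df, cols = wanted) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Two made-up results with different subsets of identifier columns
a <- data.frame(doi = "10.1000/a", pmid = "111", stringsAsFactors = FALSE)
b <- data.frame(paper = "https://example.org/paper/b", doi = "10.1000/b",
                stringsAsFactors = FALSE)

combined <- rbind(align_columns(a), align_columns(b))
```

Because the columns that do exist already come back in a fixed relative order, a step like this only has to fill the gaps.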

ropensci / citecorp

Fix error when not all identifiers exist #4