wilkox opened this issue 1 year ago.
David, thanks a lot for your help! Unfortunately, this doesn't work for files exported from PubMed (.nbib). Could you help with these as well? (Attached: vk.txt)
@sy-olesya Adding `useBytes = TRUE` to `writeLines()` seems to fix this particular problem. However, there is then another, apparently unrelated error (I had to truncate the input file because the whole thing couldn't fit in memory):
```r
library(revtools)

# Check the file's MIME type and charset (-I is the macOS flag; GNU file uses -i)
system2("file", c("~/tmp/vk.txt", "-I"), stdout = TRUE)
#> [1] "/Users/wilkox/tmp/vk.txt: text/plain; charset=utf-8"

# Re-encode a UTF-8 text file as latin1
utf8tolatin1 <- function(infile, outfile) {
  content <- readLines(infile, encoding = "UTF-8")
  # iconv() returns NA for any line containing a character latin1 cannot represent
  latin1 <- iconv(content, from = "UTF-8", to = "latin1")
  # useBytes = TRUE writes the converted bytes as-is instead of re-encoding them
  writeLines(latin1, outfile, useBytes = TRUE)
}

utf8tolatin1("~/tmp/vk.txt", "~/tmp/vk-latin1.txt")
system2("file", c("~/tmp/vk-latin1.txt", "-I"), stdout = TRUE)
#> [1] "/Users/wilkox/tmp/vk-latin1.txt: text/plain; charset=iso-8859-1"
bib <- read_bibliography("~/tmp/vk-latin1.txt")
#> Error in names(x_final) <- unlist(lapply(x_final, function(a) {: 'names' attribute [254] must be the same length as the vector [43]
```
Created on 2024-01-31 with reprex v2.1.0
I had a poke around, and I think `read_bibliography()` isn't parsing the .nbib file correctly. You might want to open a separate issue about this if you are still having trouble.
I was getting the error `Error in gsub("[[:space:]]+", " ", x) : input string 12 is invalid`. Setting the `encoding` option in `readLines()` to `"latin1"` fixed the issue for me; I did not receive the error again.
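Before re-encoding anything, it may help to see which lines of the input are actually malformed. The sketch below is a hypothetical helper (`find_invalid_utf8()` is not part of revtools) built on base R's `validUTF8()`. It assumes the file is meant to be UTF-8 and that the failure comes from bytes that are not valid UTF-8. Note that the index in the `gsub()` error (`input string 12`) refers to the vector revtools passes internally, which may not correspond to line 12 of the file.

```r
# Hypothetical helper, not part of revtools: list the lines of a text file
# that are not valid UTF-8. readLines() does no re-encoding here, so
# validUTF8() inspects each line's raw bytes.
find_invalid_utf8 <- function(path) {
  lines <- readLines(path, warn = FALSE)
  which(!validUTF8(lines))
}

# Example usage (path is illustrative):
# find_invalid_utf8("~/tmp/vk.txt")
```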
Running `read_bibliography()` on a UTF-8 encoded file produces an error (see example file Cochrane.txt):

Created on 2023-10-10 with reprex v2.0.2
This seems to arise from this line, and I think it's because the encoding for `z` is set to 'latin1', but since R 4.3.0, 'Regular expression functions now check more thoroughly whether their inputs are valid strings (in their encoding, e.g. in UTF-8)'. A workaround is to convert the file into latin1 encoding first:
Created on 2023-10-10 with reprex v2.0.2
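One caveat with the `utf8tolatin1()` approach used in this thread: `iconv()` returns `NA` for any line containing a character latin1 cannot represent (Greek letters, for example), and `writeLines()` then writes the literal text `NA` in its place. A more defensive variant might look like the minimal sketch below. The function name `utf8_to_latin1_checked()` and the `"?"` substitution are illustrative choices, not part of revtools.

```r
# Illustrative variant of the utf8tolatin1() workaround (name is hypothetical).
# It warns about lines latin1 cannot represent instead of silently writing "NA",
# and substitutes `sub` for each unconvertible byte.
utf8_to_latin1_checked <- function(infile, outfile, sub = "?") {
  content <- readLines(infile, encoding = "UTF-8", warn = FALSE)

  # Without `sub`, iconv() returns NA for any element it cannot convert
  lossy <- which(is.na(iconv(content, from = "UTF-8", to = "latin1")))
  if (length(lossy) > 0) {
    warning("Lines with characters not representable in latin1: ",
            paste(lossy, collapse = ", "))
  }

  # With `sub`, each unconvertible byte is replaced, so a 2-byte UTF-8
  # character becomes "??"
  latin1 <- iconv(content, from = "UTF-8", to = "latin1", sub = sub)

  # useBytes = TRUE writes the latin1 bytes without re-encoding them
  writeLines(latin1, outfile, useBytes = TRUE)
  invisible(lossy)
}
```

Usage would mirror the original helper, e.g. `utf8_to_latin1_checked("~/tmp/vk.txt", "~/tmp/vk-latin1.txt")`, followed by `read_bibliography()` on the output file. This only makes data loss visible. It does not address the separate .nbib parsing problem discussed above.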