wikipathways / pathway-figure-ocr

Extracting gene sets from published pathway figures
Apache License 2.0

Trouble reading pfocr_chemicals.tsv #37

Closed AlexanderPico closed 2 years ago

AlexanderPico commented 2 years ago

Using fill=T made the file readable:

chems <- read.table("../exports/pfocr_chemicals.tsv", sep = "\t", stringsAsFactors = FALSE, header = TRUE, fill = TRUE)

...but the data is garbled. See:

chems[which(chems$figure_id=="PMC5369987__oncotarget-08-16594-g005.jpg"),]

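It looks like single quotes (or prime characters) are messing up the read, e.g., adenosine 5'-phosphorothioate. By default read.table treats both " and ' as quote characters, so an unmatched ' swallows the fields that follow it. Setting quote="" seems to take care of that.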

Now reading in 120,262 rows and 10 columns.
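
For anyone hitting the same thing, here is a minimal sketch of the fix on a toy file. The file contents and the column names other than figure_id are made up for illustration and are not the real pfocr_chemicals.tsv schema. It also uses count.fields as a quick sanity check that every line has the same number of tab-separated fields once quoting is disabled.

```r
# Toy stand-in for pfocr_chemicals.tsv; columns and values are hypothetical.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "figure_id\tword\tsource",
  "fig1.jpg\tadenosine 5'-phosphorothioate\tdemo",
  "fig2.jpg\tATP\tdemo",
  "fig3.jpg\t3'-UTR\tdemo"
), tmp)

# quote = "" turns off quote handling, so an apostrophe is treated as plain text.
chems <- read.table(tmp, sep = "\t", header = TRUE, quote = "",
                    stringsAsFactors = FALSE)

# Sanity check: every line (header included) has the same number of fields.
n_fields <- count.fields(tmp, sep = "\t", quote = "")
stopifnot(length(unique(n_fields)) == 1)

dim(chems)
```

With quoting off, fill = TRUE should no longer be needed for a well-formed file. If the real file also contains # in chemical names, adding comment.char = "" may be worth trying, though that is an assumption and not something reported in this issue.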