ropensci-archive / microdemic

:warning: ARCHIVED :warning: microsoft academic client

invabs2abs is inserting "NA" into abstracts #8

Closed crew102 closed 5 years ago

crew102 commented 5 years ago

The inverted index that the API serves sometimes has missing positions. For example, the index may provide the tokens at locations 0, 2, and 3, but nothing for location 1 (e.g., "InvertedIndex":{"i":[0],"big":[2],"dogs":[3]}).

invabs2abs(), however, assumes that there is a token for each location. This results in NAs getting inserted into the abstract text.
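To make the failure mode concrete, here is a minimal sketch of how a gap in the index can turn into a literal "NA", alongside one way to rebuild the text that avoids it. This is an illustration only, not the actual body of invabs2abs(): `rebuild_abstract()` is a hypothetical helper, and the "naive" assignment is an assumption about the kind of logic that produces the NA, not a copy of the package code.

```r
# Toy inverted index from the example above: token -> 0-based positions.
# Position 1 is missing.
idx <- list(i = 0L, big = 2L, dogs = 3L)

tokens    <- rep(names(idx), lengths(idx))  # one entry per occurrence
positions <- unlist(idx, use.names = FALSE)

# Assumed failure mode: filling a vector by position leaves NA in the gap,
# and paste() renders that NA as the string "NA".
naive <- character(0)
naive[positions + 1] <- tokens
paste(naive, collapse = " ")
#> [1] "i NA big dogs"

# Hypothetical fix: order the tokens by position instead of indexing by it,
# so positions that are absent from the index never show up in the output.
rebuild_abstract <- function(inv_index) {
  if (length(inv_index) == 0) return("")
  tokens    <- rep(names(inv_index), lengths(inv_index))
  positions <- unlist(inv_index, use.names = FALSE)
  paste(tokens[order(positions)], collapse = " ")
}

rebuild_abstract(idx)
#> [1] "i big dogs"
```

Ordering by position keeps repeated tokens in the right place as well, since each occurrence carries its own position.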

library(microdemic)
library(jsonlite)

res <- ma_abstract("And(Composite(AA.AuN=='jaime teevan'),Y>2012)", key = Sys.getenv("MAG_KEY"))
res$abstract[grepl("\\bNA\\b", res$abstract)]
#> [1] "The queries people issue to a search engine and the results clicked following a query change over time. For example, after the earthquake in Japan in March 2011, the query NA japan NA spiked in popularity and people issuing the query were more likely to click government-related results than they would prior to the earthquake. We explore the modeling and prediction of such temporal patterns in Web search behavior. We develop a temporal modeling framework adapted from physics and signal processing and harness it to predict temporal patterns in search behavior using smoothing, trends, periodicities, and surprises. Using current and past behavioral data, we develop a learning procedure that can be used to construct models of users' Web search activities. We also develop a novel methodology that learns to select the best prediction model from a family of predictive models for a given query or a class of queries. Experimental results indicate that the predictive models significantly outperform baseline models that weight historical evidence the same for all queries. We present two applications where new methods introduced for the temporal modeling of user behavior significantly improve upon the state of the art. Finally, we discuss opportunities for using models of temporal dynamics to enhance other areas of Web search and information retrieval."
#> [2] "The physical constraints of smartwatches limit the range and complexity of tasks that can be completed. Despite interface improvements on smartwatches, the promise of enabling productive work remains largely unrealized. This paper presents NA WearWrite , a system that enables users to write documents from their smartwatches by leveraging a crowd to help translate their ideas into text. WearWrite users dictate tasks, respond to questions, and receive notifications of major edits on their watch. Using a dynamic task queue, the crowd receives tasks issued by the watch user and generic tasks from the system. In a week-long study with seven smartwatch users supported by approximately 29 crowd workers each, we validate that it is possible to manage the crowd writing process from a watch. Watch users captured new ideas as they came to mind and managed a crowd during spare moments while going about their daily routine. WearWrite represents a new approach to getting work done from wearables using the crowd."

Created on 2019-01-07 by the reprex package (v0.2.0.9000).

Session info

``` r
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  tz       America/New_York
#>  date     2019-01-07
#> Packages -----------------------------------------------------------------
#>  package    * version date       source
#>  backports    1.1.2   2017-12-13 cran (@1.1.2)
#>  base       * 3.4.4   2018-03-16 local
#>  compiler     3.4.4   2018-03-16 local
#>  crul         0.7.0   2019-01-04 CRAN (R 3.4.4)
#>  curl         3.2     2018-03-28 CRAN (R 3.4.4)
#>  datasets   * 3.4.4   2018-03-16 local
#>  devtools     1.13.6  2018-06-27 cran (@1.13.6)
#>  digest       0.6.17  2018-09-12 cran (@0.6.17)
#>  evaluate     0.10.1  2017-06-24 cran (@0.10.1)
#>  graphics   * 3.4.4   2018-03-16 local
#>  grDevices  * 3.4.4   2018-03-16 local
#>  htmltools    0.3.6   2017-04-28 cran (@0.3.6)
#>  httpcode     0.2.0   2016-11-14 CRAN (R 3.4.4)
#>  jsonlite   * 1.5     2017-06-01 cran (@1.5)
#>  knitr        1.20    2018-02-20 cran (@1.20)
#>  magrittr     1.5     2014-11-22 cran (@1.5)
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.4.4)
#>  methods    * 3.4.4   2018-03-16 local
#>  microdemic * 0.4.0   2018-10-25 CRAN (R 3.4.4)
#>  pillar       1.2.3   2018-05-25 cran (@1.2.3)
#>  R6           2.2.2   2017-06-17 cran (@2.2.2)
#>  Rcpp         0.12.18 2018-07-23 cran (@0.12.18)
#>  rlang        0.2.2   2018-08-16 cran (@0.2.2)
#>  rmarkdown    1.10    2018-06-11 cran (@1.10)
#>  rprojroot    1.3-2   2018-01-03 cran (@1.3-2)
#>  stats      * 3.4.4   2018-03-16 local
#>  stringi      1.2.4   2018-07-20 cran (@1.2.4)
#>  stringr      1.3.1   2018-05-10 cran (@1.3.1)
#>  tibble       1.4.2   2018-01-22 cran (@1.4.2)
#>  tools        3.4.4   2018-03-16 local
#>  triebeard    0.3.0   2016-08-04 CRAN (R 3.4.4)
#>  urltools     1.7.1   2018-08-03 CRAN (R 3.4.4)
#>  utils      * 3.4.4   2018-03-16 local
#>  withr        2.1.2   2018-03-15 CRAN (R 3.4.4)
#>  yaml         2.2.0   2018-07-25 cran (@2.2.0)
```

Happy to issue a PR for this.

sckott commented 5 years ago

thanks for another report @crew102

if you could submit a fix PR that'd be great

sckott commented 5 years ago

do submit separate PRs for the two issues if you could

crew102 commented 5 years ago

Yep no problem, I'll submit separate PRs. I should probably have them both submitted sometime later this week, after I've had a chance to ramp up on vcr.

sckott commented 5 years ago

#9 fixed