jkeirstead / scholar

Analyse citation data from Google Scholar

Suggested h-index change per year plots #73

Open TS404 opened 5 years ago

TS404 commented 5 years ago

Hi,

I love the package. I made a couple of additional functions to show the change in h-index over time. The code could certainly be tidied a lot, but it's functional.

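get_yearly_publications <- function(id){
    # Fetch the yearly citation history of every publication (one request each)
    pub.list <- NULL
    for (i in scholar::get_publications(id)$pubid){
        pub.list <- rbind(pub.list,scholar::get_article_cite_history(id,i))
    }
    # Columns used below: 1 = year, 2 = citations in that year, 3 = publication id
    years <- min(pub.list[,1]):max(pub.list[,1])
    papers <- unique(as.character(pub.list[,3]))
    # Year x paper table of citations per year (NA where nothing was recorded)
    pub.table  <- array(dim=c(length(years),
                              length(papers)),
                        dimnames=list(years,
                                      papers))
    for (i in seq_len(nrow(pub.list))){
        # as.character() so a factor id is matched by name, not by its integer code
        pub.table[as.character(pub.list[i,1]),as.character(pub.list[i,3])] <- pub.list[i,2]
    }
    pub.table
}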

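h_by_year <- function(pub.table){
    # h-index at the end of each year, from citations accumulated up to that year
    hyear <- sapply(seq_len(nrow(pub.table)), function(i){
        # drop = FALSE keeps a one-row matrix so colSums also works for the first year
        cites <- sort(colSums(pub.table[1:i,,drop = FALSE],na.rm = TRUE),decreasing = TRUE)
        # h = number of papers whose citation count is at least their rank
        sum(cites >= seq_along(cites))
    })
    names(hyear) <- rownames(pub.table)
    hyear
}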

plot_hyear_full <- function(pub.table){
    # Total citations per paper, ranked from most to least cited
    totals <- sort(colSums(pub.table,na.rm = TRUE),decreasing = TRUE)
    plot(totals,
         type="l",
         xlab="Paper rank",
         ylab = "Citations per paper")
    abline(0,1,col="grey")  # y = x; the h-index is where a curve crosses this line
    # One curve of cumulative citations per year, light (early) to dark (recent)
    cols <- colorRampPalette(c("lightblue", "darkblue"))(nrow(pub.table))
    for (i in nrow(pub.table):1){
        lines(sort(colSums(pub.table[1:i,,drop = FALSE],na.rm = TRUE),decreasing = TRUE),
              col=cols[i])
    }
    lines(rep(0,ncol(pub.table)))
    hyear <- h_by_year(pub.table)
    text(ncol(pub.table)*0.95,max(totals)*0.95,
         paste("H =",hyear[length(hyear)]))
}

# ID is a placeholder for a Google Scholar profile id (a character string)
pub.table <- get_yearly_publications(ID)
hyear <- h_by_year(pub.table)
plot_hyear_full(pub.table)
plot(as.numeric(names(hyear)),
     hyear,
     type="b",
     xlab="Year",
     ylab = "H index")

[Two attached plots: rplot, rplot01]
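For anyone who wants to check the h-index-per-year logic without querying Google Scholar, here is a minimal sketch on a made-up table. The citation counts, the paper names, and the helper `h_index` are all hypothetical and only illustrate the calculation; the table has the same year-by-paper layout that `get_yearly_publications` builds.

```r
# Made-up citations per year: rows are years, columns are papers
# (NA = no citations recorded for that paper in that year)
pub.table <- matrix(c(1, NA, NA,
                      3,  2, NA,
                      4,  3,  1,
                      5,  4,  3),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(2015:2018, c("paperA", "paperB", "paperC")))

# Hypothetical helper: h-index from a vector of citation counts,
# i.e. the number of papers whose count is at least their rank
h_index <- function(cites) {
    cites <- sort(cites, decreasing = TRUE)
    sum(cites >= seq_along(cites))
}

# Cumulative citations per paper up to each year (NA treated as 0)
cumulative <- apply(replace(pub.table, is.na(pub.table), 0), 2, cumsum)

# h-index at the end of each year
hyear <- apply(cumulative, 1, h_index)
names(hyear) <- rownames(pub.table)
print(hyear)
```

In this toy table the h-index goes 1, 2, 2, 3 over 2015 to 2018, since by 2018 all three papers have at least three citations.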

jefferis commented 5 years ago

Hi @TS404, late response, but these plots are certainly nice. One option might have been to contribute a vignette with these as examples. They do need a bit of cleanup. In particular there seem to be multiple calls to get_publications inside the for loop in your first function.

TS404 commented 5 years ago

Yes, I was unable to work out a way to avoid the loop (which is by far the slowest part and also risks triggering throttling on the server side). Any ideas? I'd be happy to help put together a vignette once the code is a bit tighter.
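One possible way to soften the throttling risk, if the per-publication requests cannot be avoided, is to pause between them. The sketch below is only an illustration: `fetch_cite_histories`, its `fetch` argument, and the one-second default `delay` are assumptions and not part of the scholar package, and a pause is no guarantee against throttling. The `fetch` argument exists so the loop can be tried offline with a stand-in function.

```r
# Hypothetical helper (not part of scholar): fetch citation histories one
# publication at a time, sleeping `delay` seconds between requests.
fetch_cite_histories <- function(id, pubids,
                                 fetch = function(id, pubid)
                                     scholar::get_article_cite_history(id, pubid),
                                 delay = 1) {
    out <- vector("list", length(pubids))
    for (k in seq_along(pubids)) {
        out[[k]] <- fetch(id, pubids[k])
        # No need to wait after the last request
        if (k < length(pubids)) Sys.sleep(delay)
    }
    # Stack the per-publication data frames into one
    do.call(rbind, out)
}

# Offline try-out with a stand-in fetch function returning made-up data
fake_fetch <- function(id, pubid) {
    data.frame(year = 2019:2020, cites = c(1, 2), pubid = pubid,
               stringsAsFactors = FALSE)
}
res <- fetch_cite_histories("some_id", c("a", "b", "c"),
                            fetch = fake_fetch, delay = 0)
print(res)
```

Collecting the pieces in a pre-sized list and binding once at the end also avoids growing a data frame with `rbind` inside the loop, although the network requests will still dominate the run time.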

jefferis commented 5 years ago

My mistake — I misread — there’s only a single call to get_publications and the multiple calls to get_article_cite_history are inevitable.