Open vprincipe opened 6 years ago
Hi, I think the file might be in the wrong format - can you post the scopus data file please?
This is the .csv file I downloaded from Scopus to test the code. If the problem is in the file, what is the best way to download a Scopus file?
Hi, I loaded your file fine, as below. It might be to do with the file encoding and OS; I’ll check with a colleague.
Regards,
Richard
Load data files
articles_df <- getArticles(files_path = read_data,
                           data_source = data_source)
1 data files. 2000 articles, spanning 2013 to 2018
Total of 11634 citations across 572 journals.
nrow(articles_df)
[1] 2000
names(articles_df)
 [1] "Id"                      "Authors"                 "Title"
 [4] "Year"                    "SourceTitle"             "Volume"
 [7] "Issue"                   "ArtNo"                   "PageStart"
[10] "PageEnd"                 "PageCount"               "NumberCites"
[13] "DOI"                     "Link"                    "Affiliations"
[16] "AuthorsWithAffiliations" "Abstract"                "AuthorKeywords"
[19] "DocumentType"            "Source"                  "EID"
scopus.csv.zip: https://github.com/rvidgen/clr/files/2453027/scopus.csv.zip
Hello,
I use macOS High Sierra. Could this change the structure of the .csv?
Below I paste the CSV structure. On my computer, a comma (,) separates the variables.
Authors,Author Ids,Title,Year,Source title,Volume,Issue,Art. No.,Page start,Page end,Page count,Cited by,DOI,Link,Affiliations,Authors with affiliations,Abstract,Author Keywords,Index Keywords,Molecular Sequence Numbers,Chemicals/CAS,Tradenames,Manufacturers,Funding Details,Funding Text 1,Funding Text 2,Funding Text 3,References,Correspondence Address,Editors,Sponsors,Publisher,Conference name,Conference date,Conference location,Conference code,ISSN,ISBN,CODEN,PubMed ID,Language of Original Document,Abbreviated Source Title,Document Type,Access Type,Source,EID
"Schreyer D., Däuper D.","56925104700;57201095839;","Determinants of spectator no-show behaviour: first empirical evidence from the German Bundesliga",2018,"Applied Economics Letters","25","21",,"1475","1480",,,"10.1080/13504851.2018.1430314","https://www.scopus.com/inward/record.uri?eid=2-s2.0-85043365875&doi=10.1080%2f13504851.2018.1430314&partnerID=40&md5=7442d062cb9efd4af88cf488488ec67d","Center for Sports and Management (CSM), WHU - Otto Beisheim School of Management, Düsseldorf, Germany; DFL Deutsche Fußball Liga GmbH, Frankfurt am Main, Germany","Schreyer, D., Center for Sports and Management (CSM), WHU - Otto Beisheim School of Management, Düsseldorf, Germany; Däuper, D., DFL Deutsche Fußball Liga GmbH, Frankfurt am Main, Germany","The analysis of stadium attendance demand has a long tradition in the economic literature. However, despite its evident merits, this previous research has been critiqued at several levels, in particular for relying on a suboptimal demand proxy, i.e. published attendance data.
The problem is to do with the language settings of the operating environment. It’s not an easy fix and we are thinking about what we can do to make it work with different language settings. It seems to be related to the file encoding. I’ll look some more into it. In the meantime, try this to see what R sees when it reads the data:
scopus <- read.csv("scopus 2.csv", stringsAsFactors = FALSE)
nrow(scopus)
names(scopus)
I get what I expect:
nrow(scopus)
[1] 2000
names(scopus)
 [1] "Authors"                       "Author.Ids"
 [3] "Title"                         "Year"
 [5] "Source.title"                  "Volume"
 [7] "Issue"                         "Art..No."
 [9] "Page.start"                    "Page.end"
[11] "Page.count"                    "Cited.by"
[13] "DOI"                           "Link"
[15] "Affiliations"                  "Authors.with.affiliations"
[17] "Abstract"                      "Author.Keywords"
[19] "Index.Keywords"                "Molecular.Sequence.Numbers"
[21] "Chemicals.CAS"                 "Tradenames"
[23] "Manufacturers"                 "Funding.Details"
[25] "Funding.Text.1"                "Funding.Text.2"
[27] "Funding.Text.3"                "References"
[29] "Correspondence.Address"        "Editors"
[31] "Sponsors"                      "Publisher"
[33] "Conference.name"               "Conference.date"
[35] "Conference.location"           "Conference.code"
[37] "ISSN"                          "ISBN"
[39] "CODEN"                         "PubMed.ID"
[41] "Language.of.Original.Document" "Abbreviated.Source.Title"
[43] "Document.Type"                 "Access.Type"
[45] "Source"                        "EID"
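If the column names R reports differ from the list above (for example, a garbled first name instead of "Authors"), one possible cause is a UTF-8 byte order mark (BOM) at the start of the export. That is only an assumption here; the thread does not confirm it. The sketch below uses a hypothetical helper, read_scopus_raw, which is not part of clr, and a small self-made file rather than a real Scopus download. It relies on the "UTF-8-BOM" encoding name that base R accepts when reading.

```r
# Hypothetical helper (not part of clr): read a Scopus export and ask R to
# strip a UTF-8 byte order mark if one is present.
read_scopus_raw <- function(path) {
  read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
}

# Self-made two-column file with a BOM in front of the header (illustration only).
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Authors,Year\n\"Doe J.\",2018\n")), tmp)

scopus <- read_scopus_raw(tmp)
print(nrow(scopus))   # number of data rows R sees
print(names(scopus))  # header names after the BOM has been stripped
```

Comparing names(scopus) from this call with the output of a plain read.csv() on the same file shows whether the encoding is what changes the header.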
This problem is complicated. I built the same idea in Python, but I am very interested in using your R code too.
let's go...
scopus <- read.csv("scopus.csv", stringsAsFactors = FALSE)
This reads perfectly, but I still need to load the data:
articles_df <- getArticles(files_path = scopus, data_source = data_source)
The problem is in the function getArticles (getArticles.R). I need to skip that function to get things working, but it is difficult for me to understand what articles_df should look like, because it is not the same as the Scopus output.
If I use:
articles_df <- read.csv("scopus.csv", stringsAsFactors = FALSE)
impact <- impactAnalysis(articles_df = articles_df)
Error in grouped_df_impl(data, unname(vars), drop) :
  Column SourceTitle is unknown
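The error is consistent with the raw export using names such as "Source.title" where the later steps look for "SourceTitle". A quick way to see every mismatch at once is to compare the names. The sketch below uses the column list from the names(articles_df) output earlier in this thread; missing_clr_columns is a hypothetical helper, not a clr function, and raw_like is a made-up stand-in for a raw export.

```r
# Column names taken from the names(articles_df) output earlier in this thread.
expected_cols <- c(
  "Id", "Authors", "Title", "Year", "SourceTitle", "Volume", "Issue",
  "ArtNo", "PageStart", "PageEnd", "PageCount", "NumberCites", "DOI",
  "Link", "Affiliations", "AuthorsWithAffiliations", "Abstract",
  "AuthorKeywords", "DocumentType", "Source", "EID"
)

# Hypothetical helper (not part of clr): expected columns absent from df.
missing_clr_columns <- function(df) {
  setdiff(expected_cols, names(df))
}

# Made-up stand-in for a raw Scopus export read with read.csv().
raw_like <- data.frame(
  Authors = "Doe J.",
  Source.title = "Some Journal",
  Cited.by = 3,
  stringsAsFactors = FALSE
)

print(missing_clr_columns(raw_like))
```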
Yes, it is proving hard to fix the language issue. We are looking at building an online version that does not require R to be installed.
You could manually code it up so that the columns in the Scopus download are renamed to the following:
 [1] "Id"                      "Authors"                 "Title"
 [4] "Year"                    "SourceTitle"             "Volume"
 [7] "Issue"                   "ArtNo"                   "PageStart"
[10] "PageEnd"                 "PageCount"               "NumberCites"
[13] "DOI"                     "Link"                    "Affiliations"
[16] "AuthorsWithAffiliations" "Abstract"                "AuthorKeywords"
[19] "DocumentType"            "Source"                  "EID"
Horrible, I know.
Richard
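A minimal sketch of that manual renaming is shown here. The function name rename_scopus_columns is hypothetical and not part of clr. The pairing of raw names to target names (for instance "Cited.by" to "NumberCites") is an assumption based on the two name listings in this thread, and filling "Id" with a running row number is also an assumption, since the export has no such column.

```r
# Hypothetical helper (not part of clr): rename a raw Scopus export, as read
# by read.csv(), to the column names listed above.
rename_scopus_columns <- function(scopus) {
  # Left: names as read.csv() reports them. Right: target names.
  # The pairing is an assumption inferred from the two listings in the thread.
  mapping <- c(
    "Authors" = "Authors",
    "Title" = "Title",
    "Year" = "Year",
    "Source.title" = "SourceTitle",
    "Volume" = "Volume",
    "Issue" = "Issue",
    "Art..No." = "ArtNo",
    "Page.start" = "PageStart",
    "Page.end" = "PageEnd",
    "Page.count" = "PageCount",
    "Cited.by" = "NumberCites",
    "DOI" = "DOI",
    "Link" = "Link",
    "Affiliations" = "Affiliations",
    "Authors.with.affiliations" = "AuthorsWithAffiliations",
    "Abstract" = "Abstract",
    "Author.Keywords" = "AuthorKeywords",
    "Document.Type" = "DocumentType",
    "Source" = "Source",
    "EID" = "EID"
  )

  # Fail early if the export lacks any of the columns we want to rename.
  absent <- setdiff(names(mapping), names(scopus))
  if (length(absent) > 0) {
    stop("Columns not found in the export: ", paste(absent, collapse = ", "))
  }

  # Keep only the mapped columns, in the target order, then rename them.
  out <- scopus[, names(mapping), drop = FALSE]
  names(out) <- unname(mapping)

  # Assumption: "Id" is a running row number.
  data.frame(Id = seq_len(nrow(out)), out,
             stringsAsFactors = FALSE, check.names = FALSE)
}
```

Usage would look like articles_df <- rename_scopus_columns(read.csv("scopus.csv", stringsAsFactors = FALSE)). Whether the later analysis functions accept a data frame built this way has not been verified here; getArticles may do additional cleaning that this sketch skips.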
Set parameters
read_data <- "scopus.csv"
data_source <- "Scopus"
Load data files
articles_df <- getArticles(files_path = read_data, data_source = data_source)
Error in seq.default(1, nrow(articles_df), 1) :
  wrong sign in 'by' argument
In addition: Warning messages:
1: Unknown or uninitialised column: 'Year'.
2: Unknown or uninitialised column: 'NumberCites'.
3: Unknown or uninitialised column: 'NumberCites'.
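One reading of this message, not confirmed in the thread, is that articles_df ended up with zero rows: seq(1, 0, 1) cannot count upwards from 1 to 0, and base R reports that as a wrong sign in 'by'. The sketch below reproduces the behaviour with a made-up empty data frame; empty_df is only a stand-in, since the thread does not show the object getArticles built internally.

```r
# Made-up zero-row data frame standing in for the internal articles_df
# (an assumption; the real object is not shown in the thread).
empty_df <- data.frame(Year = integer(0), NumberCites = integer(0))

# seq(1, 0, 1) asks for a sequence from 1 up to 0 in steps of +1, which fails.
result <- tryCatch(
  seq(1, nrow(empty_df), 1),
  error = function(e) conditionMessage(e)
)
print(result)

# seq_len() returns an empty sequence for zero rows instead of failing.
print(seq_len(nrow(empty_df)))
```

If that reading is right, the warnings about 'Year' and 'NumberCites' point the same way: the columns were never created because the file was not parsed into the expected structure.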