rvidgen / clr


Error in Load data files CLR #4

Open vprincipe opened 6 years ago

vprincipe commented 6 years ago

Set parameters

read_data <- "scopus.csv"
data_source <- "Scopus"

Load data files

articles_df <- getArticles(files_path = read_data, data_source = data_source)

Error in seq.default(1, nrow(articles_df), 1) : wrong sign in 'by' argument
In addition: Warning messages:
1: Unknown or uninitialised column: 'Year'.
2: Unknown or uninitialised column: 'NumberCites'.
3: Unknown or uninitialised column: 'NumberCites'.
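One possible reading of this error, assuming getArticles ended up with a data frame of zero rows (its source is not shown in this thread, so this is a guess): counting from 1 up to `nrow(articles_df)` with a step of 1 fails when the row count is 0, and warnings of the "Unknown or uninitialised column" kind typically appear when a column that was never created is accessed. The sketch below reproduces only the `seq()` part with a stand-in data frame named `empty_df`, which is hypothetical.

```r
# Stand-in for a data frame that ended up empty after a failed import.
empty_df <- data.frame(Year = integer(0))

# seq(1, 0, 1) cannot count upward from 1 to 0, so R raises an error
# instead of returning an empty sequence. The message is captured here.
result <- tryCatch(
  seq(1, nrow(empty_df), 1),
  error = function(e) conditionMessage(e)
)
print(result)

# seq_len() returns an empty vector for zero rows, so a loop over it
# simply does not run.
safe_index <- seq_len(nrow(empty_df))
print(length(safe_index))
```

If this reading is right, the real question is why no rows or columns were recognised, not the `seq()` call itself.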

rvidgen commented 6 years ago

Hi, I think the file might be in the wrong format - can you post the scopus data file please?

vprincipe commented 6 years ago

I downloaded the .csv file from Scopus to test the code. If the problem is in the file, what is the best way to download a Scopus file?

scopus.csv.zip

rvidgen commented 6 years ago

Hi, I loaded your file fine, as below. It might be to do with the file encoding and OS; I’ll check with a colleague.

Regards,
Richard

Load data files

articles_df <- getArticles(files_path = read_data,
                           data_source = data_source)
1 data files. 2000 articles, spanning 2013 to 2018
Total of 11634 citations across 572 journals.

nrow(articles_df)
[1] 2000

names(articles_df)
 [1] "Id"                      "Authors"  "Title"
 [4] "Year"                    "SourceTitle"  "Volume"
 [7] "Issue"                   "ArtNo"  "PageStart"
[10] "PageEnd"                 "PageCount"  "NumberCites"
[13] "DOI"                     "Link"  "Affiliations"
[16] "AuthorsWithAffiliations" "Abstract"  "AuthorKeywords"
[19] "DocumentType"            "Source"  "EID"

From: vprincipe <notifications@github.com>
Reply-To: rvidgen/clr <reply@reply.github.com>
Date: Saturday, 6 October 2018 at 13:29
To: rvidgen/clr <clr@noreply.github.com>
Cc: Richard Vidgen <richard@vidgen.com>, Comment <comment@noreply.github.com>
Subject: Re: [rvidgen/clr] Erro in Load data files CLR (#4)


vprincipe commented 6 years ago

Hello,

I use macOS High Sierra; it is possible that this changes the structure of the .csv.

Below I paste the CSV structure. On my computer, a comma (,) is used to separate the variables.

Authors,Author Ids,Title,Year,Source title,Volume,Issue,Art. No.,Page start,Page end,Page count,Cited by,DOI,Link,Affiliations,Authors with affiliations,Abstract,Author Keywords,Index Keywords,Molecular Sequence Numbers,Chemicals/CAS,Tradenames,Manufacturers,Funding Details,Funding Text 1,Funding Text 2,Funding Text 3,References,Correspondence Address,Editors,Sponsors,Publisher,Conference name,Conference date,Conference location,Conference code,ISSN,ISBN,CODEN,PubMed ID,Language of Original Document,Abbreviated Source Title,Document Type,Access Type,Source,EID
"Schreyer D., Däuper D.","56925104700;57201095839;","Determinants of spectator no-show behaviour: first empirical evidence from the German Bundesliga",2018,"Applied Economics Letters","25","21",,"1475","1480",,,"10.1080/13504851.2018.1430314","https://www.scopus.com/inward/record.uri?eid=2-s2.0-85043365875&doi=10.1080%2f13504851.2018.1430314&partnerID=40&md5=7442d062cb9efd4af88cf488488ec67d","Center for Sports and Management (CSM), WHU - Otto Beisheim School of Management, Düsseldorf, Germany; DFL Deutsche Fußball Liga GmbH, Frankfurt am Main, Germany","Schreyer, D., Center for Sports and Management (CSM), WHU - Otto Beisheim School of Management, Düsseldorf, Germany; Däuper, D., DFL Deutsche Fußball Liga GmbH, Frankfurt am Main, Germany","The analysis of stadium attendance demand has a long tradition in the economic literature. However, despite its evident merits, this previous research has been critiqued at several levels, in particular for relying on a suboptimal demand proxy, i.e. published attendance data.

rvidgen commented 6 years ago

The problem is to do with the language settings of the operating environment. It’s not an easy fix and we are thinking about what we can do to make it work with different language settings. It seems to be related to the file encoding. I’ll look some more into it. In the meantime, try this to see what R sees when it reads the data:

scopus <- read.csv("scopus 2.csv", stringsAsFactors = FALSE)
nrow(scopus)
names(scopus)

I get what I expect:

nrow(scopus)
[1] 2000

names(scopus)
 [1] "Authors"                       "Author.Ids"
 [3] "Title"                         "Year"
 [5] "Source.title"                  "Volume"
 [7] "Issue"                         "Art..No."
 [9] "Page.start"                    "Page.end"
[11] "Page.count"                    "Cited.by"
[13] "DOI"                           "Link"
[15] "Affiliations"                  "Authors.with.affiliations"
[17] "Abstract"                      "Author.Keywords"
[19] "Index.Keywords"                "Molecular.Sequence.Numbers"
[21] "Chemicals.CAS"                 "Tradenames"
[23] "Manufacturers"                 "Funding.Details"
[25] "Funding.Text.1"                "Funding.Text.2"
[27] "Funding.Text.3"                "References"
[29] "Correspondence.Address"        "Editors"
[31] "Sponsors"                      "Publisher"
[33] "Conference.name"               "Conference.date"
[35] "Conference.location"           "Conference.code"
[37] "ISSN"                          "ISBN"
[39] "CODEN"                         "PubMed.ID"
[41] "Language.of.Original.Document" "Abbreviated.Source.Title"
[43] "Document.Type"                 "Access.Type"
[45] "Source"                        "EID"
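If the first column name comes back as something other than "Authors" when running this check, one candidate cause is a byte order mark (BOM) at the start of the file. This is only an assumption; the thread does not confirm that a BOM is involved. The sketch below writes a tiny three-column sample file (made up for illustration, not the real export) with a BOM and reads it twice, once with defaults and once with `fileEncoding = "UTF-8-BOM"`, which tells `read.csv` to skip the mark.

```r
# Write a small sample CSV that starts with a UTF-8 byte order mark.
# The file and its contents are illustrative only.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("Authors,Title,Year\n\"Schreyer D.\",\"Example\",2018\n"), con)
close(con)

# Default read: depending on the locale, the BOM may leak into the
# first column name.
plain <- read.csv(tmp, stringsAsFactors = FALSE)
print(names(plain))

# Read again, asking R to strip the BOM before parsing.
bom <- read.csv(tmp, stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
print(names(bom))

# The locale R is running under, useful when comparing two machines.
print(Sys.getlocale("LC_CTYPE"))

unlink(tmp)
```

Comparing the two `names()` results and the locale string between a machine where the load works and one where it fails may help narrow the problem down.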


Richard Vidgen
Professor of Business Analytics, UNSW Business School, University of New South Wales, Australia
Professor of Business Analytics, School of Business, Economics and Informatics, Birkbeck University, UK
Professor Emeritus of Systems Thinking, University of Hull Business School, UK
E: r.vidgen@unsw.edu.au
W: https://www.business.unsw.edu.au/our-people/richardvidgen
B: http://datasciencebusiness.wordpress.com/


vprincipe commented 6 years ago

This problem is complicated. I designed the same idea in Python, but I am very interested in using your R code too.

let's go...

scopus <- read.csv("scopus.csv", stringsAsFactors = FALSE)

This reads perfectly, but I still need to load the data:

articles_df <- getArticles(files_path = scopus, data_source = data_source)

The problem is in the function getArticles (getArticles.R). I need to skip that function to make it work, but then it is difficult for me to understand the output: if I use articles_df, it is not the same as the Scopus output.

If I use:

articles_df <- read.csv("scopus.csv", stringsAsFactors = FALSE)

impact <- impactAnalysis(articles_df = articles_df)
Error in grouped_df_impl(data, unname(vars), drop) : Column SourceTitle is unknown

rvidgen commented 6 years ago

Yes, it is proving hard to fix the language issue; we are looking at building an online version that does not require R to be installed.

You could manually code it up so that the columns in the Scopus download are renamed to the following:

 [1] "Id"                      "Authors"  "Title"
 [4] "Year"                    "SourceTitle"  "Volume"
 [7] "Issue"                   "ArtNo"  "PageStart"
[10] "PageEnd"                 "PageCount"  "NumberCites"
[13] "DOI"                     "Link"  "Affiliations"
[16] "AuthorsWithAffiliations" "Abstract"  "AuthorKeywords"
[19] "DocumentType"            "Source"  "EID"

Horrible, I know.

Richard
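The manual renaming suggested above could be sketched roughly as follows. The pairing of source and target names is inferred from the two `names()` listings in this thread; in particular, mapping `Cited.by` to `NumberCites` and filling `Id` with a running row number are assumptions, and `scopus_to_clr` and `rename_scopus_columns` are hypothetical names, not part of the clr code. Whether getArticles does further processing beyond renaming is not shown here, so later steps may still need adjustments.

```r
# Source column (as produced by read.csv) = target column expected downstream.
# "Cited.by" -> "NumberCites" is an assumption based on the names alone.
scopus_to_clr <- c(
  Authors = "Authors",
  Title = "Title",
  Year = "Year",
  Source.title = "SourceTitle",
  Volume = "Volume",
  Issue = "Issue",
  Art..No. = "ArtNo",
  Page.start = "PageStart",
  Page.end = "PageEnd",
  Page.count = "PageCount",
  Cited.by = "NumberCites",
  DOI = "DOI",
  Link = "Link",
  Affiliations = "Affiliations",
  Authors.with.affiliations = "AuthorsWithAffiliations",
  Abstract = "Abstract",
  Author.Keywords = "AuthorKeywords",
  Document.Type = "DocumentType",
  Source = "Source",
  EID = "EID"
)

rename_scopus_columns <- function(scopus) {
  # Fail early with a readable message if the export lacks a column.
  missing_cols <- setdiff(names(scopus_to_clr), names(scopus))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the mapped columns, in the target order, then rename them.
  out <- scopus[, names(scopus_to_clr), drop = FALSE]
  names(out) <- unname(scopus_to_clr)
  # "Id" has no counterpart in the export; a row number is assumed here.
  data.frame(Id = seq_len(nrow(out)), out, stringsAsFactors = FALSE)
}

# Small made-up data frame with the same column names as the export.
demo <- as.data.frame(
  as.list(setNames(rep("x", length(scopus_to_clr)), names(scopus_to_clr))),
  stringsAsFactors = FALSE
)
demo$Author.Ids <- "1;"  # an unmapped column, dropped by the function

renamed <- rename_scopus_columns(demo)
print(names(renamed))
```

With the real file, the call would be `rename_scopus_columns(read.csv("scopus.csv", stringsAsFactors = FALSE))`, and the result could then be tried with impactAnalysis.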

From: vprincipe <notifications@github.com>
Reply-To: rvidgen/clr <reply@reply.github.com>
Date: Monday, 8 October 2018 at 12:55
To: rvidgen/clr <clr@noreply.github.com>
Cc: Richard Vidgen <richard@vidgen.com>, Comment <comment@noreply.github.com>
Subject: Re: [rvidgen/clr] Erro in Load data files CLR (#4)
