IU 2018/2019 data extraction for journal cost analysis

XiaoranYan commented 4 years ago

Good morning

Attached is the Elsevier journal list that we used for the 2017 analysis. This was downloaded in April 2019. I have also attached a list that I downloaded today. I think it would be best to use the February 2020 list, as we are trying to calculate potential future costs.

Kind regards

Willa

Willa Tavernier

Open Scholarship Resident Librarian

Scholarly Communication Department, Research and Learning Services

XiaoranYan commented 4 years ago

Good morning Xiaoran

Thanks for the update. Unfortunately, I don’t have a list of these journals with eISSNs. We’ve discussed this in-house, and since you expect updated data from WoS, we can just get the data when it arrives in March. So please hold off on extracting the MAG data, and send us an update when the WoS data is available.

Kind regards

Willa

From: Yan, Xiaoran <yan30@iu.edu>
Sent: Saturday, February 15, 2020 11:12 AM
To: Tavernier, Willa <wtavern@iu.edu>; Hare, Sarah Elaine <scrissin@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Subject: Re: CADRE data

Unfortunately, there is no acknowledgment information in MAG. I also did an ISSN match: out of the 2,293 journals in your list, 304 had no match in MAG. I found that MAG sometimes uses eISSN and ISSN interchangeably. Do you have eISSN numbers for the Elsevier journals?
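A minimal Python sketch of an ISSN match that also tries the eISSN, since MAG sometimes stores the two interchangeably. The journal-list keys (`title`, `issn`, `eissn`) and the flat list of MAG ISSN strings are assumptions for illustration, not the actual MAG schema.

```python
def normalize_issn(value):
    """Drop hyphens/whitespace and uppercase, so '4444-555x' and '4444555X' compare equal."""
    if not value:
        return None
    cleaned = value.replace("-", "").strip().upper()
    return cleaned or None


def match_journals(journals, mag_issns):
    """Split a journal list into (matched, unmatched) titles.

    journals:  list of dicts with hypothetical keys 'title', 'issn', 'eissn'.
    mag_issns: ISSN strings as found in MAG, print or electronic (assumed flat list).
    A journal counts as matched if either its ISSN or its eISSN appears in MAG.
    """
    mag = {normalize_issn(x) for x in mag_issns} - {None}
    matched, unmatched = [], []
    for journal in journals:
        candidates = {normalize_issn(journal.get("issn")),
                      normalize_issn(journal.get("eissn"))} - {None}
        target = matched if candidates & mag else unmatched
        target.append(journal["title"])
    return matched, unmatched


# Illustrative data only
journals = [
    {"title": "Journal A", "issn": "1234-5678", "eissn": None},
    {"title": "Journal B", "issn": "2222-3333", "eissn": "4444-555X"},
    {"title": "Journal C", "issn": "9999-0000", "eissn": None},
]
matched, unmatched = match_journals(journals, ["12345678", "4444-555x"])
print(matched)    # ['Journal A', 'Journal B']
print(unmatched)  # ['Journal C']
```

Journal B only matches through its eISSN, which is the case a print-ISSN-only list would miss.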

By the way, we just got an update from Clarivate that the 2019 WoS update will be delivered in early March. If that is not too late, we will be able to repeat what we did before.

Xiaoran

On 2/14/20 3:38 PM, Tavernier, Willa wrote:

Hi Xiaoran

Do you have any funding data that can be extracted, e.g. from a grant acknowledgment? If we can get that in addition to the fields you have listed below, that would be ideal.

Best,

Willa

From: Yan, Xiaoran <yan30@iu.edu>
Sent: Thursday, February 13, 2020 1:07 PM
To: Tavernier, Willa <wtavern@iu.edu>; Hare, Sarah Elaine <scrissin@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Subject: RE: CADRE data

Hi Willa,

Since the MAG data schema is different from WoS, I will not be able to provide exactly the same fields we did last time:

https://github.com/iuni-cadre/Collaborative-projects/issues/6

Would something like this work for you?

• MAG ID
• Publication type
• eISSN/ISSN
• Title
• Author
• Author Affiliation
• Publication Name
• Publisher
• Year Published
• DOI

One of the missing attributes is corresponding author. MAG does not have information on that. You will have to approximate it using first/last authors or distribute the cost across all authors.
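The two approximations mentioned above (first/last author, or spreading the cost over all authors) could be sketched as follows. `attribute_cost` and its `strategy` names are hypothetical; the sketch only shows how one article's cost would be assigned under each option.

```python
def attribute_cost(authors, cost, strategy="fractional"):
    """Approximate who pays one article's cost when no corresponding author is known.

    strategy (hypothetical names):
      'first'      -> whole cost to the first listed author
      'last'       -> whole cost to the last listed author
      'fractional' -> cost split evenly across all listed authors
    Returns {author: amount}; repeated names accumulate their shares.
    """
    if not authors:
        return {}
    if strategy == "first":
        return {authors[0]: cost}
    if strategy == "last":
        return {authors[-1]: cost}
    if strategy == "fractional":
        share = cost / len(authors)
        result = {}
        for author in authors:
            result[author] = result.get(author, 0) + share
        return result
    raise ValueError(f"unknown strategy: {strategy!r}")


authors = ["Author 1", "Author 2", "Author 3"]
print(attribute_cost(authors, 3000, "first"))  # {'Author 1': 3000}
print(attribute_cost(authors, 3000))           # {'Author 1': 1000.0, 'Author 2': 1000.0, 'Author 3': 1000.0}
```

The fractional option keeps the total per article unchanged, so institution-level sums stay comparable with the single-author options.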

Let me know if this works. I will get our cluster up as soon as you confirm.

Thanks!

Xiaoran

From: Tavernier, Willa <wtavern@iu.edu>
Sent: Tuesday, February 11, 2020 9:29 AM
To: Hare, Sarah Elaine <scrissin@iu.edu>; Yan, Xiaoran <yan30@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Cc: Hutchinson, Matthew Alexander <maahutch@iu.edu>
Subject: RE: CADRE data

Good morning

Attached is the Elsevier journal list that we used for the 2017 analysis. This was downloaded in April 2019. I have also attached a list that I downloaded today. I think it would be best to use the February 2020 list, as we are trying to calculate potential future costs.

Kind regards

Willa

Willa Tavernier

Open Scholarship Resident Librarian

Scholarly Communication Department, Research and Learning Services

Herman B Wells Library E264B

1320 E. 10th Street, Bloomington, IN 47405-3970

wtavern@iu.edu | 812-856-1122

My ORCID https://orcid.org/0000-0002-0637-6258

From: Hare, Sarah Elaine <scrissin@iu.edu>
Sent: Tuesday, February 11, 2020 9:21 AM
To: Yan, Xiaoran <yan30@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Cc: Hutchinson, Matthew Alexander <maahutch@iu.edu>; Tavernier, Willa <wtavern@iu.edu>
Subject: Re: CADRE data

Hi Xiaoran,

Thanks for the quick response! I’m looping in my colleague Willa, who has overseen the work with CADRE data in the past. Willa, could you provide a list of the Elsevier journals we used for the 2017 analysis? Or do you feel that we should stick with WoS data, so we only need the 2018 data at this point?

Best,

Sarah

From: "Yan, Xiaoran" <yan30@iu.edu>
Date: Monday, February 10, 2020 at 6:14 PM
To: "Wittenberg, Jamie Viva" <jvwitten@indiana.edu>
Cc: "Hare, Sarah Elaine" <scrissin@iu.edu>, "Hutchinson, Matthew Alexander" <maahutch@iu.edu>
Subject: Re: CADRE data

Hi Jamie and Sarah,

We can certainly help Sarah get access. The problem is that we have yet to receive our 2019 update for Web of Science.

We can try the Microsoft Academic Graph (MAG), for which we have the latest data, but it would require a list of Elsevier journal names.

Thanks!

Xiaoran

XiaoranYan commented 4 years ago

Hi Willa,

With the UITS VPN back online, I can now connect to the server. Both data sets should be ready by next Tuesday.

Thanks,

Xiaoran

On 3/13/20 2:56 PM, Tavernier, Willa wrote:

Hi Xiaoran

I'm touching base on the data request because I will be telecommuting for part or all of each day beginning next week. To be able to organize my work schedule effectively, can you give me an estimate of when each of the 2018 and 2019 data extracts will be ready? This is a priority item for me.

I know things are in constant flux right now, and I appreciate your work on this.

Best

-Willa

Willa Tavernier Open Scholarship Resident Librarian Herman B Wells Library 350 1320 E. 10th Street, Bloomington, IN 47405-3970 wtavern@iu.edu | 812-856-1122

My ORCID https://orcid.org/0000-0002-0637-6258

From: Yan, Xiaoran
Sent: Thursday, March 5, 2020 3:24 PM
To: Tavernier, Willa; Hare, Sarah Elaine; Wittenberg, Jamie Viva
Subject: Re: CADRE data

Hi Willa,

The new data has arrived. The hard drive was actually delivered to the Wells Library, but we have not picked it up yet. I should be able to start working with it next week.

Thanks!

Xiaoran

XiaoranYan commented 4 years ago

Thank you Xiaoran

I will start looking at this now.

Best -Willa

Willa Tavernier Open Scholarship Resident Librarian Herman B Wells Library 350 1320 E. 10th Street, Bloomington, IN 47405-3970 wtavern@iu.edu | 812-856-1122

My ORCID https://orcid.org/0000-0002-0637-6258

On Mar 18, 2020, at 1:48 AM, Yan, Xiaoran <yan30@iu.edu> wrote:

Hi Willa,

The new data is ready, you can download it from

https://github.com/iuni-cadre/Collaborative-projects/blob/master/BTAA%20queries/wos1819papersIU.csv

The CSV follows a structure similar to last year's extraction for BTAA, with the following headers:

"WoSid","PT","PY","EI","SN","DI","TI","SO","PU","FX","RP","AU","C1"

Or in English:

"UID","_pubtype","_pubyear","eissn","issn","doi","title","journal","publisher","funding_text","reprintFlag","author_name","author_addresses"

Among the columns, AU and C1 are nested according to author order, and RP indicates the corresponding author’s position. Since the data is small enough, I have included all 2018 and 2019 papers with any author affiliated with IU, from all publishers.
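A minimal sketch of reading the corresponding author off `AU` using `RP`, assuming (per the April 2020 exchange in this thread) that `RP` is a 1-based author position and `0` means no reprint information. The `;` separator for the nested `AU` field is an assumption; the actual delimiter in the CSV should be checked first.

```python
import csv


def corresponding_author(au_field, rp, sep=";"):
    """Return the corresponding author's name, or None when RP has no usable value.

    Assumptions: AU lists names in author order joined by `sep` (hypothetical
    delimiter), and RP is a 1-based position where 0 means 'no reprint info'.
    """
    position = int(rp) if str(rp).strip() else 0
    authors = [name.strip() for name in au_field.split(sep)]
    if not 1 <= position <= len(authors):
        return None
    return authors[position - 1]


def iter_corresponding(path):
    """Yield (WoSid, corresponding author or None) for each row of the extract."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row["WoSid"], corresponding_author(row["AU"], row["RP"])


print(corresponding_author("Smith, J; Lee, K; Park, H", "3"))  # Park, H
print(corresponding_author("Smith, J; Lee, K; Park, H", "0"))  # None
```

With 1-based positions, an `RP` of 3 on a three-author paper resolves to the last author instead of running off the end of the list.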

Let me know if you have any question about the data.

Thanks!

Xiaoran

XiaoranYan commented 4 years ago

Hi Willa,

I am doing well. I wish you the same.

"In some rows the value is 0. Does that mean that the information is not available?" This is correct: not all papers have RP information. Sorry for the confusion.

Thanks and stay healthy!

Xiaoran

On 4/21/20 3:40 PM, Tavernier, Willa wrote:

Hi Xiaoran

I hope you are doing ok.

I have a question about the RP field in the dataset. In some rows the value is 0. Does that mean that the information is not available? Or that the author names start from position 0?

In relation to the second possibility, I have seen other rows where, for example, the RP value is 3 but there are only three names, which suggests that the author names ought to start from position 1 rather than position 0.

I’m happy to chat further about this at your convenience. I can still receive calls on my work line via the Skype for Business app: 812-856-1122.

Best, Willa

Willa Tavernier Open Scholarship Resident Librarian

XiaoranYan commented 4 years ago

Hi Willa,

If I remember correctly, the 2017 extract includes all BTAA universities, while the 2018-2019 update contains only IU.

Also, the previous extract covers all years from 1900 to 2017.

Xiaoran

On 5/13/20 3:53 PM, Tavernier, Willa wrote:

Hi Xiaoran

I have a quick question (again!). In the CSV I pulled down from GitHub I have:

16,678 records

2018 - 8,399 records

2019 - 8,065 records

I am wondering if that is all the data or if it got truncated in any way. In the 2017 extract we analyzed last year there were over 44,000 records, so I want to make sure.
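The two yearly counts add up to 8,399 + 8,065 = 16,464, which is 214 fewer than the 16,678 total, so a per-year tally of `PY` is a quick way to see where the remaining rows fall (for example, rows with a blank or unexpected year). A sketch using Python's `csv` module, which also treats a quoted line break inside a field as part of one record rather than as an extra row; the file path is whatever local copy was downloaded.

```python
import csv
from collections import Counter


def count_by_year(path, year_column="PY"):
    """Count records per publication year; blank years are counted under ''."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter((row.get(year_column) or "").strip()
                       for row in csv.DictReader(f))


# Usage, with the path pointing at the local copy of the extract:
# counts = count_by_year("wos1819papersIU.csv")
# print(sum(counts.values()), counts.most_common())
```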

I hope you are well!

Willa Tavernier Open Scholarship Resident Librarian Herman B Wells Library 350 1320 E. 10th Street, Bloomington, IN 47405-3970 wtavern@iu.edu | 812-856-1122

My ORCID https://orcid.org/0000-0002-0637-6258

XiaoranYan commented 4 years ago

Hi Xiaoran

With regard to my query below - the email address of the corresponding author would also work for my purposes if that is easier to retrieve.

Willa Tavernier Open Scholarship Resident Librarian Herman B Wells Library 350 1320 E. 10th Street, Bloomington, IN 47405-3970 wtavern@iu.edu | 812-856-1122

My ORCID https://orcid.org/0000-0002-0637-6258

From: Tavernier, Willa
Sent: Monday, May 18, 2020 7:23 PM
To: Yan, Xiaoran
Subject: RE: CADRE data

Hi Xiaoran

Thanks for your time this morning. I’ve worked on this data today but unfortunately I can’t surmount the issue of finding the address of the corresponding author.

For a significant portion of the data, the script returns no result because more than one author may be from the same institution, so there are fewer addresses than authors. For example, there may be 7 authors from 4 institutions and an RP flag of 5. Because only 4 institutions are listed, there is no fifth address for the script to return.
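The failure described here can be reproduced with a small example: indexing the address list by `RP` breaks as soon as several authors share an address, while an explicit author-to-address map does not. The addresses and the `author_to_addr` map below are made-up illustrations, not values from the extract.

```python
def naive_address(addresses, rp):
    """Failing approach: treat RP as a position in the address list itself."""
    return addresses[rp - 1] if 1 <= rp <= len(addresses) else None


def mapped_addresses(addresses, author_to_addr, rp):
    """Follow an explicit map from 1-based author position to 1-based address positions."""
    return [addresses[i - 1] for i in author_to_addr.get(rp, [])]


# Made-up example: 7 authors at 4 institutions, corresponding author is author 5.
addresses = ["Indiana Univ", "Purdue Univ", "Univ Michigan", "Ohio State Univ"]
author_to_addr = {1: [1], 2: [1], 3: [2], 4: [2], 5: [1, 3], 6: [4], 7: [4]}
print(naive_address(addresses, 5))                     # None: only 4 addresses exist
print(mapped_addresses(addresses, author_to_addr, 5))  # ['Indiana Univ', 'Univ Michigan']
```

The positional lookup can also fail silently: with `RP` of 3 it returns "Univ Michigan", while the map places author 3 at "Purdue Univ".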

Is there a way to query CADRE to return the name and address of the corresponding author?

This information is key to analyzing potential spend for Article Publishing Charges by IU authors, as the analysis assumes that the corresponding author would take primary responsibility for this payment if the article is an Open Access Article. Therefore my script uses the RP flag integer to search the other columns to find the corresponding author and the address of the corresponding author. Where that address is an IU address it is pulled into the analysis, otherwise it is rejected. Because of this I can’t move forward with the analysis if I am unable to retrieve the corresponding author’s address.
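A sketch of the filtering rule described above: a paper is pulled into the analysis only if a corresponding-author address is an IU address, and rejected otherwise. The case-insensitive substring test and the `IU_PATTERNS` value are assumptions; WoS address strings for different IU campuses may need additional patterns.

```python
IU_PATTERNS = ("indiana univ",)  # assumed match string; extend for other campus spellings


def is_iu_address(address, patterns=IU_PATTERNS):
    """Case-insensitive substring test against the assumed IU patterns."""
    text = address.lower()
    return any(pattern in text for pattern in patterns)


def iu_corresponding_papers(papers, patterns=IU_PATTERNS):
    """Keep IDs of papers with at least one IU corresponding-author address.

    papers: iterable of (paper_id, [corresponding-author addresses]).
    Papers with no known address are rejected along with non-IU ones.
    """
    return [pid for pid, addrs in papers
            if any(is_iu_address(a, patterns) for a in addrs)]


papers = [
    ("W1", ["Indiana Univ, Bloomington, IN 47405 USA"]),
    ("W2", ["Purdue Univ, W Lafayette, IN USA"]),
    ("W3", []),
]
print(iu_corresponding_papers(papers))  # ['W1']
```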

Best regards,

Willa

XiaoranYan commented 4 years ago

Hi Willa,

Sorry for the delay. I have 2 grant deadlines coming up. I might have some time tomorrow but not sure how much I can help with your ACRL paper at this point.

The new data is ready, you can download it from

https://github.com/iuni-cadre/Collaborative-projects/blob/master/BTAA%20queries/wos1819IU-RP.csv

The CSV now has additional columns that are pulled directly from the reprint addresses:

RPcount,RPnames,RP,RPflags,AU-C1-map

Or in English:

"number of reprint authors","reprint author last names","Reprint addresses","Reprint flags for each AU","AU to C1 mapping"

All other columns remain the same. I discovered that the problem was that there can be multiple reprint authors for each paper and multiple addresses for each author. The new columns should now help address the old problems.
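A sketch of how the new columns could be combined to collect every reprint author's addresses, allowing several reprint authors per paper and several addresses per author. The encodings assumed here (`;` between per-author or per-address entries, `1` marking a reprint author in `RPflags`, comma-separated 1-based `C1` positions in each `AU-C1-map` entry) are hypothetical and should be adjusted to the actual file.

```python
def reprint_addresses(au, c1, rpflags, au_c1_map, sep=";", sub=","):
    """Return [(reprint author, [addresses])] for one row.

    Hypothetical encodings: `sep` separates per-author / per-address entries,
    RPflags holds '1' for a reprint author, and each AU-C1-map entry lists the
    author's 1-based C1 positions separated by `sub` (empty if none).
    """
    authors = [a.strip() for a in au.split(sep)]
    addresses = [c.strip() for c in c1.split(sep)]
    flags = [f.strip() for f in rpflags.split(sep)]
    maps = au_c1_map.split(sep)
    result = []
    for author, flag, mapping in zip(authors, flags, maps):
        if flag != "1":
            continue
        positions = [int(p) for p in mapping.split(sub) if p.strip()]
        result.append((author, [addresses[p - 1] for p in positions
                                if 1 <= p <= len(addresses)]))
    return result


# Illustrative row only
row = {
    "AU": "Smith, J; Lee, K; Park, H",
    "C1": "Indiana Univ, Bloomington; Purdue Univ, W Lafayette",
    "RPflags": "0;1;1",
    "AU-C1-map": "1;1,2;2",
}
for author, addrs in reprint_addresses(row["AU"], row["C1"], row["RPflags"], row["AU-C1-map"]):
    print(author, addrs)
```

`RPcount` can serve as a per-row sanity check: the number of returned entries should equal it.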

Let me know if you have any question about the updated data.

Thanks!

--

Xiaoran Yan

iuni-cadre / Collaborative-projects

IU 2018/2019 data extraction for journal cost analysis #12