Open XiaoranYan opened 4 years ago
Good morning Xiaoran
Thanks for the update. Unfortunately I don’t have a list of these journals with eISSNs. We’ve discussed it in-house, and since you expect updated data from WoS, we can just get the data when it arrives in March. So please hold off on extracting the MAG data, and send us an update when the WoS data is available.
Kind regards
Willa
From: Yan, Xiaoran <yan30@iu.edu>
Sent: Saturday, February 15, 2020 11:12 AM
To: Tavernier, Willa <wtavern@iu.edu>; Hare, Sarah Elaine <scrissin@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Subject: Re: CADRE data
Unfortunately, there is no acknowledgment information in MAG. I also did an ISSN match: out of the 2,293 journals in your list, 304 had no match in MAG. I found that MAG sometimes uses eISSN and ISSN interchangeably. Do you have eISSN numbers for the Elsevier journals?
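Since MAG sometimes stores an eISSN where an ISSN is expected, a journal match is more forgiving if both numbers from the journal list are normalized and compared against every ISSN value found on the MAG side. The sketch below only illustrates that idea and is not the matching procedure actually used; the function names and the input shapes (a list of (ISSN, eISSN) pairs and a flat list of MAG ISSN strings) are hypothetical.

```python
import re
from typing import Iterable, Optional, Set, Tuple


def normalize_issn(value: Optional[str]) -> Optional[str]:
    """Return an ISSN as 'NNNN-NNNX' (uppercase check digit), or None if malformed."""
    if not value:
        return None
    chars = re.sub(r"[^0-9Xx]", "", value).upper()
    if not re.fullmatch(r"\d{7}[\dX]", chars):
        return None
    return f"{chars[:4]}-{chars[4:]}"


def match_journals(
    wanted: Iterable[Tuple[Optional[str], Optional[str]]],  # (issn, eissn) per listed journal
    mag_issns: Iterable[str],  # every ISSN-like value found in MAG, print or electronic
) -> Tuple[int, int]:
    """Count listed journals whose ISSN *or* eISSN appears among the MAG values."""
    known: Set[str] = {n for n in (normalize_issn(v) for v in mag_issns) if n}
    matched = unmatched = 0
    for issn, eissn in wanted:
        # Try both numbers, because MAG may hold either one.
        keys = {normalize_issn(issn), normalize_issn(eissn)} - {None}
        if keys & known:
            matched += 1
        else:
            unmatched += 1
    return matched, unmatched
```

Checking both numbers per journal means a journal only counts as unmatched when neither its ISSN nor its eISSN appears in MAG.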
By the way, we just got an update from Clarivate that the 2019 WoS update will be delivered in early March. If that is not too late, we will be able to repeat what we did before.
Xiaoran
On 2/14/20 3:38 PM, Tavernier, Willa wrote:
Hi Xiaoran
Is there any funding data that can be extracted, e.g., grant acknowledgments? If we can get that in addition to the fields you have listed below, that would be ideal.
Best,
Willa
From: Yan, Xiaoran <yan30@iu.edu>
Sent: Thursday, February 13, 2020 1:07 PM
To: Tavernier, Willa <wtavern@iu.edu>; Hare, Sarah Elaine <scrissin@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Subject: RE: CADRE data
Hi Willa,
Since the MAG data schema is different from WoS, I will not be able to provide exactly the same fields we did last time:
https://github.com/iuni-cadre/Collaborative-projects/issues/6
Would something like this work for you?
• MAG ID
• Publication type
• eISSN/ISSN
• Title
• Author
• Author Affiliation
• Publication Name
• Publisher
• Year Published
• DOI
One of the missing attributes is the corresponding author; MAG does not have that information. You will have to approximate it using the first or last author, or distribute the cost across all authors.
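The two fallbacks mentioned above can be sketched as follows. This is a minimal illustration, assuming each paper is represented as an ordered list of author names plus a single charge amount; the function names, the author names, and the 2000.0 figure are hypothetical.

```python
from typing import Dict, List


def approx_corresponding(authors: List[str], strategy: str = "first") -> str:
    """Pick a stand-in for the corresponding author, which MAG does not record."""
    if not authors:
        raise ValueError("paper has no authors")
    if strategy == "first":
        return authors[0]
    if strategy == "last":
        return authors[-1]
    raise ValueError(f"unknown strategy: {strategy!r}")


def split_cost(authors: List[str], apc: float) -> Dict[str, float]:
    """Distribute one article's charge evenly across all listed authors."""
    if not authors:
        raise ValueError("paper has no authors")
    share = apc / len(authors)
    shares: Dict[str, float] = {}
    for name in authors:
        # Accumulate, so a name listed twice receives two shares.
        shares[name] = shares.get(name, 0.0) + share
    return shares


authors = ["Author 1", "Author 2", "Author 3", "Author 4"]
print(approx_corresponding(authors, "last"))  # Author 4
print(split_cost(authors, 2000.0))  # each author is assigned 500.0
```

The first/last choice attributes the whole charge to one institution per paper, while the even split spreads it fractionally; which is closer to reality depends on the field's authorship conventions.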
Let me know if this works. I will get our cluster up as soon as you confirm.
Thanks!
Xiaoran
From: Tavernier, Willa <wtavern@iu.edu>
Sent: Tuesday, February 11, 2020 9:29 AM
To: Hare, Sarah Elaine <scrissin@iu.edu>; Yan, Xiaoran <yan30@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Cc: Hutchinson, Matthew Alexander <maahutch@iu.edu>
Subject: RE: CADRE data
Good morning
Attached is the Elsevier journal list that we used for the 2017 analysis; it was downloaded in April 2019. I have also attached a list that I downloaded today. I think it would be best to use the February 2020 list, as we are trying to calculate potential future costs.
Kind regards
Willa
Willa Tavernier
Open Scholarship Resident Librarian
Scholarly Communication Department, Research and Learning Services
Herman B Wells Library E264B
1320 E. 10th Street, Bloomington, IN 47405-3970
wtavern@iu.edu | 812-856-1122
My ORCID https://orcid.org/0000-0002-0637-6258
From: Hare, Sarah Elaine <scrissin@iu.edu>
Sent: Tuesday, February 11, 2020 9:21 AM
To: Yan, Xiaoran <yan30@iu.edu>; Wittenberg, Jamie Viva <jvwitten@indiana.edu>
Cc: Hutchinson, Matthew Alexander <maahutch@iu.edu>; Tavernier, Willa <wtavern@iu.edu>
Subject: Re: CADRE data
Hi Xiaoran,
Thanks for the quick response! I’m looping in my colleague Willa, who has overseen the work with CADRE data in the past. Willa, could you provide a list of the Elsevier journals we used for the 2017 analysis? Or do you feel that we should stick with WoS data, so we only need the 2018 data at this point?
Best,
Sarah
From: "Yan, Xiaoran" <yan30@iu.edu>
Date: Monday, February 10, 2020 at 6:14 PM
To: "Wittenberg, Jamie Viva" <jvwitten@indiana.edu>
Cc: "Hare, Sarah Elaine" <scrissin@iu.edu>, "Hutchinson, Matthew Alexander" <maahutch@iu.edu>
Subject: Re: CADRE data
Hi Jamie and Sarah,
We can certainly help Sarah get access. The problem is that we have yet to receive our 2019 update for Web of Science.
We could try Microsoft Academic Graph (MAG), for which we have the latest data, but it would require a list of Elsevier journal names.
Thanks!
Xiaoran
Hi Willa,
With the UITS VPN back online, I can now connect to the server. Both data sets should be ready by next Tuesday.
Thanks,
Xiaoran
On 3/13/20 2:56 PM, Tavernier, Willa wrote:
Hi Xiaoran
I'm touching base on the data request because I will be telecommuting for part or all of each day beginning next week. To be able to organize my work schedule effectively, can you give me an estimate of when each of the 2018 and 2019 data extracts will be ready? This is a priority item for me.
I know things are in constant flux right now, and I appreciate your work on this.
Best
-Willa
From: Yan, Xiaoran
Sent: Thursday, March 5, 2020 3:24 PM
To: Tavernier, Willa; Hare, Sarah Elaine; Wittenberg, Jamie Viva
Subject: Re: CADRE data
Hi Willa,
The new data has arrived. The hard drive was actually delivered to Wells Library, but we have not picked it up yet. I should be able to start working with it next week.
Thanks!
Xiaoran
Thank you Xiaoran
I will start looking at this now.
Best -Willa
On Mar 18, 2020, at 1:48 AM, Yan, Xiaoran <yan30@iu.edu> wrote:
Hi Willa,
The new data is ready. You can download it from
https://github.com/iuni-cadre/Collaborative-projects/blob/master/BTAA%20queries/wos1819papersIU.csv
The CSV follows a structure similar to last year's extraction for BTAA, with the following headers:
"WoSid","PT","PY","EI","SN","DI","TI","SO","PU","FX","RP","AU","C1"
Or in English:
"UID","_pubtype","_pubyear","eissn","issn","doi","title","journal","publisher","funding_text","reprintFlag","author_name","author_addresses"
Among the columns, AU and C1 are nested according to author order, and RP indicates the corresponding author’s position. Since the data is small enough, I have included all 2018 and 2019 papers with any author affiliated with IU, from all publishers.
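A minimal sketch of reading the extract with Python's standard csv module follows. It renames the WoS tags to the English names listed above and splits the nested AU and C1 cells into parallel lists. The thread does not say how nested values are delimited, so the "; " separator is an assumption to be checked against the actual file; the helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List

# WoS field tag -> English column name, as listed in the email above.
TAG_NAMES = {
    "WoSid": "UID", "PT": "_pubtype", "PY": "_pubyear", "EI": "eissn",
    "SN": "issn", "DI": "doi", "TI": "title", "SO": "journal",
    "PU": "publisher", "FX": "funding_text", "RP": "reprintFlag",
    "AU": "author_name", "C1": "author_addresses",
}

# ASSUMPTION: nested AU/C1 values are joined with "; " in author order.
# The separator is not stated in the thread; verify it on the real file.
NEST_SEP = "; "


def split_nested(value: str) -> List[str]:
    """Split a nested AU or C1 cell into a list, preserving author order."""
    return [part.strip() for part in value.split(NEST_SEP)] if value else []


def read_extract(text: str) -> List[Dict[str, object]]:
    """Parse the CSV text into dicts keyed by the English column names."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(io.StringIO(text)):
        row: Dict[str, object] = {TAG_NAMES.get(k, k): v for k, v in raw.items()}
        row["author_name"] = split_nested(raw.get("AU") or "")
        row["author_addresses"] = split_nested(raw.get("C1") or "")
        rows.append(row)
    return rows
```

To use it on the downloaded file, read wos1819papersIU.csv into a string (opened with newline="" and, as a further assumption, UTF-8 encoding) and pass that string to read_extract.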
Let me know if you have any question about the data.
Thanks!
Xiaoran
Hi Willa,
I am doing well. I wish you the same.
Regarding your question about RP values of 0 ("Does that mean that the information is not available?"): yes, that is correct. Not all papers have RP information. Sorry for the confusion.
Thanks and stay healthy!
Xiaoran
On 4/21/20 3:40 PM, Tavernier, Willa wrote:
Hi Xiaoran
I hope you are doing ok.
I have a question about the RP field in the dataset. In some rows the value is 0. Does that mean that the information is not available, or that author positions start from 0?
Regarding the second possibility, I have seen other rows where, for example, the RP value is 3 but there are only three names, which suggests that author positions start from 1 rather than 0.
I’m happy to chat further about this at your convenience. I can still receive calls on my work line via the Skype for Business app: 812-856-1122.
Best, Willa
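Taken together, Xiaoran's answer (an RP value of 0 means the information is not available) and Willa's observation (an RP of 3 occurs with exactly three names) mean the flag can be read as a 1-based author position. A minimal sketch, assuming the AU cell has already been split into a Python list; the helper name is hypothetical.

```python
from typing import List, Optional


def corresponding_author(authors: List[str], rp: int) -> Optional[str]:
    """Return the author at 1-based position rp, or None when RP is unavailable."""
    if rp == 0:
        return None  # 0 means the paper has no RP information
    if not 1 <= rp <= len(authors):
        raise ValueError(f"RP={rp} is out of range for {len(authors)} authors")
    return authors[rp - 1]  # convert the 1-based flag to a 0-based list index
```

Raising on an out-of-range flag, rather than silently returning None, keeps genuinely missing RP values separate from rows where the flag and the author list disagree.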
Hi Willa,
If I remember correctly, the 2017 extract includes all BTAA universities, while the 2018-2019 update only contains IU.
Also, the previous extract contains all years from 1900 to 2017.
Xiaoran
On 5/13/20 3:53 PM, Tavernier, Willa wrote:
Hi Xiaoran
I have a quick question (again!). In the CSV I pulled down from GitHub, I have:
16,678 records
2018: 8,399 records
2019: 8,065 records
I am wondering whether that is all the data or if it got truncated in any way. In the 2017 extract we analyzed last year there were over 44,000 records, so I want to make sure.
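One way to check the extract for truncation is to tally rows per PY value. The two per-year counts quoted above (8,399 + 8,065) sum to 16,464 rather than 16,678, so a per-year tally may also help locate the other 214 rows. A minimal sketch, assuming the publication-year column is headed PY as in the header list earlier in the thread; the function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def count_by_year(lines: Iterable[str]) -> Dict[str, int]:
    """Tally rows per PY value; a missing or blank PY is counted under ''."""
    counts = Counter((row.get("PY") or "").strip() for row in csv.DictReader(lines))
    return dict(counts)
```

For example, pass it a file handle for wos1819papersIU.csv opened with newline="" and compare sum(counts.values()) with the 16,678 total. Because csv.DictReader parses quoted multi-line fields correctly, this count can differ from a naive line count of the file.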
I hope you are well!
Hi Xiaoran
With regard to my query below, the email address of the corresponding author would also work for my purposes if that is easier to retrieve.
From: Tavernier, Willa
Sent: Monday, May 18, 2020 7:23 PM
To: Yan, Xiaoran
Subject: RE: CADRE data
Hi Xiaoran
Thanks for your time this morning. I’ve worked on this data today but unfortunately I can’t surmount the issue of finding the address of the corresponding author.
For a significant portion of the data, the script returns no result because more than one author may be from the same institution. For example, there may be 7 authors from 4 institutions with an RP flag of 5; because only 4 institutions are listed, there is no fifth address for the script to return.
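The failure described above can be reproduced with a toy record. When C1 holds one entry per institution rather than one per author, indexing the address list by the RP position can run past its end. The names and data below are invented to mirror the 7-authors/4-institutions example, and the function is hypothetical.

```python
from typing import List, Optional


def naive_rp_address(addresses: List[str], rp: int) -> Optional[str]:
    """Treat C1 as one address per author and index it by the 1-based RP flag."""
    if 1 <= rp <= len(addresses):
        return addresses[rp - 1]
    return None  # no address at that position: the "no result" case


authors = [f"Author {i}" for i in range(1, 8)]  # 7 authors
institutions = ["Inst A", "Inst B", "Inst C", "Inst D"]  # only 4 C1 entries
print(len(authors), naive_rp_address(institutions, rp=5))  # prints: 7 None
```

Even when RP is small enough to land inside the list, the returned address is not necessarily the corresponding author's, because positions in a per-institution C1 list no longer line up with author order.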
Is there a way to query CADRE to return the name and address of the corresponding author?
This information is key to analyzing potential spend for Article Publishing Charges by IU authors, as the analysis assumes that the corresponding author would take primary responsibility for this payment if the article is an Open Access Article. Therefore my script uses the RP flag integer to search the other columns to find the corresponding author and the address of the corresponding author. Where that address is an IU address it is pulled into the analysis, otherwise it is rejected. Because of this I can’t move forward with the analysis if I am unable to retrieve the corresponding author’s address.
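The selection rule described above (keep a paper only when the corresponding author's address is an IU address) can be sketched as a small predicate. The thread does not specify how an IU address is recognized. Matching on the substring "Indiana Univ" is purely an assumption, and the exclusion entry is a hypothetical placeholder for other institutions whose names begin the same way. With several corresponding-author addresses, the sketch keeps the paper if any of them is at IU, which is a design choice rather than something stated in the thread.

```python
from typing import Optional, Sequence

# ASSUMPTION: substrings identifying an IU address in WoS address strings.
IU_MARKERS = ("Indiana Univ",)
# HYPOTHETICAL exclusion: other institutions whose names also start with
# "Indiana Univ" would otherwise match; verify against the real data.
NON_IU_MARKERS = ("Indiana Univ Penn",)


def is_iu_address(address: str) -> bool:
    """Return True if the address string looks like an IU address."""
    if any(m in address for m in NON_IU_MARKERS):
        return False
    return any(m in address for m in IU_MARKERS)


def keep_for_apc_analysis(rp_addresses: Optional[Sequence[str]]) -> bool:
    """Keep the paper only if a corresponding-author address is at IU."""
    if not rp_addresses:
        return False  # no corresponding-author address: cannot attribute the charge
    return any(is_iu_address(a) for a in rp_addresses)
```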
Best regards,
Willa
Hi Willa,
Sorry for the delay. I have 2 grant deadlines coming up. I might have some time tomorrow, but I'm not sure how much I can help with your ACRL paper at this point.
The new data is ready. You can download it from
https://github.com/iuni-cadre/Collaborative-projects/blob/master/BTAA%20queries/wos1819IU-RP.csv
The CSV now has additional columns that pull directly from the reprint addresses:
RPcount,RPnames,RP,RPflags,AU-C1-map
Or in English:
"number of reprint authors","reprint author last names","Reprint addresses","Reprint flags for each AU","AU to C1 mapping"
All other columns remain the same. I discovered that the problem was that there can be multiple reprint authors for each paper and multiple addresses for each author. The new columns should now help address the old problems.
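The new per-author columns make it possible to look up addresses through a mapping instead of a position. A minimal sketch of that lookup follows. The email does not describe how RPflags and AU-C1-map are serialized, so this assumes they have already been parsed into Python lists: one 0/1 flag per author, and, per author, a list of 1-based indices into the C1 addresses. Both shapes and the function name are hypothetical.

```python
from typing import List, Tuple


def reprint_author_addresses(
    authors: List[str],
    rp_flags: List[int],
    au_c1_map: List[List[int]],
    addresses: List[str],
) -> List[Tuple[str, List[str]]]:
    """Return (author, addresses) for every author flagged as a reprint author.

    ASSUMED shapes (already parsed from the CSV; verify against the real file):
      rp_flags[i]  is 1 if authors[i] is a reprint author, else 0
      au_c1_map[i] lists 1-based indices into `addresses` for authors[i]
    """
    if not (len(authors) == len(rp_flags) == len(au_c1_map)):
        raise ValueError("per-author columns must have the same length")
    return [
        (name, [addresses[i - 1] for i in idxs])
        for name, flag, idxs in zip(authors, rp_flags, au_c1_map)
        if flag
    ]


# Toy record mirroring the 7-authors / 4-institutions case: author 5 is flagged.
authors = [f"Author {i}" for i in range(1, 8)]
flags = [0, 0, 0, 0, 1, 0, 0]
au_c1 = [[1], [1], [2], [3], [2, 4], [3], [4]]
insts = ["Inst A", "Inst B", "Inst C", "Inst D"]
print(reprint_author_addresses(authors, flags, au_c1, insts))
# [('Author 5', ['Inst B', 'Inst D'])]
```

Returning a list of pairs rather than a dict keeps two reprint authors with the same name from overwriting each other, and it handles both multiple reprint authors per paper and multiple addresses per author.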
Let me know if you have any question about the updated data.
Thanks!
--
Xiaoran Yan