Open XiaoranYan opened 5 years ago
Hi Yangyang,
The external citation dataset is ready. I added one new column “cited” to flag if the paper is citing or being cited by the core dataset. You can download it from here
Please check the dataset and let me know if there is any problem.
Thanks!
Xiaoran
Hi Xiaoran,
Sorry for the late reply. While working on the dataset, I found some problems with the last data delivery.
The core dataset has 178001 records, totaling 30G. However, there are many duplications in the attributes "abstract", "keywords", and "refs". The abstract, for example, is repeated several times within a single record. The same problem appears in the extra dataset. Could you help us figure out the cause?
Also, would it be possible for us to get the citing sentences for the core dataset? That would be really helpful.
Thanks!
Yangyang
Hi Yangyang,
Thank you for the feedback! We realize that the current implementation for abstracts and keywords is not perfect. We had a similar issue with another user earlier. The root of the problem lies in the raw WoS data, and we are trying to remedy it in our new CADRE system.
To help us debug, can you specify the particular records that have duplicates in your data? Please give us the WoSid and describe in detail which columns are duplicated.
Thanks! Xiaoran
Hi Xiaoran,
I randomly examined some records in the dataset, and they all show the same duplication problem.
For the core dataset, the duplicated attributes are "abstract", "keywords", and "refs". The data is duplicated whenever it is not NULL. Here is an example where we can see the duplicates: (WOS:000257343900020)
"Jones, Philip C.|Ohlmann, Jeffrey W.","Univ Iowa, Dept Management Sci, Iowa City, IA 52242 USA|Univ Iowa, Dept Management Sci, Iowa City, IA 52242 USA","USA|USA","EUROPEAN JOURNAL OF OPERATIONAL RESEARCH","WOS:000257343900020","10.1016/j.ejor.2007.08.033","Journal","Long-range timber supply planning for a vertically integrated paper mill","2008","[1]We consider a vertically integrated papermaking operation composed of an integrated pulp and paper mill with its regional supply network. Considering land procurement and harvest rotation as strategic decision variables, we construct a model to establish a long-range timber supply plan that minimizes the total discounted cost of meeting annual virgin wood fiber demand over an infinite horizon. Under appropriate assumptions on costs and storage, the land procurement and harvest rotation decisions are separable with harvest rotation being determined via a forest economics-type equation and land procurement being determined by a newsvendor-type equation. Published by Elsevier B.V.;[1]We consider a vertically integrated papermaking operation composed of an integrated pulp and paper mill with its regional supply network. Considering land procurement and harvest rotation as strategic decision variables, we construct a model to establish a long-range timber supply plan that minimizes the total discounted cost of meeting annual virgin wood fiber demand over an infinite horizon. Under appropriate assumptions on costs and storage, the land procurement and harvest rotation decisions are separable with harvest rotation being determined via a forest economics-type equation and land procurement being determined by a newsvendor-type equation. Published by Elsevier B.V.;[1]We consider a vertically integrated papermaking operation composed of an integrated pulp and paper mill with its regional supply network. 
Considering land procurement and harvest rotation as strategic decision variables, we construct a model to establish a long-range timber supply plan that minimizes the total discounted cost of meeting annual virgin wood fiber demand over an infinite horizon. Under appropriate assumptions on costs and storage, the land procurement and harvest rotation decisions are separable with harvest rotation being determined via a forest economics-type equation and land procurement being determined by a newsvendor-type equation. Published by Elsevier B.V.;[1]We consider a vertically integrated papermaking operation composed of an integrated pulp and paper mill with its regional supply network. Considering land procurement and harvest rotation as strategic decision variables, we construct a model to establish a long-range timber supply plan that minimizes the total discounted cost of meeting annual virgin wood fiber demand over an infinite horizon. Under appropriate assumptions on costs and storage, the land procurement and harvest rotation decisions are separable with harvest rotation being determined via a forest economics-type equation and land procurement being determined by a newsvendor-type equation. Published by Elsevier B.V.;[1]We consider a vertically integrated papermaking operation composed of an integrated pulp and paper mill with its regional supply network. Considering land procurement and harvest rotation as strategic decision variables, we construct a model to establish a long-range timber supply plan that minimizes the total discounted cost of meeting annual virgin wood fiber demand over an infinite horizon. Under appropriate assumptions on costs and storage, the land procurement and harvest rotation decisions are separable with harvest rotation being determined via a forest economics-type equation (...) 
[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model;[6]forestry supply chain management;[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model;[6]forestry supply chain management;[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model;[6]forestry supply chain management;[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model;[6]forestry supply chain management;[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model;[6]forestry supply chain management;[1]OR in agriculture;[2]forest economics;[3]normal forest;[4]regulated forest;[5]newsvendor model","WOS:000257343900020.2;WOS:000085158500015;WOS:000085158500015;WOS:000085158500015;WOS:000085158500015;WOS:000085158500015;WOS:000085158500015;WOS:000257343900020.11;WOS:000257343900020.11;WOS:000257343900020.11;WOS:000257343900020.11;WOS:000257343900020.11;WOS:000257343900020.11;WOS:A1986AXW5200020;WOS:A1986AXW5200020;WOS:A1986AXW5200020;WOS:A1986AXW5200020;WOS:A1986AXW5200020;WOS:A1986AXW5200020;WOS:000257343900020.26;WOS:000257343900020.26;WOS:000257343900020.26;WOS:000257343900020.26;WOS:000257343900020.26;WOS:000257343900020.26;WOS:A1996TU85300007;WOS:A1996TU85300007;WOS:A1996TU85300007;WOS:A1996TU85300007;WOS:A1996TU85300007;WOS:A1996TU85300007;WOS:A1990CU07500015;WOS:A1990CU07500015;WOS:A1990CU07500015;WOS:A1990CU07500015;WOS:A1990CU07500015;WOS:A1990CU07500015 (...)
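As a stopgap on the user side, the repeated items in the "keywords" and "refs" fields could be collapsed with something like the sketch below. It assumes the items are joined by ";" exactly as in the record above and that the first occurrence of each item is the one to keep; `dedupe_field` is a hypothetical helper name, not part of any delivered tooling. Abstracts may contain semicolons inside the text, so they would likely need a more careful split (for example on the ";[n]" boundary).

```python
def dedupe_field(value, sep=";"):
    """Collapse repeated items in a separator-joined field, keeping first-seen order."""
    # Assumption: missing values are exported as the literal string NULL.
    if value is None or value == "NULL":
        return value
    seen = set()
    kept = []
    for item in value.split(sep):
        item = item.strip()
        if item and item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)


# Shortened excerpts of the duplicated fields from the record above.
keywords = "[1]OR in agriculture;[2]forest economics;[1]OR in agriculture;[2]forest economics"
refs = "WOS:000257343900020.2;WOS:000085158500015;WOS:000085158500015"

print(dedupe_field(keywords))
print(dedupe_field(refs))
```

This only hides the symptom in a local copy; it does not address the cause in the raw export.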
I also have a question about the reference data. Here is an example (WOS:000268350400009):
"Havasi, Catherine|Lieberman, Henry|Pustejovsky, James|Speer, Robert","Brandeis Univ, Lab Linguist & Computat, Waltham, MA 02254 USA|MIT, Software Agents Grp, Cambridge, MA 02139 USA|Brandeis Univ, Lab Linguist & Computat, Waltham, MA 02254 USA|MIT, Media Labs Commonsense Comp Initiat, Cambridge, MA 02139 USA","USA|USA|USA|USA","IEEE INTELLIGENT SYSTEMS","WOS:000268350400009",NULL,"Journal","Digital Intuition: Applying Common Sense Using Dimensionality Reduction","2009",NULL,NULL,"WOS:000268350400009.12;WOS:000268350400009.7;WOS:A1995TC17500013;WOS:000268350400009.10;WOS:000268350400009.5;WOS:000268350400009.4;WOS:000268350400009.6;WOS:000182919000077;WOS:000268350400009.2;WOS:000268350400009.3;WOS:000268350400009.1;WOS:000268350400009.13;WOS:000224961900027","true"
The WoSid of the paper is "WOS:000268350400009" and the WoSid of one of its references is "WOS:000268350400009.12". From the format of the id, it seems to indicate only that the item is in the reference list of the paper. Does it match a WoSid in the extra dataset?
For the extra dataset, the duplicated attributes are "keywords" and "refs". Here is an example: (WOS:000211422000006)
Ghodrati\, Behzad|Kumar\, Uday,Lulea Univ Technol\, Div Operat & Maintenance Engn\, Lulea\, Sweden|Lulea Univ Technol\, Div Operat & Maintenance Engn\, Lulea\, Sweden,Sweden|Sweden,JOURNAL OF QUALITY IN MAINTENANCE ENGINEERING,WOS:000211422000006,10.1108/13552510510601366,Journal,APPLICATIONS AND CASE STUDIES Reliability and operating environment-based spare parts estimation approach,2005,[5]Originality/value-Previously\, the state of the specific technology and other factors have demonstrated the need for support in enhancing system effectiveness and preventing unexpected downtime. This paper sets the required number of spare parts necessary to fulfil this need.;[2]Design/methodology/approach-A model is provided in this paper to determine the number of required spare parts with respect to the effect of the external factors\, except time\, on the reliability characteristics of components through the proportional hazard model. The model is verified with estimation of the number of spare hydraulic jacks\, used on a load-haul-dump (LHD) machine\, as non-repairable components. The reliability of this non-repairable part and its operational impact are assessed\, while considering environmental factors and ignoring them.;[1]Purpose - With continuous technological development in the twenty-first century\, the industry and industrial systems have become complex and making their availability more critical. In this context\, the product support and its related issues such as spare parts play an important role. Lack of timely or incomplete support\, such as the lack of spare parts when required\, is likely to cause unexpected downtimes\, which in turn often lead to incompensatable losses. Therefore the importance of predicting the correct support to keep the system functionally available needs to be emphasized. Required number of spare parts could be obtained based on technical and life parameters. 
This paper seeks to examine the system reliability and operating environment\, which are the two parameters to be considered in this article.;[3]Findings - The results indicate that the operating environment of system/ machine has considerable influence on system performance. Forecasting the required support/ spare parts based on technical characteristics and the system-operating environment is an optimal way to prevent unplanned stoppages.;[4]Practical implications- The environmental conditions in which the equipment is to be operated\, such as temperature\, humidity\, dust\, road conditions\, maintenance facilities\, maintenance crew training\, operators' skill\, etc.\, often have considerable influence directly on the system/ machine or component reliability and indirectly on the product supportability characteristics. Spare parts\, are classified as a product support item whose availability is important when planned or unplanned maintenance is to be carried out. Forecasting the required number of spare parts\, based on technical characteristics and operating environmental conditions of a system\, is one of the best ways to optimize unplanned stoppages.,[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory 
management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[5]Sweden;[4]Systems and control theory;[3]Distribution and inventory management;[2]Operations management;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare 
parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden;[1]Spare parts;[2]Operations management;[3]Distribution and inventory management;[4]Systems and control theory;[5]Sweden,000211422000006.13;000211422000006.23;000211422000006.23;000211422000006.23;000211422000006.23;000211422000006.23;000211422000006.17;000211422000006.17;000211422000006.17;000211422000006.17;000211422000006.17;WOS:000174318200008;WOS:000174318200008;WOS:000174318200008;WOS:000174318200008;WOS:000174318200008;000211422000006.19;000211422000006.19;000211422000006.19;000211422000006.19;000211422000006.19;WOS:A1995QT55800001;WOS:A1995QT55800001;WOS:A1995QT55800001;WOS:A1995QT55800001;WOS:A1995QT55800001;WOS:A1987G015400049;WOS:A1987G015400049;WOS:A1987G015400049;WOS:A1987G015400049;WOS:A1987G015400049;000211422000006.4;000211422000006.4;000211422000006.4;000211422000006.4;000211422000006.4;000211422000006.16;000211422000006.16;000211422000006.16;000211422000006.16;000211422000006.16;000211422000006.21;000211422000006.21;000211422000006.21;000211422000006.21;000211422000006.21;000211422000006.26;000211422000006.26;000211422000006.26;000211422000006.26;000211422000006.26;000211422000006.24;000211422000006.24;000211422000006.24;00021
1422000006.24;000211422000006.24;WOS:000172033400018;WOS:000172033400018;WOS:000172033400018;WOS:000172033400018;WOS:000172033400018;000211422000006.11;000211422000006.11;000211422000006.11;000211422000006.11;000211422000006.11;WOS:000211422000006.22;WOS:000211422000006.22;WOS:000211422000006.22;WOS:000211422000006.22;WOS:000211422000006.22;000211422000006.31;000211422000006.31;000211422000006.31;000211422000006.31;000211422000006.31;000211422000006.12;000211422000006.12;000211422000006.12;000211422000006.12;000211422000006.12;WOS:A1985APN4600004;WOS:A1985APN4600004;WOS:A1985APN4600004;WOS:A1985APN4600004;WOS:A1985APN4600004;000211422000006.13;000211422000006.13;000211422000006.13;000211422000006.13;000211422000006.28;000211422000006.28;000211422000006.28;000211422000006.28;000211422000006.28;WOS:A1970F194200009;WOS:A1970F194200009;WOS:A1970F194200009;WOS:A1970F194200009;WOS:A1970F194200009;WOS:A1989AW96100004;WOS:A1989AW96100004;WOS:A1989AW96100004;WOS:A1989AW96100004;WOS:A1989AW96100004;000211422000006.30;000211422000006.30;000211422000006.30;000211422000006.30;000211422000006.30;000211422000006.20;000211422000006.20;000211422000006.20;000211422000006.20;000211422000006.20;WOS:A1972N572600003;WOS:A1972N572600003;WOS:A1972N572600003;WOS:A1972N572600003;WOS:A1972N572600003;WOS:000088258200004;WOS:000088258200004;WOS:000088258200004;WOS:000088258200004;WOS:000088258200004;WOS:A1985AST4800005;WOS:A1985AST4800005;WOS:A1985AST4800005;WOS:A1985AST4800005;WOS:A1985AST4800005;000211422000006.1;000211422000006.1;000211422000006.1;000211422000006.1;000211422000006.1;000211422000006.5;000211422000006.5;000211422000006.5;000211422000006.5;000211422000006.5;WOS:A1994NT83300010;WOS:A1994NT83300010;WOS:A1994NT83300010;WOS:A1994NT83300010;WOS:A1994NT83300010;000211422000006.27;000211422000006.27;000211422000006.27;000211422000006.27;000211422000006.27;WOS:000168933000014;WOS:000168933000014;WOS:000168933000014;WOS:000168933000014;WOS:000168933000014,true,cited
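Note that this extra-dataset record is laid out differently from the core example: fields are not quoted, and commas inside a field are escaped with a backslash. A minimal sketch of reading such a line with Python's standard `csv` module follows, assuming that backslash escaping is the only convention in play (this is inferred from the sample above, not confirmed). `parse_extra_line` is a hypothetical helper name.

```python
import csv


def parse_extra_line(line):
    """Split one unquoted, backslash-escaped CSV line into its fields."""
    # QUOTE_NONE: quote characters are ordinary text.
    # escapechar: a backslash makes the following comma part of the field.
    reader = csv.reader([line], quoting=csv.QUOTE_NONE, escapechar="\\")
    return next(reader)


# Shortened excerpt in the same layout as the record above.
sample = r"Ghodrati\, Behzad|Kumar\, Uday,Sweden|Sweden,2005"
print(parse_extra_line(sample))
```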
Hope this helps find the problem.
Thanks!
Yangyang
Hi Yangyang,
Sorry for the slow response. Thanks to your information, we were able to identify the problem and the updated data set can be downloaded at (link valid for a week) https://iunimag.blob.core.windows.net/mag-2019-01-25/YangyangPapersUpdated.csv.gz?st=2019-04-27T23%3A34%3A33Z&se=2019-05-04T23%3A34%3A00Z&sp=rl&sv=2017-07-29&sr=b&sig=d%2BIXoIQL5AHQUQyc8FeaoGX6ruKaOkAnmoCQWbwLzLU%3D
The new data fixes the duplicates in abstract and keywords, and also corrects the order in which authors and their affiliations appear in the nested structure. Previously, the order was shuffled and did not reflect their order of appearance in the papers.
As for reference records with decimal suffixes like "WOS:000268350400009.12", these are references that were not found anywhere in the WoS collection. You will not be able to find a match for them, even in the extended dataset.
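Given that explanation, a "refs" field can be split into references that resolve to a WoS record and references that do not. The sketch below assumes an unmatched reference is always an ID followed by "." and a number, with or without the "WOS:" prefix (both forms appear in the sample records earlier in this thread). The pattern and the name `split_refs` are illustrative assumptions, not a documented WoS or CADRE rule.

```python
import re

# Assumption: unmatched references look like "<id>.<number>",
# optionally prefixed with "WOS:".
UNMATCHED = re.compile(r"^(?:WOS:)?[0-9A-Z]+\.\d+$")


def split_refs(refs, sep=";"):
    """Return (matched, unmatched) reference IDs from a joined refs field."""
    matched, unmatched = [], []
    for ref in refs.split(sep):
        ref = ref.strip()
        if not ref:
            continue
        if UNMATCHED.match(ref):
            unmatched.append(ref)
        else:
            matched.append(ref)
    return matched, unmatched


refs = "WOS:000268350400009.12;WOS:A1995TC17500013;WOS:000182919000077;000211422000006.13"
matched, unmatched = split_refs(refs)
print(matched)
print(unmatched)
```

Only the IDs in `matched` are candidates for a join against the external citation dataset.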
Hi Xiaoran,
Thank you very much for the dataset. We'll download it and take a look over the next few days.
Best,
Yangyang
From: Yan, Xiaoran Sent: February 17, 2019 10:18 PM To: Chang, Yangyang Cc: Ding, Ying; Hutchinson, Matthew Alexander; Ma, He; Pentchev, Valentin; Mabry, Patricia L Subject: RE: [Update]_Journal list of management science
Hi Yangyang,
Here is my first take on your requested dataset. The data is from WoS and currently only has the papers in the listed journals. You can download it with the following link (it will be valid for a week, and I suggest you use a download manager to open the link):
https://iunimag.blob.core.windows.net/mag-2019-01-25/data-1550457554868.csv.gz?st=2019-02-18T02%3A49%3A14Z&se=2019-02-26T02%3A49%3A00Z&sp=rl&sv=2017-07-29&sr=b&sig=i1LAoVDNAKXJhtATuK%2BWUansGeKKJW879ne%2BMlczYz4%3D
Once unpacked, the csv file will be 300GB with 191993 papers. I added one more column “addressverified” to flag papers with “enhanced affiliations” from WoS, meaning their authors and addresses are clearly mapped in order.
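Because the file is this large, it may be more practical to stream the compressed file row by row than to unpack it first. The sketch below counts the papers and how many carry the flag. It assumes the file has a header row containing the column name "addressverified" and that the flag is stored as the string "true"; neither is confirmed here. `count_flagged` is a hypothetical helper name.

```python
import csv
import gzip

# Very long fields can exceed the csv module's default field size limit,
# so raise it to a value that is valid on all platforms.
csv.field_size_limit(2**31 - 1)


def count_flagged(path, flag_column="addressverified"):
    """Stream a gzipped CSV and return (total rows, rows where the flag is true)."""
    total = flagged = 0
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # Assumption: the first row of the file is a header.
        for row in csv.DictReader(handle):
            total += 1
            if (row.get(flag_column) or "").strip().lower() == "true":
                flagged += 1
    return total, flagged


# Example call (the path is a placeholder for the downloaded file):
# total, flagged = count_flagged("data.csv.gz")
```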
Please download it and get back to me if you find any problems. In my experience, it takes a few updates to finalize a dataset as your research progresses. I will be producing another dataset for the external citations (the refs in the current file already contain external documents, but their information is not listed).
Thank you!
Xiaoran
From: Chang, Yangyang Sent: Wednesday, February 13, 2019 8:58 PM To: Yan, Xiaoran yan30@iu.edu Subject: [Update]_Journal list of management science
Hi Xiaoran,
We updated our journal list and added three more journals. If you haven't started collecting our data yet, I guess you can use the new list:
Thanks! :)
Yangyang
From: Chang, Yangyang Sent: February 5, 2019 9:14 PM To: Yan, Xiaoran Cc: Ding, Ying Subject: RE: Journal list of management science
Hi Xiaoran,
Thanks for doing that. Here is some specific information about our data requirements:
Best,
Yangyang
From: Yan, Xiaoran Sent: February 4, 2019 2:56 PM To: Chang, Yangyang Cc: Ding, Ying Subject: Re: Journal list of management science
Hi Yangyang,
Thank you for the list. We are in the process of moving to a new server, but I should be able to get the dataset ready in 2 weeks.
Can you also specify which other features you need besides abstracts and citations (authors, institutions, etc.)? For citations, I imagine you would need more than just the internal citations between these journals. For external citations, what information do you need? The same as for the papers in the list?
Thank you!
Xiaoran
On 1/31/19 4:45 PM, Chang, Yangyang wrote: