osulp / dspace2hydra


degree name remapped_split_field data issues #191

Closed revgum closed 6 years ago

revgum commented 7 years ago

See #181. The following is a unique list of the values that fail our remapped_split_field when looking up the proper degree name and field values.

As of the time of writing this ticket, the regex for splitting these values is: (.*\))\s*(in|n)\s*(.*)

The logic is:

  1. Find the first match (e.g. Bachelors of Science (B.S.)) and use the lookup file to map this value to the clean, consistent name.
  2. If there is a third match, this is the degree_field (e.g. Agriculture).
  3. The second match is the " in " or " n ". There are others that need to be accounted for, such as "iin", "-n", or even "in the"?!
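The three steps above can be sketched roughly as follows. The regex is the one quoted in this ticket; the `split_degree` method and the `DEGREE_NAME_LOOKUP` hash (including its single entry) are hypothetical stand-ins for the real mapping code and lookup file, which are not shown here.

```ruby
# Regex quoted in the ticket: degree name up to a closing paren,
# then "in" or "n", then the degree field.
SPLIT_REGEX = /(.*\))\s*(in|n)\s*(.*)/

# Hypothetical stand-in for the lookup file; the real keys and
# values are not shown in this ticket.
DEGREE_NAME_LOOKUP = {
  "Bachelor's of Science (B.S.)" => 'Bachelor of Science (B.S.)'
}.freeze

# Returns a hash with the cleaned degree name and the degree field,
# or nil when the regex does not match at all.
def split_degree(value)
  match = SPLIT_REGEX.match(value)
  return nil unless match

  field = match[3]
  {
    degree_name: DEGREE_NAME_LOOKUP[match[1]], # step 1: first match -> lookup
    degree_field: field.empty? ? nil : field   # step 2: third match -> field
  }
end

split_degree("Bachelor's of Science (B.S.) in Agriculture")
split_degree('Doctor of Philosophy (Ph. D.) iin Counseling') # => nil
```

Because the separator group only accepts a lowercase "in" or "n", variants such as "iin", "IN", or a missing separator produce no match in this sketch.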
Bachelor of Science (B.S.) in Electrical Engineering
Bachelor's
Bachelor's of Science (B.S.) in Agriculture
Doctor of Philosophy (Ph. D.) Fisheries and Wildlife
Doctor of Philosophy (Ph. D.) i Forest Management
Doctor of Philosophy (Ph. D.) i Horticulture
Doctor of Philosophy (Ph. D.) iin Counseling
Doctor of Philosophy (Ph. D.) in Botany and Plant Pathology
Doctor of Philosophy (Ph. D.) IN Chemical Engineering
Doctor of Philosophy (Ph. D.) IN Chemistry
Doctor of Philosophy (Ph. D.) IN Fisheries and Wildlife
Doctor of Philosophy (Ph. D.) n Forest Management
Doctor of Philosophy (Ph. D.) -n Microbiology
Doctor of Philosophy (Ph. D.) Oceanography
Engineering
Honors Bachelor of Arts
Honors Bachelor of Arts (H.B.A.)
Honors Bachelor of Fine Arts (HBFA)
Honors Bachelor of Science (H.B.S.)
Honors Bachelor of Science (HBS)
Master of Arts in Interdisciplinary Studies
Master of Arts in Interdisciplinary Studies (MAIS)
Master of Arts in Interdisciplinary Studies (MAIS) in Philosophy, Philosophy and Anthropology
Master of Education (E.D.M.) in Education
Master of Environmental Sciences in the Professional Master's Program
Master of Interdisciplinary 
Master of Natural Resources (M.N.R.)
Master of Ocean Engineering (M.Oc.E. ) IN Mechanical Engineering
Master of Public Policy (M.P.P.)
Master of Public Policy (MPP)
Master of Science (M.S.)
Master of Science (M.S.) Bioresource Engineering
Master of Science (M.S.) i Agricultural and Resource Economics
Master of Science (M.S.) iin Chemical Engineering
Master of Science (M.S.) iin Electrical and Computer Engineering
Master of Science (M.S.) iin Forest Engineering
Master of Science (M.S.) iin Horticulture
Master of Science (M.S.) iin Industrial Engineering
Master of Science (M.S.) iin Wood Science
Master of Science (M.S.) IN Apparel, Interiors, and Merchandising
Master of Science (M.S.) IN Chemical Engineering
Master of Science (M.S.) IN Chemistry
Master of Science (M.S.) IN Crop Science
Master of Science (M.S.) IN Fisheries and Wildlife
Master of Science (M.S.) IN Food Science and Technology
Master of Science (M.S.) in Forest Engineering
Master of Science (M.S.) IN Industrial and General Engineering
Master of Science (M.S.) IN Industrial Engineering
Master of Science (M.S.) IN Nuclear Engineering
Master of Science (M.S.) IN Nutrition and Food Management
Master of Science (M.S.) in Rangeland Resources
Master of Science (M.S.) min Foods and Nutrition
Master of Science (M.S.) n Animal Science
Master of Science (M.S.) n Clothing, Textile, and Related Arts
Masters of Science (M.S.) in Forest Science
Philosophy (Ph. D.) in Forest Science
Undergraduate Thesis
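
One possible way to cover the separator variants that appear in the list above ("i", "iin", "IN", "n", "-n", "min", or no separator at all) is a case-insensitive regex with an optional separator. This is only a sketch, not the fix adopted in the project; `TOLERANT_SPLIT` and `tolerant_split` are hypothetical names, and values without a parenthesised abbreviation would still need to be handled by the lookup or the mapping code.

```ruby
# Degree name up to the last ")", then an optional separator that must be
# followed by whitespace, then the remaining text as the degree field.
# Requiring whitespace after the separator keeps fields that merely start
# with "In..." (for example "Industrial Engineering") intact.
TOLERANT_SPLIT = /\A(.*\))\s*(?:(?:iin|min|in|-n|i|n)\s+)?(.*)\z/i

# Returns [raw_degree_name, degree_field_or_nil], or nil when the value
# has no closing parenthesis to split on.
def tolerant_split(value)
  match = TOLERANT_SPLIT.match(value.strip)
  return nil unless match

  field = match[2].strip
  [match[1], field.empty? ? nil : field]
end

tolerant_split('Doctor of Philosophy (Ph. D.) -n Microbiology')
# => ["Doctor of Philosophy (Ph. D.)", "Microbiology"]
tolerant_split('Master of Science (M.S.) Bioresource Engineering')
# => ["Master of Science (M.S.)", "Bioresource Engineering"]
tolerant_split('Undergraduate Thesis')
# => nil
```
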
zhang4952 commented 7 years ago

The lookup/degree_name.yml could handle some of the new or misspelled degree names, and we can correct the values of a few really 'odd' items in DSpace.
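
As a rough illustration of handling misspelled names through the lookup, the sketch below loads a small YAML map and resolves a raw value against it. The flat key/value layout, the target names, and the `clean_degree_name` helper are all assumptions for illustration; the actual structure and contents of the lookup file are not shown in this thread.

```ruby
require 'yaml'

# Hypothetical lookup content: raw DSpace value => clean degree name.
# The real file's layout and target names may differ.
LOOKUP_YAML = <<~YAML
  "Bachelor's of Science (B.S.)": "Bachelor of Science (B.S.)"
  "Masters of Science (M.S.)": "Master of Science (M.S.)"
  "Honors Bachelor of Science (HBS)": "Honors Bachelor of Science (H.B.S.)"
YAML

DEGREE_NAMES = YAML.safe_load(LOOKUP_YAML).freeze

# Returns the clean name for a raw value, or nil when the lookup has no
# entry, so unmapped values can be reported instead of silently passed on.
def clean_degree_name(raw)
  DEGREE_NAMES[raw.strip]
end

clean_degree_name('Masters of Science (M.S.)') # => "Master of Science (M.S.)"
```
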

revgum commented 7 years ago

@zhang4952 I've fixed the degree_names.yml locally but have open questions about the following. What degree_name should they map to?

It might be easiest for you to generate a SQL query for the ITEMs that have these values for their degree.name so we can understand what to do @zhang4952?

What do we do with these?

zhang4952 commented 7 years ago

The spreadsheet for the problematic degree names: https://docs.google.com/a/oregonstate.edu/spreadsheets/d/1NfjTAwlJV40AFhp7KwzCeJUv6MbQq9MBYPhzry-byaE/edit?usp=sharing. Check the rows in pink @revgum @vantuyls

vantuyls commented 7 years ago

What collection(s) are the Undergraduate Thesis documents in?

zhang4952 commented 7 years ago

The crosswalk for these degree names is finalized.

zhang4952 commented 6 years ago

mapping/degree.rb should be able to handle these cases:
Bachelor's
Engineering
Master of Environmental Sciences in the Professional Master's Program
Master of Natural Resources (M.N.R.)
Master of Public Policy (M.P.P.)
Master of Public Policy (MPP)
Undergraduate Thesis