Closed ampudia19 closed 2 months ago
Current results:

| title_cr_null | Non-Null | Null | Total |
|---|---|---|---|
| Non-Null | 15199 | 44809 | 60008 |
| Null | 61840 | 197841 | 259681 |
| Total | 77039 | 242650 | 319689 |

Success rates, by publication type:

| type | False | True |
|---|---|---|
| Book | 0.616126 | 0.383874 |
| Book Chapter | 0.697857 | 0.302143 |
| Book edited | 0.754649 | 0.245351 |
| Conference Proceeding | 1 | 0 |
| Conference/Paper/Proceeding/Abstract | 0.763496 | 0.236504 |
| Consultancy Report | 0.89703 | 0.10297 |
| Data Set | 0.571429 | 0.428571 |
| Journal | 1 | 0 |
| Journal Article/Review | 0.477465 | 0.522535 |
| Manual/Guide | 0.933628 | 0.0663717 |
| Monograph | 0.80241 | 0.19759 |
| Other | 0.588269 | 0.411731 |
| Policy briefing/Report | 0.896172 | 0.103828 |
| Preprint | 0.501253 | 0.498747 |
| Report | 0.828767 | 0.171233 |
| Scholarly edition | 0.935252 | 0.0647482 |
| Systematic review | 0.768519 | 0.231481 |
| Technical Report | 0.8958 | 0.1042 |
| Technical Standard | 0.884211 | 0.115789 |
| Thesis | 0.841305 | 0.158695 |
| Working Paper | 0.71762 | 0.28238 |
| journal-issue | 0.333333 | 0.666667 |
| patent | 1 | 0 |
| All | 0.618711 | 0.381289 |

Instances where both datasets match on something:

| title_cr_eq_title_oa | count |
|---|---|
| Equal | 13158 |
| Not Equal | 2041 |

Examples of matching:

| | title_gtr_cr | title_cr | title_oa | doi_cr | doi_oa | type | author_gtr | author_cr |
|---|---|---|---|---|---|---|---|---|
| 82004 | Nudge Theory and Social Innovation: An Analysis of Citizen and Government Initiatives during Covid-19 outbreak in Malaysia. | Nudge Theory and Social Innovation: An analysis of citizen and government initiatives during Covid-19 outbreak in Malaysia | Nudge Theory and Social Innovation: An analysis of citizen and government initiatives during Covid-19 outbreak in Malaysia | 10.1109/r10-htc49770.2020.9357050 | 10.1109/r10-htc49770.2020.9357050 | Conference/Paper/Proceeding/Abstract | Minoi J | Minoi, Jacey-Lynn |
| 218716 | Null tests of the concordance model in the era of Euclid and the SKA | Null tests of the concordance model in the era of Euclid and the SKA | Null tests of the concordance model in the era of Euclid and the SKA | 10.1016/j.dark.2021.100856 | 10.1016/j.dark.2021.100856 | Journal Article/Review | Bengaly Carlos A. P. | Bengaly, Carlos A.P. |
| 107482 | Quenching star formation with quasar outflows launched by trapped IR radiation | Quenching star formation with quasar outflows launched by trapped IR radiation | Quenching star formation with quasar outflows launched by trapped IR radiation | 10.1093/mnras/sty1514 | 10.1093/mnras/sty1514 | Other | Costa T | Costa, Tiago |
| 73003 | Influence of Twin Boundaries and Sample Dimensions on the Mechanical Behavior of Ag Nanowires | Influence of twin boundaries and sample dimensions on the mechanical behavior of Ag nanowires | Influence of twin boundaries and sample dimensions on the mechanical behavior of Ag nanowires | 10.1016/j.msea.2021.142150 | 10.1016/j.msea.2021.142150 | Journal Article/Review | Zhao H | Zhao, Hu |
| 18849 | Accumulation of Deep Traps at Grain Boundaries in Halide Perovskites | Accumulation of Deep Traps at Grain Boundaries in Halide Perovskites | Accumulation of Deep Traps at Grain Boundaries in Halide Perovskites | 10.1021/acsenergylett.9b00840 | 10.26434/chemrxiv.8058413.v1 | Journal Article/Review | Park J | Park, Ji-Sang |

Examples of different matches:

| | title_gtr_cr | title_cr | title_oa | doi_cr | doi_oa | type | author_gtr | author_cr |
|---|---|---|---|---|---|---|---|---|
| 21901 | Integrating the Use of Official Statistics into Mainstream Curricula via Data Visualisation | Integrating the use of official statistics into mainstream curricula via data visualisation | INTEGRATING THE USE OF OFFICIAL STATISTICS INTO MAINSTREAM CURRICULA VIA DATA VISUALISATION | 10.52041/srap.13602 | nan | Conference/Paper/Proceeding/Abstract | Nicholson, J. | Nicholson, James |
| 67252 | Investigation of the suitability of decellularized porcine pericardium in mitral valve reconstruction. | Investigation of the Suitability of Decellularised Porcine Pericardium for Mitral Valve Reconstruction | Investigation of the suitability of decellularized porcine pericardium in mitral valve reconstruction. | 10.5339/qproc.2012.heartvalve.4.39 | nan | Journal Article/Review | Morticelli L | Morticelli, Lucrezia |
| 191593 | Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets | Donald MACKENZIE, Trading at the Speed of Light. How Ultrafast Algorithms Are Transforming Financial Markets , Princeton, Princeton University Press, 2021, 304 p. | Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets | 10.3917/res.234.0237 | nan | Book | MacKenzie Donald | Duterme, Tom |
| 289901 | The Oxford Domed Lateral Implant: Increasing tibial component wall height reduces the risk of medial dislocation of the mobile bearing | The Oxford Domed Lateral Unicompartmental Knee Replacement implant: Increasing wall height reduces the risk of bearing dislocation | The Oxford domed lateral implant: Increasing tibial component wall height reduces the risk of medial dislocation of the mobile bearing | 10.1177/09544119211048558 | nan | Conference/Paper/Proceeding/Abstract | Yang I | Yang, Irene |
| 28714 | LoCuSS: Exploring the selection of faint blue background galaxies for cluster weak-lensing | LoCuSS: exploring the selection of faint blue background galaxies for cluster weak-lensing | LoCuSS: Testing hydrostatic equilibrium in galaxy clusters | 10.1093/mnras/stw2192 | 10.1093/mnrasl/slv175 | Journal Article/Review | Ziparo Felicia | Ziparo, Felicia |

Methodological Readme for Collecting Data from OpenAlex using DOI and Reverse Lookups with Crossref and OpenAlex
Overview
This README outlines the methodology for collecting data from OpenAlex by DOI, performing reverse lookups with Crossref to match missing records, and using OpenAlex's own matching capabilities through its works search. The pipeline is implemented in Kedro; the nodes and utilities described here handle the preprocessing, fetching, and matching tasks.
Preprocessing Steps
Preprocess DOIs
preprocess_publication_doi(df: pd.DataFrame) -> pd.DataFrame
DOI pattern: 10\..+
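A minimal sketch of the DOI clean-up step, assuming the pattern above is applied as a regex search and that values without a match are treated as missing. The helper name `extract_doi`, the lower-casing, and the stripping of trailing punctuation are illustrative assumptions and may differ from what `preprocess_publication_doi` actually does:

```python
import re
from typing import Optional

# Pattern quoted in this README: "10." followed by at least one character.
DOI_PATTERN = re.compile(r"10\..+")


def extract_doi(raw: Optional[str]) -> Optional[str]:
    """Return the DOI portion of a raw string, or None if no DOI is found."""
    if not raw:
        return None
    match = DOI_PATTERN.search(raw.strip())
    if match is None:
        return None
    # Assumption: normalise case and drop trailing punctuation.
    return match.group(0).lower().rstrip(".,; ")
```

Applied to a DataFrame column, this could be used as `df["doi"].map(extract_doi)`.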
Create DOI Input List
create_list_doi_inputs(df: pd.DataFrame, **kwargs) -> list
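One plausible way to build the input list is to deduplicate the DOIs and group them into `|`-separated batches, which is the form the OpenAlex filter syntax accepts for OR queries. The function name `chunk_dois` and the default batch size of 50 are assumptions made for illustration:

```python
from typing import Iterable, List


def chunk_dois(dois: Iterable[str], chunk_size: int = 50) -> List[str]:
    """Deduplicate DOIs and join them into '|'-separated batches."""
    # dict.fromkeys keeps first-seen order while removing duplicates.
    unique = list(dict.fromkeys(d for d in dois if d))
    return [
        "|".join(unique[i : i + chunk_size])
        for i in range(0, len(unique), chunk_size)
    ]
```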
Fetching Data from OpenAlex
Fetch Papers
fetch_papers(ids: Union[List[str], List[List[str]]], mailto: str, perpage: int, filter_criteria: Union[str, List[str]], parallel_jobs: int = 8) -> Dict[str, List[Callable]]
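As a rough sketch, the request for one batch could be assembled as below. It uses the OpenAlex works endpoint with its `filter`, `per-page`, `cursor`, and `mailto` query parameters; the helper `build_works_url` is hypothetical, sends no request, and is not taken from the pipeline code:

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(
    ids: str,
    mailto: str,
    perpage: int = 200,
    filter_criteria: str = "doi",
    cursor: str = "*",
) -> str:
    """Build the URL for one page of results for a '|'-joined batch of ids."""
    params = {
        "filter": f"{filter_criteria}:{ids}",
        "per-page": perpage,
        "cursor": cursor,  # "*" requests the first page
        "mailto": mailto,
    }
    # Keep filter separators and the e-mail address readable in the URL.
    return f"{OPENALEX_WORKS}?{urlencode(params, safe=':|/*@')}"
```

Later pages would be requested by passing the `next_cursor` value from the response metadata as `cursor` until it comes back empty.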
Concatenate OpenAlex Data
concatenate_openalex(data: Dict[str, AbstractDataset]) -> pd.DataFrame
Reverse Lookup using Crossref
Match DOIs with Crossref
Key Functions and Process
Setting Up the Session
setup_session()
Cleaning HTML Entities
clean_html_entities(input_record: Dict[str, Union[str, int, float]]) -> Dict[str, Union[str, int, float]]
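A minimal sketch of this cleaning step, assuming it amounts to decoding HTML entities in the string fields of a record. The name `unescape_record` is illustrative only:

```python
import html
from typing import Dict, Union

Record = Dict[str, Union[str, int, float]]


def unescape_record(record: Record) -> Record:
    """Decode HTML entities in every string value; leave other values as-is."""
    return {
        key: html.unescape(value) if isinstance(value, str) else value
        for key, value in record.items()
    }
```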
Formulating the Query
get_doi(outcome_id: str, title: str, author: str, journal: str, publication_date: str, mailto: str, session: requests.Session) -> Dict[str, str]
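The query could be formulated along these lines, using the Crossref REST API field queries `query.bibliographic`, `query.author`, and `query.container-title`. The helper `build_crossref_query` and the `rows` default are assumptions; the publication date is left out here on the assumption that it is used later, when candidates are scored:

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_query(
    title: str, author: str, journal: str, mailto: str, rows: int = 5
) -> str:
    """Build a Crossref works query URL from the known bibliographic fields."""
    params = {"query.bibliographic": title, "rows": rows, "mailto": mailto}
    # Only add optional field queries when a value is available.
    if author:
        params["query.author"] = author
    if journal:
        params["query.container-title"] = journal
    return f"{CROSSREF_WORKS}?{urlencode(params)}"
```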
Fetching and Processing Results
_process_item(item: Dict[str, Union[str, Dict[str, str]]], title: str, author: str, journal: str, publication_date: str) -> Union[Dict[str, Union[str, int, float]], None]
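A sketch of how a single Crossref item might be turned into a candidate, assuming candidates are accepted or rejected on title similarity. The function name `score_item`, the 0.8 threshold, and the use of `difflib` are illustrative choices, not necessarily those of `_process_item`:

```python
from difflib import SequenceMatcher
from typing import Any, Dict, Optional


def score_item(
    item: Dict[str, Any], title: str, min_similarity: float = 0.8
) -> Optional[Dict[str, Any]]:
    """Return a flat candidate record, or None if the titles are too different."""
    # Crossref returns "title" as a list of strings.
    titles = item.get("title") or [""]
    candidate_title = titles[0]
    similarity = SequenceMatcher(
        None, title.casefold(), candidate_title.casefold()
    ).ratio()
    if similarity < min_similarity:
        return None
    return {
        "doi": item.get("DOI"),
        "title": candidate_title,
        "similarity": similarity,
        "score": item.get("score", 0.0),
    }
```

Comparing case-folded titles means that pairs differing only in capitalisation, as in several of the examples in the tables above, count as identical.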
Selecting the Best Match
_select_best_match(outcome_id: str, matches: List[Dict[str, Union[str, int, float]]]) -> Union[Dict[str, Union[str, int, float]], None]
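Selecting the best match could be as simple as taking the maximum over the surviving candidates. The sketch below assumes each candidate is a dict with hypothetical `similarity` and `score` keys and that rejected candidates are `None`; the real ranking criteria may differ:

```python
from typing import Any, Dict, List, Optional


def select_best(
    matches: List[Optional[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Pick the candidate with the highest title similarity, then Crossref score."""
    valid = [m for m in matches if m is not None]
    if not valid:
        return None
    return max(valid, key=lambda m: (m["similarity"], m["score"]))
```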
Batch Processing
crossref_doi_match(oa_data: pd.DataFrame, gtr_data: pd.DataFrame, mailto: str) -> Generator[Dict[str, pd.DataFrame], None, None]
Reverse Lookup using OpenAlex
Search and Match in OpenAlex
Key Functions and Process
Cleaning HTML Entities for OpenAlex
clean_html_entities_for_oa(input_record: Dict[str, Union[str, int, float]]) -> Dict[str, Union[str, int, float, Dict[str, str]]]
Formulating the Query
get_oa_match(outcome_id: str, title: Union[str, List[str]], chapter_title: str, author: str, publication_date: str, config: Dict[str, str], session: requests.Session) -> List[Dict[str, str]]
Fuzzy Matching Authors
author_fuzzy_match(author: str, candidate_author: List[Dict[str, Union[str, Dict[str, str]]]])
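A possible shape for the author check, assuming GtR author strings such as "Minoi J" are compared against the `display_name` of each OpenAlex authorship. The helper names, the 0.85 threshold, and the heuristic of treating the longest token as the surname are all assumptions for illustration:

```python
import re
from difflib import SequenceMatcher
from typing import Any, Dict, List


def _tokens(name: str) -> List[str]:
    """Case-fold a name and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", name.casefold())


def surname_matches(
    author: str, authorships: List[Dict[str, Any]], threshold: float = 0.85
) -> bool:
    """True if the longest token of `author` resembles a token of any candidate."""
    tokens = _tokens(author)
    if not tokens:
        return False
    surname = max(tokens, key=len)  # assumption: the surname is the longest token
    for authorship in authorships:
        display_name = authorship.get("author", {}).get("display_name", "")
        for token in _tokens(display_name):
            if SequenceMatcher(None, surname, token).ratio() >= threshold:
                return True
    return False
```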
Filtering Candidates by Author and Date
get_oa_match(outcome_id: str, title: Union[str, List[str]], chapter_title: str, author: str, publication_date: str, config: Dict[str, str], session: requests.Session) -> List[Dict[str, str]]
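The date side of the filter might look like the sketch below, which keeps OpenAlex works whose `publication_year` is close to the GtR publication date. The function name `filter_by_year`, the one-year tolerance, and the assumption that the date arrives in ISO `YYYY-MM-DD` form are illustrative; the author check is assumed to run as a separate step:

```python
from datetime import date
from typing import Any, Dict, List


def filter_by_year(
    candidates: List[Dict[str, Any]], publication_date: str, tolerance: int = 1
) -> List[Dict[str, Any]]:
    """Keep candidates whose publication year is within `tolerance` years."""
    target_year = date.fromisoformat(publication_date).year
    kept = []
    for work in candidates:
        year = work.get("publication_year")
        if year is not None and abs(year - target_year) <= tolerance:
            kept.append(work)
    return kept
```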
Key Utilities and Functions
OpenAlex Utilities (oa.py)
_revert_abstract_index, _parse_results, preprocess_ids, _chunk_oa_ids, _works_generator, fetch_papers_for_id, and json_loader.
Crossref Utilities (cr.py)
_process_item, clean_html_entities, _select_best_match, setup_session, and get_doi.
OpenAlex Matching Utilities (oa_match.py)
_process_string, clean_html_entities_for_oa, get_oa_match, and author_fuzzy_match.
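As an illustration of one of these utilities: OpenAlex returns abstracts as an `abstract_inverted_index`, a mapping from each word to the positions where it occurs. A helper like `_revert_abstract_index` presumably turns this back into text. The sketch below shows the general technique under that assumption and is not copied from `oa.py`:

```python
from typing import Dict, List, Optional


def revert_abstract_index(inverted: Optional[Dict[str, List[int]]]) -> str:
    """Rebuild plain text from a word -> positions mapping."""
    if not inverted:
        return ""
    positioned = [
        (position, word)
        for word, positions in inverted.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))
```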