softwaresaved / rse-repo-analysis

Study of research software in repositories. Contact: @karacolada
BSD 3-Clause "New" or "Revised" License

Curated dataset of RSE repos #41

Closed karacolada closed 11 months ago

karacolada commented 1 year ago

We have true and false positives for repositories created for the publication and for those only used in it or referenced as related work.

karacolada commented 12 months ago

Dataset prepared at data/outputs/eprints_w_intent.csv. Schema:

Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   github_repo_id              130 non-null    object        
 1   mention_created             130 non-null    bool          
 2   pub_title                   130 non-null    object        
 3   pub_author_for_reference    130 non-null    object        
 4   pdf_url                     130 non-null    object        
 5   page_no                     130 non-null    int64         
 6   detected_github_url         130 non-null    object        
 7   pattern_matched_github_url  130 non-null    object        
 8   eprints_date                130 non-null    datetime64[ns]
 9   eprints_pub_year            130 non-null    int64         
 10  eprints_repo                130 non-null    object        
dtypes: bool(1), datetime64[ns](1), int64(2), object(7)
memory usage: 15.4+ KB
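
For anyone picking the file up later, here is a minimal sketch of how it could be loaded and checked against the schema above. It assumes pandas is installed and that `mention_created` is True when the repository was created for the publication; the column's meaning is inferred from its name, not confirmed here. The helper names `load_curated` and `split_by_intent` are hypothetical and are not part of the repo.

```python
import pandas as pd

# Column names copied from the schema listing above.
EXPECTED_COLUMNS = [
    "github_repo_id",
    "mention_created",
    "pub_title",
    "pub_author_for_reference",
    "pdf_url",
    "page_no",
    "detected_github_url",
    "pattern_matched_github_url",
    "eprints_date",
    "eprints_pub_year",
    "eprints_repo",
]


def load_curated(source):
    """Read the curated CSV (path or file-like object) and check its columns.

    Hypothetical helper, not taken from the repository.
    """
    # Parse eprints_date so it comes back as datetime64, as in the schema.
    df = pd.read_csv(source, parse_dates=["eprints_date"])
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


def split_by_intent(df):
    """Split rows on the boolean mention_created flag.

    Assumption: True means the repo was created for the publication,
    False means it was only used or referenced.
    """
    created = df[df["mention_created"]]
    other = df[~df["mention_created"]]
    return created, other
```

Usage would be something like `created, other = split_by_intent(load_curated("data/outputs/eprints_w_intent.csv"))`, run from the repository root.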

karacolada commented 11 months ago

Regarding first checkbox: yes!

[Figure: mention_type_timeline]

The grey filled areas mark one, two, and three years, respectively.