Closed karacolada closed 11 months ago
Dataset prepared at data/outputs/eprints_w_intent.csv. Schema:
Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   github_repo_id              130 non-null    object
 1   mention_created             130 non-null    bool
 2   pub_title                   130 non-null    object
 3   pub_author_for_reference    130 non-null    object
 4   pdf_url                     130 non-null    object
 5   page_no                     130 non-null    int64
 6   detected_github_url         130 non-null    object
 7   pattern_matched_github_url  130 non-null    object
 8   eprints_date                130 non-null    datetime64[ns]
 9   eprints_pub_year            130 non-null    int64
 10  eprints_repo                130 non-null    object
dtypes: bool(1), datetime64[ns](1), int64(2), object(7)
memory usage: 15.4+ KB
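In case it helps anyone checking the file against the schema above, here is a minimal standard-library sketch that validates the header and converts each column to the listed type. The helper names (`parse_bool`, `parse_date`, `load_rows`) are hypothetical and not part of the repository. It also assumes that booleans are stored as `True`/`False` and that `eprints_date` is in ISO format, neither of which is confirmed here.

```python
import csv
from datetime import datetime


def parse_bool(value):
    # Assumption: booleans were written to the CSV as "True"/"False".
    if value not in ("True", "False"):
        raise ValueError(f"unexpected boolean value: {value!r}")
    return value == "True"


def parse_date(value):
    # Assumption: eprints_date is stored in ISO format, e.g. "2020-01-15".
    return datetime.fromisoformat(value)


# One converter per column, in the order given by the schema above.
CONVERTERS = {
    "github_repo_id": str,
    "mention_created": parse_bool,
    "pub_title": str,
    "pub_author_for_reference": str,
    "pdf_url": str,
    "page_no": int,
    "detected_github_url": str,
    "pattern_matched_github_url": str,
    "eprints_date": parse_date,
    "eprints_pub_year": int,
    "eprints_repo": str,
}


def load_rows(handle):
    """Read CSV rows from an open text handle and convert each column."""
    reader = csv.DictReader(handle)
    missing = set(CONVERTERS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        {name: convert(row[name]) for name, convert in CONVERTERS.items()}
        for row in reader
    ]
```

To use it, open the file with `open("data/outputs/eprints_w_intent.csv", newline="", encoding="utf-8")` and pass the handle to `load_rows`. If the file matches the schema, the result should have 130 rows. With pandas, `pd.read_csv(..., parse_dates=["eprints_date"])` is the usual way to get the same datetime column.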
Regarding first checkbox: yes!
The grey filled areas mark one, two, and three years, respectively.
We have both true and false positives, for repositories created for the publication as well as for those only used in it or referenced as related work.