r-three/common-pile
Repo to hold code and track issues for the collection of permissively licensed data
MIT License
22 stars, 6 forks
issues
Add code to process the Python PEPs dataset
#94
blester125
closed
1 week ago
2
Fix formatting on Foodista
#93
blester125
closed
1 week ago
0
CCCC - license filter
#92
baberabb
opened
1 month ago
0
Add domain counting script
#91
wildphoton
opened
1 month ago
0
'Update DPI'
#90
conceptofmind
opened
1 month ago
6
Data Provenance Initiative Errors
#89
nkandpa2
opened
2 months ago
8
Openreview
#88
craffel
opened
2 months ago
1
Macrothink institute
#87
craffel
opened
2 months ago
0
Tool for comparing dolma data after preprocessing.
#86
blester125
closed
2 months ago
0
Add Dolma Counter script
#85
blester125
closed
4 months ago
3
Regulations
#84
nkandpa2
opened
4 months ago
1
Add correct text ordering and column merging to CL
#83
conceptofmind
opened
4 months ago
29
Wiki Data
#82
blester125
opened
4 months ago
23
Update get_data.sh
#81
conceptofmind
closed
4 months ago
4
Update to Ubuntu Chat processing
#80
blester125
closed
5 months ago
0
updates needed when processing the full stack exchange dumps
#79
blester125
closed
5 months ago
0
Changes made while scraping the whole Project Gutenberg data.
#78
blester125
closed
5 months ago
0
Updates needed to fully scrape foodista data.
#77
blester125
closed
5 months ago
0
Hansard
#76
baberabb
opened
5 months ago
0
Replace inconsistent internal license identifiers with persistent SPDX identifiers using scancode-toolkit
#75
storytracer
opened
5 months ago
1
Library of Congress Public Domain Books
#74
storytracer
opened
5 months ago
6
Library of Congress public domain books (loc_books)
#73
storytracer
opened
5 months ago
0
Stackv2
#72
Muennighoff
closed
5 months ago
2
USPTO
#71
baberabb
closed
4 months ago
4
USGPO
#70
nkandpa2
opened
5 months ago
2
Scrape examples from foodista
#69
blester125
closed
5 months ago
0
Lintangsutawika news
#68
blester125
closed
5 months ago
10
Pdr
#67
nkandpa2
closed
5 months ago
1
[WIP] USPTO
#66
baberabb
closed
5 months ago
0
Linux Kernel Mailing List
#65
nkandpa2
closed
5 months ago
1
US Government Publishing Office
#64
nkandpa2
opened
6 months ago
7
add ci for linting
#63
blester125
closed
6 months ago
0
Create a shared scraping function.
#62
blester125
closed
6 months ago
0
Data Provenance data
#61
shayne-longpre
closed
5 months ago
14
publicdomainreview.org
#60
nkandpa2
closed
5 months ago
3
Add scripts to download and process CourtListener Opinion data
#59
wildphoton
closed
5 months ago
1
Feat/arxiv
#58
blester125
closed
1 month ago
0
Create a shared logging setup.
#57
blester125
closed
7 months ago
0
PubMed Central
#56
alon-albalak
closed
6 months ago
6
Earnings Call Transcripts
#55
nkandpa2
closed
5 months ago
2
Internet Archive Podcasts
#54
nkandpa2
opened
9 months ago
1
YouTube Transcripts
#53
nkandpa2
opened
9 months ago
6
Update StackExchange Preprocessing.
#52
blester125
closed
9 months ago
1
Feat/wiki scraper
#51
blester125
closed
1 month ago
3
Update README.md
#50
blester125
closed
9 months ago
0
Efficient Reshard Tool
#49
blester125
opened
9 months ago
2
New Public Domain Books
#48
StellaAthena
opened
9 months ago
0
Hansard(s)
#47
baberabb
opened
9 months ago
5
Famous Datasets Tracker
#46
StellaAthena
opened
9 months ago
0
News
#45
lintangsutawika
closed
5 months ago
9