trivio/common_crawl_index
Index URLs in Common Crawl
193 stars
48 forks
issues
Update README.md
#30
pim
closed
4 years ago
0
project deprecated?
#29
jric
opened
7 years ago
1
fix README.md: pip install boto
#28
dportabella
opened
8 years ago
0
Update README.md
#27
princeedward
opened
9 years ago
0
How can I get a file in text mode?
#26
alibezz
opened
9 years ago
1
Retrieving urls for a specific country tld?
#25
giorgio79
opened
10 years ago
0
Recommended Instance in Documentation
#24
wpdevs
opened
10 years ago
0
Update index?
#23
maccman
opened
10 years ago
28
Any plans to index and support the newer datasets?
#22
gurgeh
closed
10 years ago
1
AttributeError: 'NoneType' object has no attribute 'get_key'
#21
ldgarcia
opened
10 years ago
1
Fixed index_lookup_local value format
#20
jeffnappi
closed
6 years ago
0
reverse hostname transformation breaks urls with username:password@domain.com
#19
keiw
opened
11 years ago
1
Remove unnecessary auth in index_lookup_remote
#18
oyiptong
opened
11 years ago
1
ImportError: No module named boto
#17
danielnicollet
closed
11 years ago
9
Cleanup, re-structuring and installation
#16
wiseman
opened
11 years ago
2
URLs not correctly sorted in index?
#15
wiseman
closed
11 years ago
3
Correct 'libs' path to 'lib', and correct the commented-out parameter order in object initialisation
#14
oskarpearson
closed
11 years ago
1
Unexpected results with different key lengths
#13
wiseman
opened
11 years ago
3
reverse hostname transformation is ambiguous
#12
keiw
opened
11 years ago
2
remote_copy script to copy pages looked-up with the index to your s3 bucket
#11
jspacker
closed
11 years ago
1
Update README.md
#10
jhosteny
closed
11 years ago
1
arcFileParition should be arcFilePartition in code example at end of README.md
#9
fichtitious
opened
11 years ago
0
remote_read does not work using Python 2.6
#8
soult
closed
11 years ago
2
typo: arcFileParition should be arcFilePartition
#7
jronallo
closed
11 years ago
1
Investigate anomalies
#6
srobertson
opened
11 years ago
0
Docs should mention that urls are stored in reverse hostname order.
#5
srobertson
opened
11 years ago
2
Unauthorized access to aws public dataset bucket
#4
keiw
closed
11 years ago
1
./remote_read www.direkt-einkauf.at does not return anything
#3
keiw
closed
11 years ago
3
Documentation for data blocks is missing the compressedSize field
#2
soult
closed
11 years ago
1
Docs and code for creating the common crawl index
#1
srobertson
closed
11 years ago
0