bzz closed this issue 7 years ago
Same happens with 2000 repos from text file:
testing=# select count(*) from repositories;
count
-------
1814
@bzz this is not a Borges error: we have duplicated repos in the Python and Java lists, for example github.com/mihaic/graphalytics.git
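For anyone wanting to reproduce the cross-list check, here is a minimal sketch. The two temp files and the `github.com/example/...` entries are hypothetical stand-ins for the per-language lists, not the real data.

```shell
# Hypothetical stand-ins for the per-language lists (not the real files).
py=$(mktemp)
java=$(mktemp)
printf '%s\n' 'github.com/example/a.git' 'github.com/example/shared.git' | sort > "$py"
printf '%s\n' 'github.com/example/shared.git' 'github.com/example/z.git' | sort > "$java"

# comm expects sorted input; -12 suppresses lines unique to either file,
# leaving only the lines present in both lists.
in_both=$(comm -12 "$py" "$java")
echo "$in_both"

rm -f "$py" "$java"
```

Any repo printed here would be inserted only once, which would explain a row count lower than the number of input lines.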
@bzz You can also check this in the top200repos.txt file itself:
sort top200repos.txt | uniq --count
I will close the issue, feel free to reopen if I'm wrong.
Thanks for catching this! I believe the confusion came from the sort -u
above, which already filters out all dupes, so uniq -d had nothing left to report.
✅ sort top200repos.txt | uniq | wc -l
178
✅ sort top200repos.txt | uniq | wc -l
1814
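To illustrate why the original check came back empty, here is a small sketch on a made-up three-line list (the file and its entries are assumptions, not the real top200repos.txt). It compares `sort -u | uniq -d` with plain `sort | uniq -d`.

```shell
# Hypothetical sample list with one duplicated entry.
tmp=$(mktemp)
printf '%s\n' \
  'github.com/example/a.git' \
  'github.com/example/b.git' \
  'github.com/example/a.git' > "$tmp"

# sort -u has already dropped the repeats, so uniq -d finds nothing.
masked=$(sort -u "$tmp" | uniq -d)

# Plain sort keeps repeats adjacent, so uniq -d prints each duplicated line once.
dupes=$(sort "$tmp" | uniq -d)

# Line counts before and after deduplication.
total=$(wc -l < "$tmp" | tr -d ' ')
unique=$(sort -u "$tmp" | wc -l | tr -d ' ')

echo "masked: '$masked'"
echo "dupes: $dupes"
echo "total=$total unique=$unique"

rm -f "$tmp"
```

A gap between `total` and `unique` is the quickest sign that the input list itself contains repeats.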
borges producer --source=file --file ./top200repos.txt
borges.buriedQueue is empty.
top200repos.txt is a 200-line file;
cat top200repos.txt | sort -u | uniq -d -c
is empty, so there are no duplicates, and wc -l
is 200.