jsha / blocktogether

Share your blocks and subscribe to others'
GNU General Public License v3.0

CSV downloads are a partial list of IDs #238

Closed georgedorn closed 6 years ago

georgedorn commented 7 years ago

I went here: https://blocktogether.org/show-blocks/TJl_eDUojyl7wgjZOaRDTBqQTgamC4_Z3PJo1ONW

I clicked 'Download CSV'.

I got a file that looks like this:

794445235896127489 4334019343 848349435671302145 501805957 [snip]

I expected there to be >7000 items, but the file contains only the 500 from the current page. Also, user IDs alone aren't that useful; I'd prefer to have the same contents as the table (or at least current usernames), but with every user in the blocklist.

spencerthayer commented 7 years ago

+1

jsha commented 7 years ago

Thanks, I've reproduced this and will fix the pagination issue. Probably I won't include screen names, for two reasons:

rbanffy commented 7 years ago

I was looking at the code but couldn't find the CSV generation at first glance. I'm building something that would benefit enormously from the lists as training data, so if you point me in the right direction, I can prepare a PR.

Since I'll be working either on Ubuntu or a Mac with MacPorts, I can also update the readme (but that should be a separate PR, of course).

dr2chase commented 6 years ago

Also +1 here. My block list got too large before I realized that it was unshareable, and I can't figure out how to deal with it except to throw the whole thing out and start over. If I could download, I could arbitrarily pick the oldest 50,000 to unblock, and restart from there.

rbanffy commented 6 years ago

There is a workaround: if you request the CSV from URLs like https://blocktogether.org/show-blocks/TJl_eDUojyl7wgjZOaRDTBqQTgamC4_Z3PJo1ONW.csv?page=n, where n is the page number, each page returns a distinct set of IDs. This way, the whole list can be exported.

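# Fetch all 15 pages of this list and append them to one file.
# The URL is quoted so the shell does not treat "?" as a glob character.
for i in $(seq 15)
do
    curl "https://blocktogether.org/show-blocks/TJl_eDUojyl7wgjZOaRDTBqQTgamC4_Z3PJo1ONW.csv?page=$i" >> naziscore.csv
done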

does the trick for this list, at least. It has 15 pages, and the resulting file currently has 7271 lines.
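
The loop above hard-codes the list URL and the page count. A minimal sketch of the same workaround as a reusable function might look like the following. The names `fetch_page` and `download_blocklist` are hypothetical (they are not part of blocktogether), and the sketch assumes each `.csv?page=n` request returns one ID per line, as the line count above suggests. The number of pages still has to be read off the web page by hand.

```shell
#!/bin/sh
# Sketch only: fetch_page and download_blocklist are illustrative names,
# not part of blocktogether.

# Fetch one CSV page of a shared block list.
# $1 = show-blocks URL without the .csv suffix, $2 = page number
fetch_page() {
    curl -fsS "$1.csv?page=$2"
}

# Concatenate pages 1..N into one file and drop any repeated IDs.
# $1 = show-blocks URL, $2 = number of pages, $3 = output file
download_blocklist() {
    url=$1 pages=$2 out=$3
    i=1
    : > "$out"                      # start from an empty output file
    while [ "$i" -le "$pages" ]; do
        fetch_page "$url" "$i" >> "$out" || return 1
        i=$((i + 1))
    done
    sort -u -o "$out" "$out"        # sort in place, keeping each ID once
}
```

For the list in this issue, the call would be `download_blocklist https://blocktogether.org/show-blocks/TJl_eDUojyl7wgjZOaRDTBqQTgamC4_Z3PJo1ONW 15 naziscore.csv`. Unlike the plain loop, this overwrites the output file instead of appending to it, so rerunning it does not double the contents.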

dr2chase commented 6 years ago

Thank you, I can confirm that it does download a list of numbers and that they are all different, so this is probably working.
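
One way to make that check explicit is to count the lines that repeat an earlier line. This is only a sketch: `count_duplicates` is a hypothetical helper, and it assumes the downloaded file has one ID per line.

```shell
# Print how many lines of the file repeat an earlier line.
# A result of 0 means every ID in the file is unique.
# $1 = path to the downloaded CSV
count_duplicates() {
    awk 'seen[$0]++ { dup++ } END { print dup + 0 }' "$1"
}
```

Running `count_duplicates naziscore.csv` on the file produced by the loop above would print 0 if the pages really do not overlap.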

jsha commented 6 years ago

Thanks! This is now fixed. For block lists of up to 250,000 entries, you can download the full list.