Update to use Reddit's official data

lloydowen8 / place-heatmap-2022

Heatmap visualisation for the r/Place 2022 event


Update to use Reddit's official data #13

Open KemptonM opened 2 years ago

KemptonM commented 2 years ago

The current dataset excludes some snapshots at the very beginning and just before the whiteout. Reddit's data is complete from beginning to end.

memmam commented 2 years ago

Reddit's data is incredibly mangled (the final snapshot is wrong and the CSV data is out of order). I suggest using this torrent instead, which is a minified and reordered version of the CSV data, from Scaevolus on the Place Atlas Discord:

[magnet link removed due to issues with data, see Edit 2]

It's still complete, in the correct order, and about 1/5th the size.

Edit: BOTH the official dataset and this minified version have incorrect admin rect data, see here

Edit 2: disregard the torrent, we're still working out kinks. Regardless, Reddit's dataset seems incorrect. I will keep you updated.

KemptonM commented 2 years ago

I may be misunderstanding: is the data in this torrent still based on Reddit's official CSV data, or does it come from scrapers?

lloydowen8 commented 2 years ago

I agree that the project should probably use the joint torrent data. There are a few scrapers out there that could be used to collect this data automatically, rather than having the user download the dataset.

See here for an example

However, I'd suggest that this should probably be a forked project. I don't have the time to manage this repo, as I have my finals in a couple of weeks. I'd also be happy to add collaborators to the project to manage pull requests, as long as it is handled appropriately.

memmam commented 2 years ago

@KemptonM It comes from Reddit's official CSV data, but it's going to take us some time to repair it, as that data is incredibly mangled. Right now we're working on combining the r/Place timeline data (which has snapshots spaced 1 second apart) with the CSV data to try to determine the actual order in which pixels were placed and repair the data.
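
For anyone who wants a rough first pass at the ordering problem, here is a minimal sketch that re-sorts CSV rows by timestamp. This is not the repair process described above, which also uses the timeline snapshots. It assumes the CSV has a `timestamp` column formatted like `2022-04-04 00:53:51.577 UTC` with optional fractional seconds, plus `user_id`, `pixel_color` and `coordinate` columns. The function names and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse a timestamp such as '2022-04-04 00:53:51.577 UTC'.

    Assumption: the fractional-seconds part is optional and the zone is always UTC.
    """
    text = text.strip()
    if text.endswith(" UTC"):
        text = text[:-4]
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def sort_rows_by_time(csv_text):
    """Return the CSV rows as dicts, sorted by their parsed timestamp.

    Python's sort is stable, so rows with identical timestamps keep file order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: parse_timestamp(row["timestamp"]))
    return rows


# Made-up sample rows, deliberately out of order.
sample = '''timestamp,user_id,pixel_color,coordinate
2022-04-01 12:44:12.5 UTC,userB,#FF4500,"10,20"
2022-04-01 12:44:10 UTC,userA,#FFFFFF,"0,0"
'''
rows = sort_rows_by_time(sample)
print([row["user_id"] for row in rows])
```

Sorting by timestamp alone cannot resolve placements that share the same timestamp, which is one reason snapshot data is still needed. The full dataset is also far too large to load into memory like this, so treat it as an illustration only.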

memmam commented 2 years ago

@lloydowen8 I'd be fine taking over the project if you'd like. I just started a new quarter at UCSD, so I'm fairly busy myself, but I've been making time to work on a lot of r/Place-related projects in the last few days, and I think I've been the most active person in this repo anyway. I can fork it or you can transfer ownership to me, your choice.

lloydowen8 commented 2 years ago

This has been addressed in the standalone script. However, the notebook should also be adapted to use this data.