mozilla / chronicle

find everything you've ever found
http://mozillachronicle.tumblr.com/
Mozilla Public License 2.0
16 stars, 6 forks

Store all the things from Embedly #330

Closed: nchapman closed this issue 9 years ago

nchapman commented 9 years ago

There's lots of potentially neat stuff we could use from the Embedly extraction that we currently throw away. Let's figure out how to keep it all so that we don't have to re-extract every page when we find a good use for that data.

This might also be a good time to consider a pages table so that we only store one copy of this data.
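
A rough sketch of what that could look like, not the project's actual schema: the full extraction response is kept as a JSON blob in a `pages` table keyed by URL, and each visit only points at it. The table names, column names, and the `open_store` / `save_visit` / `load_extraction` helpers are all made up for illustration, SQLite stands in for whatever database is really used, and the URL is used as-is with no normalization.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create the two hypothetical tables."""
    conn = sqlite3.connect(path)
    # One row per page: the whole extraction response, stored verbatim as JSON.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               url TEXT PRIMARY KEY,
               extracted_json TEXT NOT NULL
           )"""
    )
    # One row per visit: points at the page by URL instead of copying its data.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS visits (
               id INTEGER PRIMARY KEY,
               user_id TEXT NOT NULL,
               url TEXT NOT NULL,
               visited_at TEXT NOT NULL
           )"""
    )
    return conn


def save_visit(conn, user_id, url, visited_at, extraction):
    """Record a visit and keep everything the extractor returned for the page."""
    with conn:  # commit both statements together, or neither
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, extracted_json) VALUES (?, ?)",
            (url, json.dumps(extraction)),
        )
        conn.execute(
            "INSERT INTO visits (user_id, url, visited_at) VALUES (?, ?, ?)",
            (user_id, url, visited_at),
        )


def load_extraction(conn, url):
    """Return the stored extraction for a URL, or None if we never saw it."""
    row = conn.execute(
        "SELECT extracted_json FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Because the blob is stored untouched, a field nobody uses today can be read back later without re-extracting the page. Keying on the raw URL means two spellings of the same page would still get two rows.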

jaredhirsch commented 9 years ago

Deferring the creation of a pages table because we don't have our URL normalization story worked out yet (#228). We'll get there eventually :-P