natestedman / Observatory

A Python-based dashboard for the Rensselaer Center for Open Source Software. For continued development, please see http://github.com/rcos/observatory
rcos.rpi.edu
ISC License

Avoid duplicate posts by coming up with a good duplicate-detection scheme, or just making sure not to send duplicates to the template #62

Open colinsullivan opened 13 years ago

natestedman commented 13 years ago

md5 would work fine; the only issue is marking them for both projects. Choices seem to be:

  1. Don't. Just update the last updated time.
  2. Make event -> project one-to-many instead of one-to-one. Unpleasant migration. Commit is also a subclass of Event, and it makes no sense there. However, they both need to be subclasses so that we can pull them both into the feed easily.

colinsullivan commented 13 years ago

People are getting duplicates because their personal blog RSS feed includes posts from their project's RSS feed (i.e., they are different versions of an RSS feed from the same site).

natestedman commented 13 years ago

Ok, let's add an md5 field to Event or BlogPost and force that to be unique, then add additional associations when duplicates come in.
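
A rough sketch of that idea in plain Python, not the actual Observatory models. `post_digest`, `PostStore`, and the `source` labels are hypothetical names, the dictionary stands in for a table with a unique md5 column, and hashing the stripped title plus content is only one possible choice of what to hash.

```python
import hashlib


def post_digest(title, content):
    """Return an md5 hex digest identifying a post by its title and body."""
    # Strip surrounding whitespace so trivially different copies still match.
    normalized = "\n".join(part.strip() for part in (title, content))
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class PostStore:
    """In-memory stand-in for a table with a unique md5 column (assumption)."""

    def __init__(self):
        self._posts = {}  # digest -> {"title", "content", "sources"}

    def add(self, title, content, source):
        """Store a post once; a repeat only gains another association."""
        digest = post_digest(title, content)
        post = self._posts.get(digest)
        if post is None:
            post = {"title": title, "content": content, "sources": set()}
            self._posts[digest] = post
        post["sources"].add(source)
        return digest

    def sources(self, digest):
        """Return every association recorded for the post, sorted."""
        return sorted(self._posts[digest]["sources"])

    def __len__(self):
        return len(self._posts)
```

Adding the same post from a project feed and from a personal blog feed would leave one stored post with two associations.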

natestedman commented 13 years ago

For display, I think we should give preference to the project association if there is one and ignore personal blogs; we match the author anyway.
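
Something like this minimal sketch, assuming each post carries a list of `(kind, name)` associations. The `"project"` and `"blog"` labels and the `display_association` name are made up for illustration and are not from the codebase.

```python
def display_association(associations):
    """Pick the association to show for a post.

    `associations` is a list of (kind, name) tuples, where kind is
    "project" or "blog" (hypothetical labels for this sketch).
    """
    # Prefer the first project association if the post has one.
    for kind, name in associations:
        if kind == "project":
            return (kind, name)
    # Otherwise fall back to whatever is there, e.g. a personal blog.
    return associations[0] if associations else None
```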

colinsullivan commented 13 years ago

For now, I was thinking about just filtering out the duplicates on display, and worrying about handling the duplicates on fetch at a later time.
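
Roughly along these lines, as a sketch only. It assumes posts arrive as dictionaries with `title` and `content` keys and that two posts with the same title and content count as duplicates; `unique_for_display` is a hypothetical helper name, and the real check could compare something else.

```python
def _title_and_content(post):
    """Default duplicate key: the post's title and content (assumption)."""
    return (post["title"], post["content"])


def unique_for_display(posts, key=_title_and_content):
    """Return posts in their original order, keeping the first of each key."""
    seen = set()
    unique = []
    for post in posts:
        k = key(post)
        if k in seen:
            continue  # already going to the template, skip the duplicate
        seen.add(k)
        unique.append(post)
    return unique
```

The view would pass the filtered list to the template, so nothing stored changes and the duplicates stay in the database until fetch-time handling exists.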