mmetro / WeirdSideofYouTube

MIT License

Use ObjectID for _id field #16

Closed: mmetro closed this issue 8 years ago

mmetro commented 8 years ago

This is sort of related to #5, but it's a more specific issue.

I think the best long-term solution is to let Mongo use its default ObjectID value for the _id field. The MongoDB documentation states:

Generally in MongoDB, you would not use an auto-increment pattern for the _id field, or any field, because it does not scale for databases with large numbers of documents. Typically the default value ObjectId is more ideal for the _id.

After getting rid of the numbered _id, we can find a random video using this method: http://stackoverflow.com/a/13524758
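To make the random-field idea concrete, here is a minimal in-memory sketch. It assumes each document carries a `rand` value in [0, 1), draws a uniform `r`, and returns the first video whose `rand` is >= `r`, wrapping around to the lowest `rand` when none qualifies. This is roughly what `find({rand: {$gte: r}}).sort({rand: 1}).limit(1)` does in MongoDB. The `Video` class, its `vid_id` field, and `pick_random` are hypothetical names, and a sorted Python list stands in for an index on `rand`.

```python
import bisect
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Video:
    vid_id: str   # YouTube video ID (hypothetical field name)
    rand: float   # the proposed "rand" field, in [0, 1)


def pick_random(videos: List[Video], rng: random.Random) -> Video:
    """Pick a video with the random-field technique.

    Similar in spirit to find({rand: {$gte: r}}).sort({rand: 1}).limit(1),
    wrapping around to the lowest rand when no document has rand >= r.
    """
    # Sorting here stands in for an index on rand in the database.
    ordered = sorted(videos, key=lambda v: v.rand)
    keys = [v.rand for v in ordered]
    r = rng.random()
    i = bisect.bisect_left(keys, r)  # first position with rand >= r
    if i == len(ordered):            # nothing >= r: wrap around
        i = 0
    return ordered[i]


if __name__ == "__main__":
    videos = [Video("a", 0.10), Video("b", 0.50), Video("c", 0.90)]
    print(pick_random(videos, random.Random()).vid_id)
```

The wrap-around keeps the lookup from ever coming back empty, at the cost of giving the lowest-`rand` video the leftover gap above the highest value.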

I can't think of a good reason why users would need to request a range of videos, so removing the ability to get a video or range of videos by ID shouldn't be an issue.

For the admin panel, we can get a range of videos by specifying a vidID to start from instead of an _id. The admin page's client will already know the IDs of the videos it has shown, so it can ask the API for the next batch of videos after the last ID.
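Paging by "everything after the last vidID" is keyset pagination. The sketch below is an in-memory stand-in, assuming the vidIDs are kept sorted the way an index on `vidID` would keep them; a MongoDB version would look something like `find({vidID: {$gt: last}}).sort({vidID: 1}).limit(pageSize)`. The function names `page_after` and `all_pages` are hypothetical.

```python
import bisect
from typing import Iterator, List, Optional


def page_after(vid_ids: List[str], last_vid_id: Optional[str], page_size: int) -> List[str]:
    """Return up to page_size vidIDs that sort strictly after last_vid_id.

    vid_ids must be sorted, as an index on vidID would be. Passing None
    starts from the beginning.
    """
    start = 0 if last_vid_id is None else bisect.bisect_right(vid_ids, last_vid_id)
    return vid_ids[start:start + page_size]


def all_pages(vid_ids: List[str], page_size: int) -> Iterator[List[str]]:
    """Walk the collection the way the admin client would: one page at a time."""
    last = None
    while True:
        page = page_after(vid_ids, last, page_size)
        if not page:
            break
        yield page
        last = page[-1]  # the client remembers the last vidID it received


if __name__ == "__main__":
    ids = ["a1", "b2", "c3", "d4", "e5"]
    for page in all_pages(ids, 2):
        print(page)
```

Because the cursor is a vidID rather than a position, the next page is still correct if the last video the client saw has since been deleted.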

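To give every video the same chance of being picked, the rand values need to be spaced evenly over the range 0 to 1. This could be redone every once in a while, or, if we don't care about exact uniformity, each rand value can simply be assigned at random. If we assign them at random, some videos will be more likely to appear than others, because a video's chance of being picked depends on the size of the gap between its rand value and its neighbor's.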
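The effect of random versus evenly spaced rand values can be computed directly. This sketch assumes the lookup takes the first video whose `rand` is >= a uniform `r` in [0, 1) and wraps around to the lowest `rand` otherwise; under that assumption each video's chance is the width of the gap just below its value. The function names are hypothetical.

```python
from typing import List


def selection_probabilities(rands: List[float]) -> List[float]:
    """Chance of each rand value being picked by a "first rand >= r" lookup.

    A value wins when r falls in the gap just below it; the smallest value
    also wins the wrap-around gap above the largest value.
    """
    order = sorted(range(len(rands)), key=lambda i: rands[i])
    probs = [0.0] * len(rands)
    prev = 0.0
    for i in order:
        probs[i] = rands[i] - prev
        prev = rands[i]
    probs[order[0]] += 1.0 - prev  # wrap-around gap
    return probs


def evenly_spaced(n: int) -> List[float]:
    """Rebuild rand values as 0, 1/n, 2/n, ... so each video gets 1/n."""
    return [k / n for k in range(n)]


if __name__ == "__main__":
    print(selection_probabilities([0.10, 0.50, 0.90]))
    print(selection_probabilities(evenly_spaced(4)))
```

With rand values 0.10, 0.50 and 0.90, the middle and last videos each get about 40% of the picks while the first gets about 20%. Rebuilding with `evenly_spaced` fixes that, but it rewrites every document, so the rebuild cost grows with the collection.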

Basically: stop numbering _id, add a "rand" field to all of the documents

mmetro commented 8 years ago

Now that I think about it, I'm not sure that this is the best implementation for our app.

If it's important to keep the rand field uniformly distributed across documents, we would have to periodically recompute all of the rand values. Because of that, I imagine numbered _id values would make adding videos faster: a new video just takes the next number, and we don't have to rebuild every rand value to keep the distribution uniform.

I think deleting a video is also faster with numbered _id values:
- In the numbered _id implementation, it's very important that we fill in the missing _id value. This is easy to do, though: we can just take the video with the greatest _id and change its _id to the _id of the deleted video. (MongoDB doesn't allow updating _id in place, so in practice this means re-inserting that video under the new _id and deleting the old document.)
- In the rand field implementation, keeping the distribution uniform means rebuilding the rand values of all documents whenever a video is deleted.
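Here is a minimal sketch of keeping numbered _id values contiguous, assuming they are always exactly 0 to n-1. A dict mapping _id to vidID stands in for the collection, and `add_video` and `delete_video` are hypothetical names. Both operations touch at most two entries, regardless of how many videos exist.

```python
from typing import Dict


def add_video(videos: Dict[int, str], vid_id: str) -> int:
    """Store a new video under the next numeric _id; nothing else changes."""
    new_id = len(videos)  # assumes _id values are exactly 0..n-1
    videos[new_id] = vid_id
    return new_id


def delete_video(videos: Dict[int, str], deleted_id: int) -> None:
    """Delete videos[deleted_id] and keep the _id values contiguous.

    The video with the greatest _id moves into the freed slot. In MongoDB
    this would be an insert under the new _id plus a delete of the old
    document, because _id cannot be updated in place.
    """
    last_id = len(videos) - 1
    del videos[deleted_id]
    if deleted_id != last_id:
        videos[deleted_id] = videos.pop(last_id)


if __name__ == "__main__":
    vids: Dict[int, str] = {}
    for v in ["a", "b", "c", "d"]:
        add_video(vids, v)
    delete_video(vids, 1)
    print(vids)
```

The trade-off is that a video's _id can change when another video is deleted, so the _id is only good as a position, not as a stable identifier.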

I'm not sure exactly how many documents counts as "large numbers of documents", or why numbering _id doesn't scale, but at the scale of this app, numbering _id may actually be more efficient. I'm leaning towards keeping things the way we have them now, unless it becomes clear why numbered _id values would scale poorly.

mmetro commented 8 years ago

I closed this because I think that for our app, numbered _id values will actually be faster. Looking up a video by its index will be much faster than looking it up by its YouTube ID, so all around, numbered _id values should be faster than ObjectIDs.
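With contiguous numbered _id values, picking a random video comes down to one random integer and one lookup by _id, with no extra field to maintain. The sketch below assumes _id values are exactly 0 to n-1 and uses a dict as a stand-in for the collection; in MongoDB the lookup would be a single `findOne({_id: k})`. The name `random_video` is hypothetical.

```python
import random
from typing import Dict


def random_video(videos: Dict[int, str], rng: random.Random) -> str:
    """Pick a uniformly random video, assuming _id values are exactly 0..n-1."""
    k = rng.randrange(len(videos))  # uniform over 0..n-1
    return videos[k]                # a single lookup by _id


if __name__ == "__main__":
    vids = {0: "a", 1: "b", 2: "c"}
    print(random_video(vids, random.Random()))
```

Every video has exactly a 1/n chance here, but only as long as the _id values stay gap-free after deletions.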