rr- / malgraph4

MALgraph: statistics service for MyAnimeList.net users.
http://mal.oko.im
Other
158 stars 39 forks source link

Incredibly slow DB operations #124

Closed rr- closed 10 years ago

rr- commented 10 years ago

http://stackoverflow.com/questions/784173/what-are-the-performance-characteristics-of-sqlite-with-very-large-database-file

Conclusion: the usermedia table should be distributed to many tables. How many? This should be configurable. I think 200 will be okay (50k users x 200 media each = 10 000 000 rows in usermedia = 50 000 rows in random usermediaX on average).

Basically, when we need to refer to usermedia asking questions like "Hey, what's the anime list of rr-?" the query that looks like SELECT * FROM usermedia WHERE ... should be changed for instance to SELECT * FROM usermedia4 WHERE ...

Proposed mechanisms on how to know which table to refer to:

The last approach can be changed in such way that instead of distributing user media over multiple tables, we can distribute users over multiple database files. Then we can attach only database we need. This is probably best approach.

Just remember when implementing this that stuff like cron-globals.php needs to deal with usermedia in global manner without user-specific queries. Thus all global usermedia processing needs to be changed as well.

Note that queries like 'search for users that...' will be difficult. So far we use such query only in globals IIRC.