lewiscawte / CIA.vc

Automatically exported from code.google.com/p/cia-vc

stats list should be sortable by "most active last month" #25

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Here:
http://cia.vc/stats/project
"This page has a very large number of child items, and CIA can not yet
display them all or browse them incrementally. Below is an arbitrary set of
100 items. Sorry for the inconvenience, we're working on resolving this issue."
Also applies to: http://cia.vc/stats/author

Instead, if this page could list the 100 most active projects of last month,
it would make the list very useful and interesting.
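A minimal sketch of the two queries being requested: a "top 100 of last month" list and incremental, page-by-page browsing. It assumes a hypothetical SQL table `project_stats` with one row per project and an `events_last_month` counter (CIA's real schema is not shown in this issue), and uses Python's built-in `sqlite3` only so the example runs on its own. `name` is added as a tie-breaker so page boundaries stay stable between requests.

```python
import sqlite3

# Hypothetical schema: one row per project with a per-month event counter.
SCHEMA = """
CREATE TABLE project_stats (
    name TEXT PRIMARY KEY,
    events_last_month INTEGER NOT NULL DEFAULT 0
)
"""

ORDER = "ORDER BY events_last_month DESC, name ASC"


def top_projects_last_month(conn, limit=100):
    """Return (name, count) pairs for the `limit` most active projects last month."""
    return conn.execute(
        f"SELECT name, events_last_month FROM project_stats {ORDER} LIMIT ?",
        (limit,),
    ).fetchall()


def project_page(conn, page, page_size=100):
    """Return one 1-based page of projects sorted by activity.

    Page 4 with the default page size covers entries 301-400.
    """
    offset = (page - 1) * page_size
    return conn.execute(
        f"SELECT name, events_last_month FROM project_stats {ORDER} LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
```

With roughly 25000 projects, `LIMIT`/`OFFSET` still makes the database walk past every skipped row, so for deep pages a keyset condition such as `events_last_month < ? OR (events_last_month = ? AND name > ?)`, seeded from the last row of the previous page, may scale better.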

Transcript below:
2009-11-02 on #cia 
[12:03] marclaporte: can I trouble you with some more questions?
[12:03] BearPerson: sure, that's what I'm here for :)
[12:04] marclaporte: hehe
[12:04] marclaporte: I was wondering about: http://cia.vc/stats/project
[12:04] marclaporte: I would like a list of wiki engines, sortable by
activity level
[12:04] BearPerson: hmm
[12:05] BearPerson: I guess that would mean implementing that list properly
for one thing, and adding a keyword setting with projects, for another
[12:05] marclaporte: if possible, someway I could get that data by rss,
json, yaml, etc so I could re-use in a wiki page
[12:05] BearPerson: how else would CIA know what a wiki engine was, I suppose
[12:05] marclaporte: freshmeat, ohloh, sourceforge, etc have category trees
or tags
[12:06] BearPerson: so I'm supposed to go sift through sourceforge?
[12:06] BearPerson: we spend too much resources just keeping the darn thing
running, at this point 8)
[12:06] marclaporte: hehe, I understand
[12:06] marclaporte: I know what the engines are
[12:06] marclaporte: so I could just get the data for 15-20 engines
[12:07] marclaporte: which I do manually now
[12:07] BearPerson: *nod*
[12:07] BearPerson: some kind of keyword system might be useful, yeah
[12:07] marclaporte: "This page has a very large number of child items, and
CIA can not yet display them all or browse them incrementally. Below is an
arbitrary set of 100 items. Sorry for the inconvenience, we're working on
resolving this issue."
[12:07] marclaporte: http://cia.vc/stats/project  -> any chance that this
could work?
[12:07] marclaporte: maybe instead of using  events today, events
yesterday, total events
[12:07] BearPerson: it'd kind of be a higher priority than the
keyword-search thing, I guess - keywords alone would be fairly useless
without that
[12:08] marclaporte: how about events last month
[12:08] BearPerson: it'll work when someone (me, I guess) goes along and
implements more fancy SQL queries and cursors, I guess
[12:08] marclaporte: events last month can be counted less often, so less
pressure on server
[12:09] BearPerson: "get me all project entries sorted by activity, entry
301-400"
[12:09] BearPerson: not terribly complicated, I'm sure, just has to be
looked at and done
[12:10] marclaporte: Can you explain "has a very large number of child
items, and CIA can not yet display them all or browse them incrementally." ?
[12:10] BearPerson: well, we track about 25000 projects right now, if I
don't have the numbers wrong
[12:10] BearPerson: you don't really want all of them in your browser
window at the same time ;)
[12:11] marclaporte: hehe
[12:11] BearPerson: so at the moment we just go to the database and say
"give me all projects you have, but not more than 100", and it gives them
in whatever order it feels like at the moment
[12:11] BearPerson: it would be useful to have "page 72 of 250" style
browsing, but for that we have to work on the code for that page a bit
[12:12] BearPerson: given that we don't have a terrible lot of developers,
and they don't spend their time exclusively on CIA.vc, that may take a while 8)
[12:13] marclaporte: ok, how about just asking for number of messages last
month?
[12:13] marclaporte: so 100 most active projects last month
[12:13] marclaporte: no pagination
[12:13] BearPerson: we may have to introduce that, so it'll start counting
from zero, but it may make sense, I guess
[12:13] BearPerson: less fluctuation than "most active today"
[12:13] marclaporte: yes, daily stuff is not very good
[12:13] BearPerson: though it'll probably have to be implemented a bit
better to not look silly
[12:14] BearPerson: just resetting everything to 0 at midnight doesn't
really work
[12:14] marclaporte: number of messages since the beginning is problematic
as well
[12:14] BearPerson: as a sorting criteria, yes
[12:14] marclaporte: number of messages last month is already kept somewhere
[12:15] BearPerson: I guess we could shoot for some kind of decaying
average, would save us having to track N values per project just to be able
to do a proper rolling average
[12:16] BearPerson: hmm, we have "this month" and "last month"
[12:16] BearPerson: which still gives us the "reset to zero" problem,
really, unless we get some kind of smart
[12:16] marclaporte: this month is not optimal either...
[12:16] marclaporte: as you say
[12:16] BearPerson: thismonth + (lastmonth/3) or something
[12:16] BearPerson: you couldn't really call it "activity past month" then,
but it'd be the rough idea
[12:17] marclaporte: if someone does a big commit at the beginning of the
month, it would be weird for a few days
[12:17] BearPerson: oh, it only counts commits, not the size
[12:17] marclaporte: ah, ok
[12:17] BearPerson: of course, all those new-age git / distributed-scm
people that send in a notification for every commit in a merge can mess
things up anyway
[12:19] BearPerson: you haven't seen senseless load when you haven't seen
the system try and log a couple of thousand commits that just happened to
be pushed from one branch into another ;)
[12:19] marclaporte: hehe
[12:22] marclaporte: when CIA reports things like:   16270 messages since
the first one, 3.38 years ago, for an average of 1.82 hours between messages
[12:22] marclaporte: where is this data from?
[12:23] marclaporte: does CIA analyse complete project logs?
[12:23] BearPerson: let me check
[12:23] BearPerson: I think it's a counter incremented with every commit,
either in sql or in the binary datastore
[12:24] BearPerson: counting messages is easy, storing creation time is
too, the average is just hitting the divide button on the 'ole calculator
[12:24] marclaporte: ok, so having last month, the month before and the
month before that ?
[12:25] BearPerson: I think we only store this and last month
[12:25] marclaporte: please see:
https://sourceforge.net/project/stats/detail.php?group_id=64258&ugn=tikiwiki&type=svn&mode=12months
[12:25] BearPerson: I'm sure it looks more fancy than cia ;)
[12:26] marclaporte: It's quite good but just for SF-hosted projects
[12:27] BearPerson: to be precise, we have counters for "forever",
"{last,this}{Month,Week}", "today", "yesterday"
[12:28] BearPerson: though I suppose it wouldn't take up terribly much
space to have some kind of longer-reaching stats
[12:28] BearPerson: though generating graphs will be beyond our capability
until we move to a different server
[12:28] marclaporte: ok, if this page http://cia.vc/stats/project could
generate this out for top-100 largest of last month, it would make this
list very very useful and interesting
[12:29] marclaporte: We could see which project are very active
[12:29] BearPerson: or seem very active :)
[12:29] marclaporte: and top-100 will still permit to see beyond the usual
very big projects
[12:30] BearPerson: the old saying goes "don't trust any statistics you
haven't manipulated yourself" ;)
[12:30] marclaporte: hahaha
[12:30] BearPerson: we take whatever we get without much verification, it's
quite easy for a malfunctioning hook to skew the numbers significantly
[12:30] BearPerson: (also, last time I checked there was an odd race
condition that two commits very close together could get counted as one)
[12:31] BearPerson: but yeah, it's a useful idea
[12:31] BearPerson: either you or me should remember to dig up the issue
tracker on google code and enter those as [separate, for better
manageability] feature requests so they don't get forgotten
[12:31] marclaporte: two commits very close maybe should be one :-)
[12:32] BearPerson: still using CVS? ;)
[12:32] marclaporte: I can add to the tracker
[12:32] marclaporte: we use CVS for old legacy branch, stable, dev and
experimental are in SVN
[12:33] BearPerson: though really, coping with CVS's broken non-atomic
commits should be done in the hooks that send out notifications, not where
they get received
[12:34] BearPerson: from what I hear, there's working scripts that try to
collect together commits in separate directories for emails, the same
should be possible (isn't it being done already?) for cia, I suppose
[12:35] marclaporte: Also applies to: http://cia.vc/stats/author  (although
not as important)
[12:35] BearPerson: yupyup, same problem, same mechanism, same solution
[12:37] marclaporte: may I copy/paste this conversation log to the tracker?
[12:38] BearPerson: if you want, though I'm sure you can come up with more
to-the-point descriptions of problem, idea and suggested solution ;)
[12:42] marclaporte: :-)

Original issue reported on code.google.com by marclaporte on 2 Nov 2008 at 5:55
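BearPerson's point that CVS's non-atomic commits should be merged by the hook that sends notifications, rather than where CIA receives them, can be sketched roughly as follows. This is a hypothetical illustration, not code from CIA or from any existing hook script: it assumes the hook sees one record per directory commit, and it treats records with the same author and log message that arrive within 60 seconds (an arbitrary window) of the previous one as a single changeset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileCommit:
    """One per-directory commit as a CVS hook might see it (hypothetical)."""
    author: str
    log: str
    timestamp: float  # seconds since the epoch
    directory: str


@dataclass
class Changeset:
    """A group of per-directory commits treated as one logical commit."""
    author: str
    log: str
    start: float
    end: float
    directories: List[str] = field(default_factory=list)


def coalesce(commits, window=60.0):
    """Merge commits sharing author and log message that arrive within
    `window` seconds of the previous commit in the same group."""
    changesets = []
    latest = {}  # (author, log) -> most recent Changeset for that key
    for fc in sorted(commits, key=lambda item: item.timestamp):
        key = (fc.author, fc.log)
        cs = latest.get(key)
        if cs is not None and fc.timestamp - cs.end <= window:
            # Same logical commit: extend the existing changeset.
            cs.end = fc.timestamp
            if fc.directory not in cs.directories:
                cs.directories.append(fc.directory)
        else:
            # Too far apart (or first sighting): start a new changeset.
            cs = Changeset(fc.author, fc.log, fc.timestamp, fc.timestamp, [fc.directory])
            latest[key] = cs
            changesets.append(cs)
    return changesets
```

A hook built this way would send one notification per `Changeset` instead of one per directory, which would also keep the per-project commit counters discussed in the log from being inflated by a single CVS commit.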

GoogleCodeExporter commented 9 years ago
Executive summary:
Once we have implemented a stats listing page better than
"this is 100 random projects",
it would be useful if one could sort by activity in the time frames we track
(we may need better numbers than "commits this month" to avoid nonsense at the start of the month)

Original comment by Unbearab...@gmail.com on 2 Nov 2008 at 6:02
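One way to get "better numbers than commits this month" is the decaying average mentioned in the log: each project keeps a single score plus the time it was last updated, and the score halves every `HALF_LIFE` seconds, so nothing has to be reset to zero at midnight or at the start of a month, and no list of N values per project is needed for a rolling window. The sketch assumes a two-week half-life and uses hypothetical names (`DecayingCounter`, `blended_monthly`); it is not CIA's implementation. The simpler `thismonth + (lastmonth/3)` blend from the same discussion is included for comparison.

```python
from dataclasses import dataclass

HALF_LIFE = 14 * 24 * 3600.0  # assumed half-life: two weeks, in seconds


@dataclass
class DecayingCounter:
    """Exponentially decaying activity score for one project (hypothetical)."""
    score: float = 0.0
    updated: float = 0.0  # timestamp (seconds) of the last update

    def value_at(self, now):
        """Score decayed to time `now`, without modifying the counter."""
        elapsed = max(0.0, now - self.updated)  # ignore out-of-order timestamps
        return self.score * 0.5 ** (elapsed / HALF_LIFE)

    def record(self, now, weight=1.0):
        """Count one commit (or `weight` commits) at time `now`."""
        self.score = self.value_at(now) + weight
        self.updated = now


def blended_monthly(this_month, last_month):
    """The rough 'thismonth + lastmonth/3' heuristic from the IRC log."""
    return this_month + last_month / 3
```

Because every counter decays at the same rate, evaluating `value_at(now)` for all projects at one common `now` gives a consistent ranking for a "most active" list.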