Open ipsrt opened 5 years ago
Oh nice, someone is interested in this!
I read through your PR and thought about this for a bit. The main issue I'm seeing is that, if we now generate a delta for every announce (instead of only for announces with actual deltas in uploaded/downloaded), we would potentially generate a massive amount of deltas. To me, this kinda defeats the purpose of recording deltas in the first place - we could just report out all announces straight to some event processing system and throw queries at that.
Also, if we emit a delta for each announce anyway, the consumer can calculate the seed time on their end, based on the reported timestamps, or something like that...
We could, for example, instead add a `dirty` bit to the `userSwarmStats`, and also a `seedTimeSinceLastDelta` or something like that. Then, when we receive a `Stopped` event or clean up the `userSwarmStats` due to GC, we would check the `dirty` bit and, only once, emit a delta with `DeltaUp==0` and `DeltaDown==0`. We should probably still rely on updating the `seedTimeSinceLastDelta` on each announce, and shouldn't report (current time - last received announce) or something like that, in case peers leave silently, or die.
Alternatively, we could emit a second type of event, namely something that keeps track of seed time.
What do you think?
PS: Also, I checked back into IRC today, I'm always idling there and read your messages, I'll see if I catch you at some point, otherwise this works fine too. About your question on IRC: Nope, I don't know of anyone who is using this. I just thought it up as a sane way to potentially handle the whole private tracker situation. It's not super nice (probably eats a lot of memory), but again, never tried anywhere.... who knows.
My first thought was also to just take the StatDeltas as they are and work out the seeding time backwards from those. The issue I see with this is that in some trackers it's common for seeds to have 0 seeding activity for weeks on end, i.e., the tracker would emit 0 stat deltas over that period. So if the tracker crashed or something like that, all the seeding time progress since the most recent stat delta processed for that peer would be lost.
I get your concerns about emitting a huge amount of deltas. To me it makes sense to not have seeding time functionality in a separate middleware, since all the sharding and infohash sorting plumbing is already set up (and memory allocated) for the StatDelta middleware. Emitting a second event just for seed time would make sense to me as well. Or maybe a config option that sets whether or not seeding time is tracked at all? I'm sure you'd know better which approach would be more sane from an organisation & performance standpoint.
Alright, here's what I thought up:
- Change `StatDelta` to include a field for continuous seeding time in seconds since the last reported delta.
- Change `userSwarmStats` to include a field for that too.
- Change the `update` function to:
  - On each announce, add the time elapsed since `userSwarmStats.lastUpdate` to that field.
  - When a `StatDelta` is emitted, reset this to zero.
  - If a `userSwarmStats` is to be removed (because of a stopped event), check if the seeding time is != 0. If yes: emit a `StatDelta`.
- Change the `removeExpired` function to emit `StatDelta`s for `userSwarmStats` with seeding time != 0. Change all the places calling this to handle the deltas emitted.
- Add a periodic flush over `userStats`, similar to the GC function. What this does is: for each `userSwarmStats` in there, if the seed time is != 0, emit a delta with zero upload/download, empty event, and the cumulative seed time set to what was set in the `userStats`. Then it resets the seed time to zero.
- [...] `userStats`.

Let me know if you think I forgot something.
Why I think this should work well:
Grand total: Some additional events, of which the rate and granularity can be configured; some additional storage; some additional complexity.
Optional: We might make it so that people can disable this functionality. We cannot save the space in the struct, but we can skip the added complexity and the periodic flushing.
Many private trackers track not only upload and download deltas in each announce, but also the incremental seeding time. This is used to enforce site rules (e.g. torrents must be seeded for a minimum of 7 days) or as part of a site currency (e.g. every hour each user gets points based on how many torrents they're seeding and for how long, then the points can be used to improve the user's seeding ratio or to buy features on the site).
Since `userSwarmStats` structs already track the time of the user's last announce, to me it makes sense to integrate this delta seeding time into this middleware. Let me know your thoughts on how & where to include a seeding time feature into the middleware.