metabrainz / web-service-v3-design

A collaborative specification for the third version of the MusicBrainz web service

What's the point of requiring 5 artist calls to get artist information, when there are no duplicates #18

Open ijabz opened 12 years ago

ijabz commented 12 years ago

Unlike ws/2, the artist endpoint doesn't allow you to include things such as the releases the artist is linked to, and this basic artist information is not returned when you query other endpoints such as release. Aliases, annotations and tags can only be retrieved through this endpoint, so why not just return this information in one call?

This would provide a better service for the client, and would not add much load to MusicBrainz. In fact it may add less load, as I expect clients doing an artist lookup would usually also want at least aliases and annotations, so doing it in one go would mean MusicBrainz dealing with fewer HTTP requests.

Same argument for labels, works, and similar for other entities.

ocharles commented 12 years ago

The information you can retrieve of aliases, annotations and tags can only be retrieved for this endpoint so why not just return this information in one call?

Because that increases the minimum response size and adds bloat that some clients won't need. In order to make things fast, we have to make them as slim as possible. The more data you want, the longer you should be expected to wait. But if you're asking small questions, then we can give you fast responses.

This would provide a better service for the client, and would not add much load to MusicBrainz. In fact it may add less load, as I expect clients doing an artist lookup would usually also want at least aliases and annotations, so doing it in one go would mean MusicBrainz dealing with fewer HTTP requests.

In the worst case, where every single artist request is also matched with a request for aliases, annotations, and tags, then yes - the load will increase, but note that it will only increase by the baseline cost of an HTTP request - which is very small. The client experience suffers a bit more because of the extra round trips, but I don't think this is conclusive enough to change the design on.

However, I don't think this is the case. As clients ask more and more targeted questions, we're more efficient because we are only fetching and rendering data that a client has specifically requested.

In short, I don't think we should be designing things around vague feelings that 'it's slower' without really justifying them. Load, response time, and round-trip time all depend strongly on the actual implementation. A 2x slowdown might sound terrifying, but when you measure it as a change from 1ms to 2ms, is it worth worrying about? Are the other benefits that the design provides worth that tradeoff?

ijabz commented 12 years ago

1ms to 2ms is indeed not a problem, but this specification still includes rate limiting, and since the rate limiter has not been defined I have to assume it will be similar to the current one, so the difference is between 1ms and 1001ms.

And we are not applying the same logic to a release (thank goodness), i.e. a release always contains tracks. We don't say that a user might want release details without track details; they are always returned. So how are you making that call?

ocharles commented 12 years ago

has not been defined I have to assume it will be similar to now

Why assume that though? You know that the rate limiter is a problem, and it was probably the number 1 problem in the web service design documents. It's clear that what we have now certainly isn't going to work. The rate limiter should be designed around an API, not an API designed around a rate limiter.

a release always contains tracks, we don't say that a user might want release details without track details, they are always returned, so how are you making that call?

I did debate having a /tracklist endpoint on releases actually, but figured there would be so much backlash against that, that I just merged it into a single endpoint. I do think being able to get a list of tracks is a very common thing for MB so I baked that in fairly explicitly. There was no hard rule that made me decide to do that though.

ijabz commented 12 years ago

The point about the rate limiter is that there is still a rate limiter, and even if it allows faster throughput it still means the time taken to perform 2 queries instead of one isn't just query1 + query2, it's query1 + query2 + ratelimitdelay, so your example of changing from 1ms to 2ms is unrealistic.
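A rough sketch of this arithmetic, assuming a limiter that enforces a fixed gap between consecutive requests. The function name and the figures (1ms per query, a 1000ms enforced gap) are illustrative assumptions based on the numbers quoted in this thread, not measurements of any MusicBrainz service.

```python
def total_latency_ms(query_times_ms, rate_limit_delay_ms):
    """Total time for sequential requests under a hypothetical
    constant-delay rate limiter: one enforced gap between each pair
    of consecutive requests."""
    if not query_times_ms:
        return 0
    gaps = len(query_times_ms) - 1
    return sum(query_times_ms) + gaps * rate_limit_delay_ms


# One combined call taking 2ms versus two 1ms calls separated by the gap.
one_call = total_latency_ms([2], 1000)
two_calls = total_latency_ms([1, 1], 1000)
print(one_call, two_calls)
```

Under this model the cost of splitting a lookup is dominated by the enforced gap, not by the query time itself.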

Regarding tracklists, the endpoint isn't 100% rigid then, so we could include aliases when you just return an artist (after all, these aliases only exist as part of an artist) and it wouldn't break the design. It's just a difference of opinion on how often aliases are used and whether they are used enough to be included by default. Well, Picard and Jaikoz both use them for starters; how you could work out quite how much they are used I do not know.

ocharles commented 12 years ago

The point about the rate limiter is that there is still a rate limiter, and even if it allows faster throughput it still means the time taken to perform 2 queries instead of one isn't just query1 + query2, it's query1 + query2 + ratelimitdelay, so your example of changing from 1ms to 2ms is unrealistic.

No, you're still making an assumption about how the rate limiter behaves. There is no real decision yet that we will have a constant delay based rate limiter. If it's changed to "you can only make 100 requests a minute", then there's no reason why you need any delay at all (in fact, you might even be able to dispatch all those requests out in parallel).
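To illustrate the difference, here is a minimal sketch of a quota-style limiter ("at most N requests per window") as opposed to a constant-delay one. The class name, the sliding-window approach, and the injected clock are all assumptions made for this example; nothing here reflects a decided ws/3 design.

```python
import time
from collections import deque


class WindowRateLimiter:
    """Allow at most `limit` requests in any `window` seconds.

    Hypothetical sliding-window limiter; `clock` is injectable so the
    behaviour can be demonstrated without waiting in real time.
    """

    def __init__(self, limit=100, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self):
        now = self.clock()
        # Forget requests that have fallen out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


# A burst of 101 requests at the same instant, using a fake clock.
fake_now = [0.0]
limiter = WindowRateLimiter(limit=100, window=60.0, clock=lambda: fake_now[0])
burst = [limiter.allow() for _ in range(101)]

# One window later the quota is available again.
fake_now[0] = 60.0
after_window = limiter.allow()
```

With this kind of limiter the first 100 requests are accepted with no enforced delay between them, so a client could send them back to back or in parallel; only the request that exceeds the quota is refused.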

Regarding tracklists, the endpoint isn't 100% rigid then, so we could include aliases when you just return an artist (after all, these aliases only exist as part of an artist) and it wouldn't break the design. It's just a difference of opinion on how often aliases are used and whether they are used enough to be included by default.

Yep, this is exactly correct. If it's clear that aliases are very often required, then we could think about embedding them in the response of /artist/:mbid.

ijabz commented 12 years ago

The thing I really don't get about the rate limiter is that we still need to provide support for ws/2 (and possibly ws/1), so the resources that these services take up would make it very difficult to provide a much faster ws/3 rate, wouldn't they?

But my main point with this issue is that the 'bloat' added by returning aliases is minimal, and it can never be faster to have multiple calls than a single call unless you issue parallel requests, which can get complicated. This is not the kind of bloat that is causing any issue with ws/2; the problem there is duplicate info, linking one top-level entity to another (artist with all releases), and relationships.
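For reference, a minimal sketch of what issuing the requests in parallel involves on the client side, using Python's asyncio. The `fetch` coroutine is a stand-in that sleeps instead of making a real HTTP request, and the resource names are hypothetical labels, not actual ws/3 paths.

```python
import asyncio


async def fetch(resource, delay_s=0.01):
    # Stand-in for an HTTP request; sleeps instead of touching the network.
    await asyncio.sleep(delay_s)
    return f"{resource}: ok"


async def fetch_sequential(resources):
    # Each request starts only after the previous one has finished.
    return [await fetch(r) for r in resources]


async def fetch_parallel(resources):
    # All requests are in flight at once; gather keeps the input order.
    return list(await asyncio.gather(*(fetch(r) for r in resources)))


resources = ["artist", "aliases", "annotation", "tags"]
sequential = asyncio.run(fetch_sequential(resources))
parallel = asyncio.run(fetch_parallel(resources))
```

Both variants return the same results; the parallel one waits roughly as long as the slowest request instead of the sum of all of them, at the cost of the client having to manage concurrency and any rate limit itself.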

ocharles commented 12 years ago

The thing I really don't get about the rate limiter is that we still need to provide support for ws/2 (and possibly ws/1), so the resources that these services take up would make it very difficult to provide a much faster ws/3 rate, wouldn't they?

Not really, there's no reason why /ws/3 can't be served off a completely different machine (for example, lolo).

It might or might not be noticeable bloat, but my point is more that we can't remove this data once we add it. We can, however, add new data later. So I'm more inclined to start with a very slim web service, and carefully study the requests that we receive in order to optimize it for the common cases.