rubygems / new-index

Prototype and documentation of the new gem index format

etag being cache friendly is good, but could be better? #4

Closed fotanus closed 9 years ago

fotanus commented 9 years ago

Hi there. Great project, thanks for bringing this to Rubyists!

HTTP ETags make the index cache friendly, but when the file does change, the reply contains much more information than necessary. For instance, the first time you request nokogiri, you would get an answer like this:

1.1.5 |checksum:abc123
1.1.6 rake:>= 0.7.1,activesupport:= 1.3.1|ruby:> 1.8.7,checksum:bcd234
1.1.7.rc2 rake:>= 0.7.1|ruby:>= 1.8.7,rubygems:> 1.3.1,checksum:cde345
1.1.7.rc3 |rubygems:> 1.3.1,checksum:def456
1.2.0-java mini_portile:~> 0.5.0|checksum:fgh567

Now a new version is released, and the file on the server looks like this:

1.1.5 |checksum:abc123
1.1.6 rake:>= 0.7.1,activesupport:= 1.3.1|ruby:> 1.8.7,checksum:bcd234
1.1.7.rc2 rake:>= 0.7.1|ruby:>= 1.8.7,rubygems:> 1.3.1,checksum:cde345
1.1.7.rc3 |rubygems:> 1.3.1,checksum:def456
1.2.0-java mini_portile:~> 0.5.0|checksum:fgh567
1.2.1 rake:>= 0.9|checksum:deadbeef

The server will send you every version again, even though the only information you needed was the last line. If we added a mechanism to avoid resending what the client already has, and only send what is new, we would transfer fewer bytes. The most naive idea is to add a timestamp at the start of each line:

1425900039 1.1.5 |checksum:abc123
1425900041 1.1.6 rake:>= 0.7.1,activesupport:= 1.3.1|ruby:> 1.8.7,checksum:bcd234
1425900042 1.1.7.rc2 rake:>= 0.7.1|ruby:>= 1.8.7,rubygems:> 1.3.1,checksum:cde345
1425900043 1.1.7.rc3 |rubygems:> 1.3.1,checksum:def456
1425900044 1.2.0-java mini_portile:~> 0.5.0|checksum:fgh567
1425900045 1.2.1 rake:>= 0.9|checksum:deadbeef

The client then sends a request with its latest timestamp, and the server can do .split(params[:timestamp]).last. That should not take much processing, but serving static files is of course much faster, so I'm not sure the tradeoff is worth it.
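To make the idea concrete, here is a minimal sketch of the server-side filter. The helper name `lines_since` is hypothetical and not part of the new-index prototype. It compares timestamps numerically instead of splitting the string, because a plain split on the timestamp would also return the rest of the line the client already has.

```ruby
# Hypothetical helper, not part of the new-index prototype.
# Returns the index lines whose leading Unix timestamp is newer than `since`.
def lines_since(index_text, since)
  index_text.each_line(chomp: true).select do |line|
    stamp = line.split(" ", 2).first # text before the first space
    stamp.to_i > since.to_i          # keep only strictly newer entries
  end
end

index = <<~TEXT
  1425900044 1.2.0-java mini_portile:~> 0.5.0|checksum:fgh567
  1425900045 1.2.1 rake:>= 0.9|checksum:deadbeef
TEXT

# The client already has everything up to and including 1425900044.
puts lines_since(index, "1425900044")
# prints: 1425900045 1.2.1 rake:>= 0.9|checksum:deadbeef
```

This assumes timestamps are unique per line and that the file is sorted by time. Even then, every request needs application code to run, which is the cost compared to serving a static file.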

indirect commented 9 years ago

We're planning on using HTTP range requests to accomplish this without needing any additional metadata.
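As a rough illustration of how a client could use range requests here (a sketch under assumptions, not the actual Bundler or RubyGems implementation): the client remembers how many bytes it already has and asks only for the bytes after that. The helper names `range_header`, `apply_response`, and `update_index` are made up for this example, and it only works if the index file is append-only.

```ruby
require "net/http"
require "uri"

# Hypothetical helpers; names are illustrative only.

# Ask for everything after the bytes we already have cached.
def range_header(cached)
  "bytes=#{cached.bytesize}-"
end

# Combine the cached copy with the server's reply.
def apply_response(cached, status, body)
  case status
  when 206 then cached + body.to_s # Partial Content: append the new tail
  when 200 then body.to_s          # server ignored Range: replace the cache
  when 416 then cached             # Range Not Satisfiable: nothing new yet
  else raise "unexpected HTTP status #{status}"
  end
end

# Fetch the new tail of the index at `uri` (a URI object) and return the updated copy.
def update_index(uri, cached)
  request = Net::HTTP::Get.new(uri, "Range" => range_header(cached))
  response = Net::HTTP.start(uri.hostname, uri.port,
                             use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  apply_response(cached, response.code.to_i, response.body)
end
```

The server side stays a static file, since common web servers and CDNs already answer `Range` requests. A real client would also want to verify the result, for example against an ETag or checksum, in case the file was rewritten rather than appended to.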

fotanus commented 9 years ago

Cool, I didn't know range requests existed.