Open h01ger opened 5 years ago
What are you trying to achieve here? What is the goal? I ask because this specific solution would be quite an engineering effort as the files are stored in the S3 object store (so accessing each one is not as easy as reading off a disk...) and this file would surely be /enormous/?
On Wed, Feb 20, 2019 at 12:19:43AM -0800, Chris Lamb wrote:
> What are you trying to achieve here? What is the goal?
checking whether all binary packages from ftp.d.o could have been reproduced?
> I ask because this specific solution would be quite an engineering effort as the files are stored in the S3 object store (so accessing each one is not as easy as reading off a disk...) and this file would surely be /enormous/?
the file should be a gigabyte or two?
-- tschau, Holger
holger@(debian|reproducible-builds|layer-acht).org
PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C
Perhaps I'm misunderstanding something, but there are 11,887,190 .buildinfo files on buildinfo.debian.net right now. I suspect this will be larger than a gigabyte, even compressed. :)
On Wed, Feb 20, 2019 at 03:29:14AM -0800, Chris Lamb wrote:
> Perhaps I'm misunderstanding something, but there are 11,887,190 .buildinfo files on buildinfo.debian.net right now. I suspect this will be larger than a gigabyte, even compressed. :)
:) and hmpf.
see my mail on rb-general@ just now, for my motivation on this.
I think we need to find some way to get the meaningful .buildinfo files out of these 11 million files: the 600000 ones relating to .deb files served from ftp.d.o. Those 600000 can then again be divided into chunks of 60000 per arch (actually less, due to arch:all).
that's coming from my r-b.o/Debian POV where I want to be able to assert the real world reproducibility of Debian.
Another POV is my user POV: I have 1800 binary packages installed on this machine and I would like to know which are (not) reproducible...
One thing which could maybe help would be a way to do a query for 1000 (read: many) .buildinfo files at once...
-- tschau, Holger
This very much smells like an XY problem, and additionally I think getting this right will require some actual thought into how folks (not just yourself) will want to access these files, rather than "just" adding an endpoint and being done with it.
My gut feeling is that simply adding an 'n buildinfo at once' endpoint is not sustainable, as 'n' will never be large enough. What, really, are you doing with the data you return? Just checking whether something matches? If so, one doesn't need the entire buildinfo. BUT that's just one possible example of an access pattern; we should try and work out how to do this at real scale. :)
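To make the "just checking whether something matches" pattern concrete, here is a minimal Python sketch. It assumes the usual deb822 layout of a .buildinfo file, where the `Checksums-Sha256` field lists one `<digest> <size> <filename>` entry per continuation line. The function names and the sample data are hypothetical and only illustrate that a client needs the digests, not the whole file.

```python
import hashlib


def parse_sha256_entries(buildinfo_text):
    """Return {filename: sha256 hex digest} from the Checksums-Sha256 field."""
    entries = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines of a multi-line field start with whitespace.
            if line[:1] in (" ", "\t"):
                parts = line.split()
                if len(parts) == 3:
                    digest, _size, filename = parts
                    entries[filename] = digest
            else:
                # Any other line starts the next field.
                in_field = False
    return entries


def matches(buildinfo_text, filename, data):
    """True if `data` hashes to the digest recorded for `filename`."""
    expected = parse_sha256_entries(buildinfo_text).get(filename)
    return expected is not None and expected == hashlib.sha256(data).hexdigest()


# Made-up stand-in for a package and its .buildinfo (not real Debian data).
deb = b"not a real package"
sample = (
    "Format: 1.0\n"
    "Source: hello\n"
    "Checksums-Sha256:\n"
    f" {hashlib.sha256(deb).hexdigest()} {len(deb)} hello_1.0-1_amd64.deb\n"
    "Build-Architecture: amd64\n"
)
print(matches(sample, "hello_1.0-1_amd64.deb", deb))
```

If this were the dominant access pattern, an endpoint would only have to serve (filename, digest) pairs, which is far smaller than the full files. Whether that is the right pattern is exactly the open question.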
Hi Chris,
On Fri, Feb 22, 2019 at 01:35:19AM -0800, Chris Lamb wrote:
> This very much smells like an XY problem, and additionally I think getting this right will require some actual thought into how folks (not just yourself) will want to access these files, rather than "just" adding an endpoint and being done with it.
> My gut feeling is that simply adding an 'n buildinfo at once' endpoint is not sustainable, as 'n' will never be large enough. What, really, are you doing with the data you return? Just checking whether something matches? If so, one doesn't need the entire buildinfo. BUT that's just one possible example of an access pattern; we should try and work out how to do this at real scale. :)
can we discuss this on irc at some scheduled time? maybe even today? ;)
doing this via email/comments is very awkward due to the very high latency.
also we could include discussing #54 as well.
-- tschau, Holger
Good morning. Unfortunately I am just about to go offline until at least tomorrow afternoon CET, possibly Sunday. But regardless of that I think we are on different pages about the scope of this particular issue. I believe it requires quite a considerable discussion to understand the potential access patterns to whatever data is stored within buildinfo files and furthermore IMHO these sorts of discussions come to much much better results if they can be done async — in other words, the "latency" you decry (although I have been replying quickly, no?) is actually a feature. Rushing this bit is a bit silly anyway as implementing (any) solution will take some thought and attention too, possibly rebuilding indices on the server, etc. etc. which may take days (no, really!) to run.
(re. #54, this is far more straightforward; whilst this one actually has lots of nuance, including how reproducible-check / apt can sanely use the same pattern.)
On Fri, Feb 22, 2019 at 03:58:50AM -0800, Chris Lamb wrote:
> Good morning. Unfortunately I am just about to go offline until at least tomorrow afternoon CET, possibly Sunday.
no problem & good morning as well! :)
> But regardless of that I think we are on different pages about the scope of this particular issue. I believe it requires quite a considerable discussion to understand the potential access patterns to whatever data is stored within buildinfo files
agreed
> and furthermore IMHO these sorts of discussions come to much much better results if they can be done async - in other words, the "latency" you decry (although I have been replying quickly, no?) is actually a feature. Rushing this bit is a bit silly anyway as implementing (any) solution will take some thought and attention too, possibly rebuilding indices on the server, etc. etc. which may take days (no, really!) to run.
I agree this shouldn't be rushed but I very much disagree on async being better here. We need to design something together, so IMO even better than IRC would be RL chatting.
so can we maybe discuss this on Monday or Tuesday? I suppose 30min will be plenty and I wouldn't be surprised if we'd only need half the time.
-- tschau, Holger
> so can we maybe discuss this on Monday or Tuesday? I suppose 30min will be plenty and I wouldn't be surprised if we'd only need half the time.
30 minutes? I think we have very different ideas about the scope of the problem. :) Even IRL would not be great for me - this thing really requires some "walking" or "shower" time. :p
On Fri, Feb 22, 2019 at 04:32:44AM -0800, Chris Lamb wrote:
> > so can we maybe discuss this on Monday or Tuesday? I suppose 30min will be plenty and I wouldn't be surprised if we'd only need half the time.
> 30 minutes? I think we have very different ideas about the scope of the problem. :) Even IRL would not be great for me - this thing really requires some "walking" or "shower" time. :p
15min to get on the same boat.
without being on the same boat, we cannot have this discussion.
this is just a waste of time.
-- tschau, Holger
On Fri, Feb 22, 2019 at 04:52:23AM -0800, Holger Levsen wrote:
> 15min to get on the same boat.
I mean:
a.) come to a common understanding of what is needed (what we want to achieve, not how)
b.) come to a common understanding of the limitations of the current situation, both in terms of 'reality' (data model on the server etc.) and on the clients (data model on the consuming side, e.g. for users wanting to verify themselves, as well as services trying to get a global overview)
Once we have that, I very much agree that taking a walk or a shower might be appropriate to think about this thoroughly. :)
Sorry if I came across harshly before.
-- tschau, Holger
I've got two use cases; there are probably more.
1.) a user wanting to rebuild all the binaries they are using. For that they need to know all the buildinfo files of all installed packages. (a user can be a real user like you and me, or also the Tails project, which wants to verify all the binaries on its images)
2.) a service which wants to verify all binary packages which Debian offers as part of their 'Buster' release or all packages currently in unstable/main/arm64 (as an example).
https://jenkins.debian.net/job/reproducible_compare_Debian_sha1sums/ now downloads the .buildinfo files for all packages in sid (with the versions in sid), one request per file.
It would be really cool if there was a way to download them all as one big .tgz or whatever.
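As a rough illustration of the "one big .tgz" idea, the sketch below packs a set of .buildinfo files into a single gzip-compressed tarball using Python's standard `tarfile` module. It is only a sketch under assumptions: the in-memory dict stands in for files that would really have to be fetched from the object store, the file names and contents are made up, and `bundle_buildinfos` / `list_bundle` are hypothetical helpers, not part of any existing service.

```python
import io
import tarfile


def bundle_buildinfos(files):
    """Pack {name: bytes} into a gzip-compressed tarball and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # Sort names so the member order does not depend on dict ordering.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_bundle(blob):
    """Return the member names stored in a tarball produced above."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()


# Made-up file names and contents, standing in for real .buildinfo files.
example = {
    "hello_1.0-1_amd64.buildinfo": b"Format: 1.0\nSource: hello\n",
    "world_2.0-3_arm64.buildinfo": b"Format: 1.0\nSource: world\n",
}
blob = bundle_buildinfos(example)
print(list_bundle(blob))
```

This says nothing about the hard part raised earlier in the thread, namely selecting the right subset (for example per suite and architecture) and reading it out of the object store efficiently.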