mosteo / alire-old-discussion

Design of an Ada language library repository

server-side design #3

Open OneWingedShark opened 8 years ago

OneWingedShark commented 8 years ago

There are several questions of philosophy that will impact the design -- reuse of extant infrastructure versus designing a custom infrastructure being the most dramatic. If the former, we can have a "quicker" start at the cost of forcing extant systems into a new mold; if the latter, we will get the feeling of "reinventing the wheel" with the benefit of having the whole thing "work together as if that's what it's meant for" -- this is, in essence, the same argument one could have about using a C-library binding/import vs. creating the same functionality natively in Ada.

Or, if you will, usage of the type system -- we can, after all, use it to ensure (e.g.) that no constraint-violating values are inserted into [or retrieved from] a database (e.g. a phone-number type which is a string w/ a particular format [or set of values, rather]).
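A minimal sketch of the phone-number idea, written in Python as a stand-in for an Ada constrained type. The `NNN-NNN-NNNN` format, the `PhoneNumber` class, and the `store` function are assumptions made purely for illustration; nothing here comes from an existing library or from the repository design itself.

```python
import re
from dataclasses import dataclass

# Hypothetical format, chosen only for illustration: NNN-NNN-NNNN.
_PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


@dataclass(frozen=True)
class PhoneNumber:
    """A string wrapper that can only be built from a well-formed value."""

    value: str

    def __post_init__(self) -> None:
        # Reject malformed values at construction time, so every
        # PhoneNumber instance that exists satisfies the constraint.
        if not _PHONE_PATTERN.fullmatch(self.value):
            raise ValueError(f"not a well-formed phone number: {self.value!r}")


def store(table: list, number: PhoneNumber) -> None:
    """Stand-in for a database insert that accepts only validated values."""
    if not isinstance(number, PhoneNumber):
        raise TypeError("expected a PhoneNumber")
    table.append(number)
```

The point of the design is that validation happens once, where the value is created, so code that reads from or writes to the store never has to re-check the format.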

Applying this, we could ensure that only well-formed 'projects' are resident within the repository. We could also integrate w/ unit-tests to warn (or error) on failing tests. -- While this would certainly increase the barrier to submitting a 'project', it also would have the effect of ensuring that ALL projects within the repository are buildable and [therefore] of some quality.

OneWingedShark commented 8 years ago

PS - The "up-front" work this would require of submitters would pay off in end-user experience, as this repository would have a reputation for having quality code. (Increasing the reputation of Ada, transitively.)

ohenley commented 8 years ago

What do you think about Jenkins? Can it answer our needs? https://en.wikipedia.org/wiki/Jenkins_(software) https://builds.apache.org/

Looks like there's already a plugin for GNAT: https://wiki.jenkins-ci.org/display/JENKINS/Gnat+Plugin https://github.com/jenkinsci/gnat-plugin

OneWingedShark commented 8 years ago

Jenkins is a solution to a problem which can be totally avoided. While the following paper is concerned with VCS-style management, the solution eradicates the need for continuous integration.

http://users.ece.utexas.edu/~perry/work/papers/icsm87.pdf

This is a direct consequence of the hierarchical nature of the databases described.

ohenley commented 8 years ago

Nice! I'll read that for sure tomorrow. Thx.

ohenley commented 8 years ago

Ok, that was an eye-opening read! That would make one hell of a solid solution. So, practically, how would it be achieved in Ada?

  1. Does the paper's concept of modules map to Ada packages?
  2. For us to not reinvent the wheel, would it be possible to leverage the .ali files? Do they contain all the information we need? Are they GNAT-specific? If they are, is it a problem?

a) They give the "with-ing" dependencies and a hash code for each unit.
b) We could resolve our "experimental database" test compilations based on that.
c) Any time a hash code (kept in the database) has changed, we compile against dependencies. From the result, we record a broken or successful dependency in our database relation. We report to owners on both sides of the interfacing code.
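Steps a) to c) could be sketched roughly as follows. This is only an illustration in Python: it hashes source text with SHA-256 rather than reading real .ali files, and the data layout (`sources`, `withs`, `stored_hashes`) and function names are hypothetical.

```python
import hashlib


def unit_hash(source: str) -> str:
    """Hash of a unit's source text (a stand-in for the hash kept per unit)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def units_to_recheck(sources, withs, stored_hashes):
    """Return the units whose compilation must be re-tested.

    sources:       unit name -> current source text
    withs:         unit name -> set of unit names it "withs"
    stored_hashes: unit name -> hash recorded in the database
    """
    # Units whose hash differs from the one kept in the database.
    changed = {
        unit for unit, text in sources.items()
        if stored_hashes.get(unit) != unit_hash(text)
    }
    # Follow the "with" relation backwards so that every direct or
    # indirect dependent of a changed unit is re-tested as well.
    to_check = set(changed)
    frontier = list(changed)
    while frontier:
        unit = frontier.pop()
        for other, deps in withs.items():
            if unit in deps and other not in to_check:
                to_check.add(other)
                frontier.append(other)
    return to_check
```

The result of compiling each returned unit would then be recorded as a broken or successful dependency, and that record is what gets reported to the owners on both sides.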

Note: From personal experience, it is extremely frustrating to have dependency libs installed in an automated fashion only to realize, at compile time, that their interfaces do not match. More often than not, you run into side problems when trying to fix it. Falling back to an older (or moving to a newer) API causes other dependencies to break, etc. When this happens, you are left with a bigger problem than the one you tried to circumvent in the first place by using an automated pipeline.

OneWingedShark commented 8 years ago

1) I think 'module' from the paper corresponds to a program subsystem (e.g. audio) and maps to a set of Ada packages.
2) I don't know if that is possible; they are indeed GNAT-specific and IIRC can change formats between versions.

The experimental DB described in the paper admittedly works better on the developer's side (as the integrated VCS of the IDE), with the always-consistent root being what is uploaded to the repository. However, this does not mean that a similar system cannot be used for the PM -- the main difference being that the entry for a 'package' in the PM repo would have all versions of the 'package', for each of which the project portion would indicate dependencies on other packages. (The dependency version could be determined automatically by "walking backward" along versions until the used interfaces 'line up' for compilation purposes; this would solve the version-mismatch problem you describe, but would require the use of a DB-amenable IR instead of text.)
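A rough sketch of the "walking backward" idea, in Python for illustration only. It assumes each stored version exposes a set of interface signatures (represented here as plain strings) and that the client's used signatures are known; both the representation and the `resolve_version` name are hypothetical.

```python
def resolve_version(versions, used):
    """Pick the newest version whose exported interface covers what is used.

    versions: list of (version label, set of exported signatures), oldest first
    used:     set of signatures the dependent code relies on
    Returns the chosen version label, or None if no version lines up.
    """
    # Walk backward from the newest version until the interfaces line up.
    for label, exported in reversed(versions):
        if used <= exported:
            return label
    return None


# Hypothetical history of a library whose open() signature changed in 2.0.
history = [
    ("1.0", {"open(path)", "read(n)"}),
    ("2.0", {"open(path, mode)", "read(n)"}),
]
print(resolve_version(history, {"open(path)", "read(n)"}))
```

A real implementation would compare interfaces at the level of the stored IR rather than strings, which is why a text-only representation would not be enough.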

ohenley commented 8 years ago

Yes, for sure. If coder B stops maintaining his software, which stops working when bound against version 2.0+ of lib A, his code still stays valid, and distributed, bound to version 1.0 of lib A.

mosteo commented 8 years ago

Couldn't the same effect be achieved with properly referenced commits? Excuse me if I talk nonsense, I just took a quick look at the paper.

OneWingedShark commented 8 years ago

@mosteo -- In theory, but that route would be akin to Positive vs int, where "it is the programmer's responsibility to ensure the values are [always] valid" instead of having a design that allows you to tie the assumed constraints into the type itself and let the tool take care of the issue.

mosteo commented 8 years ago

@OneWingedShark Not sure I follow. If consistency is enforced on submission and on updates, where is the user responsibility? But I must read the paper.

ohenley commented 8 years ago

I may be wrong, but I think "properly" in "properly referenced commits" is the key. So many programmers won't do it "properly". A machine could flag the caveat.

mosteo commented 8 years ago

@ohenley I've been reading the dub and hackage websites and now I have a better understanding of how they do it; I will post a summary later on, but it's essentially the same idea we are discussing.

mosteo commented 8 years ago

I have read the paper referenced above and here are some quick thoughts:

In our case, we wouldn't have a single repository but many trees of them, which is equivalent to that hierarchical database, so I'd say we are on safe ground (and on the same page?). But I'm very biased here, because dvcs 1) works and 2) is widely used and 3) I don't see anything fundamentally ground-breaking in that paper compared to current best practices. I may be missing something.

OneWingedShark commented 8 years ago

@mosteo

> The hierarchy described is implicit in the methods we are discussing, in the sense that each module metadata points to other modules

Is it, though? Metadata is fine and all, but it doesn't give you the reliability that the data-actual does. For instance, what happens if user A deletes their repo but it was referenced in the metadata, or if it was hosted on GitHub, which then goes down? -- Depending on extant services means we are not isolated from their failures… in addition to the temptation toward limiting ourselves to the model they're using.

> I'd argue that today dvcs are solving the same problem as long as you don't allow broken pushes to go in

But VCS is only part of the problem: the whole problem is [versioned] dependency-management w/ distributed storage. Also, HOW do you propose ensuring no broken submissions? Accept the revision and then reject updating to it until Jenkins + test-suite passes?

> But I'm very biased here, because dvcs 1) works and 2) is widely used and 3) I don't see anything fundamentally ground-breaking in that paper compared to current best practices.

Is anybody saying DVCS doesn't work? I'm not; I am saying that what it solves is not the problem we are dealing with… much like people thinking HTML or CSV can be properly processed via RegEx, there's a misunderstanding as to the problem-space.

The ideas aren't groundbreaking, in that they are already somewhat applied (ali files, though IIRC they only deal w/ compilation-unit metadata).

I'd rather get the up-front design issues out of the way before we find out we're trying to parse HTML w/ RegEx.

mosteo commented 8 years ago

> Is anybody saying DVCS doesn't work? I'm not; I am saying that what it solves is not the problem we are dealing with… much like people thinking HTML or CSV can be properly processed via RegEx, there's a misunderstanding as to the problem-space.

@OneWingedShark I apologize if I mischaracterized your point. I think we should work towards a short-term/long-term roadmap taking into account development effort and funding required. In the end we are all programmers able to write tools for incremental evolution.

OneWingedShark commented 8 years ago

@mosteo - I don't think you were mischaracterizing it, certainly not intentionally. There's an old saying: "Well begun is half done." -- I just want to have that good beginning, and that means understanding the problems at more than just a superficial level.

mosteo commented 8 years ago

@OneWingedShark OK! At this point I think I have a good understanding of everything that has been discussed, so time permitting I'll move proposals to the wiki and try to summarize pros/cons (if nobody beats me to it :P)

OneWingedShark commented 8 years ago

@mosteo -- Cool. You might want to put a link on these threads to allow seamless transition to the appropriate wiki page/section when you get those put up. (I assume you'll close these 'issues'.)