macports / macports-gsoc2018-webapp

MacPorts GSoC 2018: WebApp

Finalise the structure of database #2

Open Vishnum98 opened 6 years ago

Vishnum98 commented 6 years ago

I would need the community's help: please provide your suggestions for the database.

Link to database design document

mojca commented 6 years ago

Before going into further details, I spent a lot of time thinking about whether to:

One thing that I'm by now pretty clear about is that properly supporting the history of port changes is relatively challenging and, as a consequence, totally not worth the effort. We should either do it properly or not do it at all, and here I would vote for not doing it at all. First of all, if we wanted this to be properly supported, we would need to start iterating from the very first commit and record all the changes since day 1. This would be super tricky because I'm sure that the latest version of portindex will not support the early syntax, so we would need to use older versions of MacPorts, which will definitely not work on the latest OS ... Way too complicated. On top of that: what would we record at all? Of course we would need to investigate every single commit, but some commits are merely whitespace changes, while others just fix a typo in a port description or fix a broken URL. Recording all the information would be overkill, and recording differential changes would complicate the database design to extremes for basically no added value for us.

The only thing from the history that is worth keeping is labelling deleted ports as deleted (but again, without spending the effort to go back in time; just start marking them as deleted as time goes by). That way, when someone searches for a port that used to be in MacPorts, the app would know about that port and point to the commit which deleted it, hopefully explaining why it was deleted. It would also be easier to find old code and potentially resurrect a port, as recently happened to Poedit, for example.
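A minimal sketch of the "mark as deleted as time goes by" idea, assuming the app keeps one record per known port and is handed the set of port names from each fresh index. The `PortRecord` type, the `mark_deleted` helper, and the commit identifier are hypothetical names for illustration, not part of any existing design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PortRecord:
    """Hypothetical database row for a port."""
    name: str
    # None means the port is still present in the ports tree.
    deleted_in_commit: Optional[str] = None


def mark_deleted(known: Dict[str, PortRecord],
                 current_names: Set[str],
                 commit: str) -> List[str]:
    """Flag every known, still-active port that is missing from the current index.

    Records are kept (not removed), so a later search can still find the port
    and point at the commit that deleted it.
    """
    newly_deleted = []
    for name, record in known.items():
        if record.deleted_in_commit is None and name not in current_names:
            record.deleted_in_commit = commit
            newly_deleted.append(name)
    return sorted(newly_deleted)
```

Because already-deleted records are left untouched, running the check again on a later index does not overwrite the commit that originally removed the port.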

As for the last point of supporting the fact that different OSes might provide different variants, I guess that we can safely ignore that for now and only implement it if there is too much time left at the end. It doesn't seem really important.

I have mixed feelings about supporting the fact that a certain port cannot be compiled on one particular OS version, or that a particular OS needs a different version of that port. This would be a totally useful feature, but of course not nearly as important as having the rest of the app working. We could get the information about different versions of a port on different OSes by submitting portindex[.json] for each supported OS and taking those differences into account. Knowing whether an OS version is currently supported depends on https://trac.macports.org/ticket/15712.
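As a rough illustration of comparing per-OS indexes, the sketch below assumes each submitted index has already been reduced to a mapping of port name to version. That shape, the OS labels, and the `version_differences` helper are assumptions made for the example; the real portindex format is not specified here.

```python
from typing import Dict, Optional


def version_differences(
    indexes: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return ports whose version (or availability) differs between OS versions.

    `indexes` maps an OS label to {port name: version}. A port missing from
    one OS index is reported with version None for that OS.
    """
    all_ports = set()
    for ports in indexes.values():
        all_ports.update(ports)

    diffs = {}
    for port in sorted(all_ports):
        per_os = {os_name: ports.get(port) for os_name, ports in indexes.items()}
        # More than one distinct value means the OSes disagree about this port.
        if len(set(per_os.values())) > 1:
            diffs[port] = per_os
    return diffs
```

Only the ports that differ are returned, so the common case (same version everywhere) would not need any per-OS rows in the database.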

ryandesign commented 6 years ago

support history of port changes

I worked on this problem for my 2013–14 MacPorts web app attempt. I did have a script that processed repository commits from the beginning of time to add them to the database. My intention was to be able to keep track of deleted ports, as you mentioned, and also to keep track of when a port was updated to a new version. Not only does this let us have a timeline of a port on that port's page, we can also have a timeline of all ports on the front page of the web site. If a user visits our site and sees a list of the 20 most recently updated ports, and they were all last updated today, that clearly indicates to the user that MacPorts is an active project.

This is also one possible solution to the problem mentioned elsewhere, that of incrementally updating the database, instead of re-importing the entire database for any change.

However, you're correct that it's difficult to implement. I quickly realized that portfile syntax has changed over the years. Some options were renamed (extract_sufx became extract.sufx which became extract.suffix), some options were removed (contents, cd), so today's MacPorts can't parse some of those old portfiles. One possible solution is for the script to replace old names with new ones in the portfiles that it's importing, but correctly identifying, for example, instances of the string cd that refer to the removed Tcl command cd, while skipping instances that refer to the shell command cd, is difficult. The solution I used instead is to create a Tcl library that is loaded into the interpreter to provide compatibility implementations of everything that has been removed from MacPorts over the years. It worked, but I don't remember if it successfully imported the entire repository history as of 2014.
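To make the renaming approach concrete, here is a small sketch of rewriting old option names before import. It only covers the `extract_sufx` / `extract.sufx` renames mentioned above; the `RENAMED_OPTIONS` table and `modernize` function are hypothetical, and `cd` is deliberately left out because a textual substitution cannot tell the removed command from the shell `cd`.

```python
import re

# Old option name -> current name (only the renames mentioned in this thread).
RENAMED_OPTIONS = {
    "extract_sufx": "extract.suffix",
    "extract.sufx": "extract.suffix",
}

# Match an old name only when it is not part of a longer identifier.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in RENAMED_OPTIONS)
    + r")(?![\w.])"
)


def modernize(portfile_text: str) -> str:
    """Replace renamed options in portfile text with their current names."""
    return _PATTERN.sub(lambda m: RENAMED_OPTIONS[m.group(1)], portfile_text)
```

This works for names that are unambiguous as plain text, which is exactly why it breaks down for something as short and overloaded as `cd`, and why a compatibility library loaded into the interpreter is the more robust route.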

One minor additional issue is that sometimes people inadvertently commit syntax errors. I added a list of known-bad revisions to my import script so that I could ignore the resulting import errors.
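The known-bad list could look roughly like the sketch below. This is not the original import script; `import_history`, the `import_one` callback, and the use of `ValueError` to signal a parse failure are all assumptions for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple


def import_history(revisions: Iterable[str],
                   import_one: Callable[[str], None],
                   known_bad: Set[str]) -> Tuple[List[str], List[str]]:
    """Import revisions in order, tolerating failures only for known-bad ones.

    `import_one` is assumed to raise ValueError when a revision cannot be
    parsed. Failures in revisions that are not listed are re-raised so that
    new problems are noticed rather than silently dropped.
    """
    imported, skipped = [], []
    for rev in revisions:
        try:
            import_one(rev)
        except ValueError:
            if rev not in known_bad:
                raise
            skipped.append(rev)
        else:
            imported.append(rev)
    return imported, skipped
```

Keeping the list explicit means an unexpected syntax error still stops the import, so the list only grows after someone has looked at the offending revision.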

mojca commented 6 years ago

@ryandesign, you raised some interesting points. It would be cool to support showing a list of "recently updated / added / deleted" ports, but I would then implement this with a slight "hack" which would only record when the port was last updated, rather than recording full history of maintainer changes, variant changes, dependency changes, version changes, path changes, URL changes, ... We could record two things:

I agree that having a complete history would be nice, but I feel it goes beyond the scope of this project. It could be an extension goal, a future GSoC or hobby project, etc. What I would like to avoid is having a super complicated database design to start with and then not having the project implemented to a working state at all.

As a summary:

ryandesign commented 6 years ago

I agree, history can be added later.

My version of the project used Subversion, since we weren't on Git at the time. Using the Git commit timestamp would match the timestamp that was available with Subversion.
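One way to read a port's last commit timestamp from Git is sketched below. It shells out to `git log -1 --format=%ct -- <path>`, which prints the committer date of the most recent commit touching that path as a Unix timestamp. The function names and the repository layout are assumptions for the example.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def parse_commit_timestamp(raw: str) -> datetime:
    """Convert the Unix timestamp printed by `git log --format=%ct` to UTC."""
    return datetime.fromtimestamp(int(raw.strip()), tz=timezone.utc)


def last_commit_time(repo_dir: str, port_path: str) -> Optional[datetime]:
    """Return when `port_path` was last changed, or None if it has no commits."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--format=%ct", "--", port_path],
        capture_output=True,
        text=True,
        check=True,
    )
    if not result.stdout.strip():
        return None
    return parse_commit_timestamp(result.stdout)
```

Storing just this one value per port would be enough to drive a "recently updated" list without recording any further history.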