macports / upt-macports

The Universal Packaging Tool (upt) backend for MacPorts
https://framagit.org/upt/upt
BSD 3-Clause "New" or "Revised" License
8 stars 12 forks source link

Update Feature #44

Open Korusuke opened 5 years ago

Korusuke commented 5 years ago

So after trying a bunch of things like diff-match-patch, normal diff and vaguely trying regex following is the technique that works best at least for situations that I tried.

Let's say we have a Portfile which we need to update and we have decided that we will update version and checksums field. So first we extract that fields from existing Portfile (yet to be automated). and make a file which looks like this:

version             0.7

checksums           sha256  7de9a0cc9d55c175b611ed748ee7ea572ec308411934769e2e647d1d26ba2eea \
                    rmd160  6dc1b42ac5760f8225c81b1cd9539332b33b1b4h \
                    size    24223

and then we generate a new file which looks like this:

version             0.9

checksums           sha256  b967cda25e8e5a634a450886e98ebeaf5aa51220e8888a17a34d78d001f96035 \
                    rmd160  aac9b9c2f52ac75eec0700ac996a87f0151923f9 \
                    size    25969

Notice the blank line at end of file, that is apparently important for this to work.

Now we generate a diff for above two files and then apply patch on the original file. This even updates a file that maybe modified and has different order of fields than what upt usually has.

For example from this:

# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=tcl:et:sw=4:ts=4:sts=4

PortSystem          1.0
PortGroup           python 1.0

name                py-upt
version             0.7
revision            0

maintainers         {@korusuke somaiya.edu:karan.sheth} openmaintainer
description         Package software from any package manager to any distribution
long_description    ${description}

platforms           darwin
supported_archs     noarch

homepage            https://framagit.org/upt/upt
master_sites        pypi:[string index ${python.rootname} 0]/${python.rootname}
distname            ${python.rootname}-${version}

license             BSD
checksums           sha256  7de9a0cc9d55c175b611ed748ee7ea572ec308411934769e2e647d1d26ba2eea \
                    rmd160  6dc1b42ac5760f8225c81b1cd9539332b33b1b4h \
                    size    24223

python.versions     37

if {${name} ne ${subport}} {
    depends_lib-append \
                    port:py${python.version}-spdx-lookup \
                    port:py${python.version}-setuptools

    depends_run-append \
                    port:py${python.version}-upt-macports \
                    port:py${python.version}-upt-cpan \
                    port:py${python.version}-upt-pypi

    test.run        yes
    test.cmd        ${python.bin} -m unittest
    test.target
    test.env        PYTHONPATH=${worksrcpath}/build/lib

    post-destroot {
        set docdir ${prefix}/share/doc/${subport}
        xinstall -d ${destroot}${docdir}
        xinstall -m 0644 -W ${worksrcpath} README.md LICENSE CHANGELOG \
            ${destroot}${docdir}
    }

    livecheck.type  none
}

to this:

# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=tcl:et:sw=4:ts=4:sts=4

PortSystem          1.0
PortGroup           python 1.0

name                py-upt
version             0.9
revision            0

maintainers         {@korusuke somaiya.edu:karan.sheth} openmaintainer
description         Package software from any package manager to any distribution
long_description    ${description}

platforms           darwin
supported_archs     noarch

homepage            https://framagit.org/upt/upt
master_sites        pypi:[string index ${python.rootname} 0]/${python.rootname}
distname            ${python.rootname}-${version}

license             BSD
checksums           sha256  b967cda25e8e5a634a450886e98ebeaf5aa51220e8888a17a34d78d001f96035 \
                    rmd160  aac9b9c2f52ac75eec0700ac996a87f0151923f9 \
                    size    25969

python.versions     37

if {${name} ne ${subport}} {
    depends_lib-append \
                    port:py${python.version}-spdx-lookup \
                    port:py${python.version}-setuptools

    depends_run-append \
                    port:py${python.version}-upt-macports \
                    port:py${python.version}-upt-cpan \
                    port:py${python.version}-upt-pypi

    test.run        yes
    test.cmd        ${python.bin} -m unittest
    test.target
    test.env        PYTHONPATH=${worksrcpath}/build/lib

    post-destroot {
        set docdir ${prefix}/share/doc/${subport}
        xinstall -d ${destroot}${docdir}
        xinstall -m 0644 -W ${worksrcpath} README.md LICENSE CHANGELOG \
            ${destroot}${docdir}
    }

    livecheck.type  none
}
mojca commented 5 years ago

CC-ing also @satraul.

If you plan to update just the two fields, there's also his new port bump tool available. What I would like to see, if possible, would also be the "diff" of dependencies between the existing and the new version (I hope one could get that list from pypi?). Maybe it would be sufficient to output just some kind of log for now, since the information is already pretty unreliable.

Korusuke commented 5 years ago

So after thinking about how the interface between upt and backend will be like and what will go where following is what I have come up with:

Frontend parses upt -> Backend reads the existing file -> Backend extracts information to be updated (example for macports for now it can be version line and checksums line)....When I say extract means not the value but just the lines itself as we don't need the exact info as demonstrated above -> Backend generates update file using a update template -> upt generates patch file using both of this -> upt passes patch file, diff file, new file to backend -> Backend saves required files.

So in psuedo code something like: upt-core: (Sorry about the naming)

class Backend(object):
    def update_package(self, path_to_file, pkg):
        versions = self.package_versions(self.name)
        if versions.latest == self.version:
            return
        old = self.read_old_file(path_to_file)
        to_update = self.extract_update(old)
        updated = self.extract_update(pkg)
        file_diff = diff(old, pkg)  # os diff call here
        patch = diff(to_update, updated)  # os diff call here
        new = apply_patch(patch)  # os patch call here
        self.save_update(patch, file_diff, new)

Backend will have to implement : extract_update() read_old_file()(this can be done in upt-core but I am not sure about it) save_update upt-core cannot handle this due to naming preference of different backend

and also this whole function can be overridden by backends in case they want to do something different, for example as mojca suggested for MacPorts we can use port bump tool which is already doing the same job using regex and in an interactive manner.

What do you think about this?

reneeotten commented 5 years ago

I think the general outline looks fine; at least from my perspective, but I'll defer to @Steap for everything touching upt itself and about what should be done in the core/backend.

Ideally, I would prefer to skip all the intermediate "extract_update" stuff and just do

  1. read_old_file
  2. generate_file_by_upt
  3. make_diff_or_patch_directly_whatever_fields_we_want

but again, I don't know how feasible that is. I'll need to play around with some of the tools like diff-match-patch a bit more (soon-ish I hope) and see how that works.

Steap commented 5 years ago

Hello,

On 2019-07-11 19:33, Karan Sheth wrote:

So after thinking about how the interface between upt and backend will be like and what will go where following is what I have come up with:

Frontend parses upt -> Backend reads the existing file -> Backend extracts information to be updated (example for macports for now it can be version line and checksums line)....When I say extract means not the value but just the lines itself as we don't need the exact info as demonstrated above -> Backend generates update file using a update template ->

So basically these are the small files that only contain (for now) "versions" and "checksums", right?

upt generates patch file using both of this -> upt passes patch file, diff file, new file to backend -> Backend saves required files.

This honestly seems overly complicated. We need to extract data from an existing Portfile, do some work "on the side", re-inject everything into the backend and have it handle a merge.

It seems difficult to implement (for MacPorts, let alone other backends) and error-prone.

Or maybe I'm misunderstanding something?

Steap commented 5 years ago

If you plan to update just the two fields, there's also his new port bump tool available.

What can port bump do, exactly?

What I would like to see, if possible, would also be the "diff" of dependencies between the existing and the new version (I hope one could get that list from pypi?). Maybe it would be sufficient to output just some kind of log for now, since the information is already pretty unreliable.

So, my initial idea for update was to parse CPAN/PyPI/Rubygems/etc. for 2 different versions and generate a high-level diff ("these are the new requirements", "these requirements are no longer needed", etc.). At first, it seemed doable for PyPI:

https://pypi.org/pypi/requests/0.10.0/json <- an old version https://pypi.org/pypi/requests/json <- the current version

It seemed doable for CPAN as well:

https://fastapi.metacpan.org/v1/release/NEILB/AppConfig-1.70 <- an old version https://fastapi.metacpan.org/v1/release/AppConfig <- the current version

But I do not think Rubygems (and other frontends to be added in the future) will provide info about all versions of a package.

Also, it should be noted that some info is missing from these REST API. For instance, the test requirements for "requests" do not show up on PyPI (for various reasons that I won't go into here).

Wouldn't the best way to update just be to create a "Portfile.new" file, and then have the user run diff/git diff/whatever?

mojca commented 5 years ago

port bump basically does just the following (for any given port):

I wrote this and forgot to hit send before I read your replies:

I just want to reiterate: updating just port version and checksums is something that already port bump has done (testing and feedback greatly appreciated), and is relatively easy to do in general, but of course you miss a whole bunch of stuff.

What I would really like to see is at least some rudimentary support to tell you which dependencies changed between the two versions. I don't mind as much if I need to add those manually for a moment, but figuring out the exact dependencies is really the most time-consuming step which I very strictly :) always ignore when I need to do some mass update. We should be able to extract both old and new dependencies from the two versions, compare them, and at least tell the user:

Korusuke commented 5 years ago

But I do not think Rubygems (and other frontends to be added in the future) will provide info about all versions of a package.

The v2 of rubygems api seems to support this: https://rubygems.org/api/v2/rubygems/acts_as_list/versions/0.9.18.json https://rubygems.org/api/v2/rubygems/acts_as_list/versions/0.9.19.json

But yeah you are right maybe all frontends doesn't support this.

I agree that the whole process is a bit complicated so maybe we can divide that into two parts and do the following:

Now optionally backends can implement apply_update feature which is backend specific:

Is this fine?

reneeotten commented 5 years ago

I agree that Cyril initial idea to "generate a high-level diff" would be a good one, but indeed probably not doable for all frontends. Generating a new Portfile can and should be done for sure, in addition to making a diff between the two files; no problem there.

I also like Karan's idea of implementing an optional "apply_update" function (or whatever name) in the backend to do something more specific. I am not a fan of doing port bump for the MacPorts backend, that might for now work do update the version and checksums but is not scalable to anything more complicated for dependencies and such. So I think we need to come up with a way to "swap" the information into the new Portfile and should try to get that to work for simple things like "version/checksums" first and once that works it should be possible to extent.

Again, I don't like making separate intermediate diff files to patch the Portfile. This should be something people have done before and I would suggest you to investigate things like diff-match-patch or other approaches already being used.

@Steap can we first agree on how/what should be implemented in upt and what should be done in the backend and take it from there?

Steap commented 5 years ago

So, here is an updated (pun indented) idea of how to do things:

1) We get the current version of PKG using the recently introduced backend.package_versions() method. The current version is probably the "highest" version of the returned list.

2) We ask the frontend to parse both the "old version" and the "latest version".

3) We create a "package diff":

class PackageDiff(object):
    def __init__(self, old_pkg:upt.Package, new_pkg:upt.Package):
        self.version = new_pkg.version

    def new_requirements(self):
        pass

    def deleted_requirements(self):
        pass

upt.PackageDiff.new_requirements would be similar to upt.Package.requirements, but would only return the new requirements.

TODO: figure out what else we might want to access from a PackageDiff.

4) upt.Backend has a "create_package" method, it should have an update package as well.

class MacPortsBackend(upt.Backend):
    def update_package(self, pkgdiff:upt.PackageDiff):
        ...

Now, it's up to the backend to do the right thing. Can the backend easily make the required modifications? If not, maybe the abstraction I've proposed is not really good.

Korusuke commented 5 years ago

Sounds great, I have a few doubts(not even sure they are doubts)...so I will first code this up and if anything goes wrong then I will ask.

Korusuke commented 5 years ago

Ok so all of what @Steap suggested is done more or less...now before coming to upt-macports. Till now I haven't used file_path input which we had discussed to take as input for --update.

For upt-macports backend, this are the things I think we will do:

Anything else that you think we should be doing?

reneeotten commented 5 years ago

@Steap I fully agree with your latest proposal! I don't understand though how this differs from you first "high-level diff" that seemed not supported by all the frontends. Anyway, if you think this will work I think it's a great way of doing this, so let's go for it!

@Korusuke as I said earlier I would prefer not to do port bump on the existing Portfile. This will only work for version and checksums and not for anything else. This is a major, and very important, part of your GSoC project for MacPorts and if it takes time to find a way to do this, this is time well spent.

Of course, we could decide to finalize the pieces needed for upt and start with what you suggested above as an intermediate solution. Sometimes the drawback of such an approach is though that the ideal solution might never get implemented...

Korusuke commented 5 years ago

Ok my bad, I somehow tested diff match patch wrongly or missed out..... but testing again with all the output that current idea is giving, it works unbelievably amazingly well :) Will push the code tomorrow.

Steap commented 5 years ago

@reneeotten Yes, this is the same "high-level diff" approach. If some frontends do not give us access to all versions of a package, well, too bad. We could always ask the backend to parse their own files if needed, by the way.

Maybe we could improve "port bump" instead of implementing everything in upt-macports? I'm not sure exactly how "port bump" works, though.

reneeotten commented 5 years ago

Maybe we could improve "port bump" instead of implementing everything in upt-macports? I'm not sure exactly how "port bump" works, though.

Perhaps, but it's written in Tcl, which is pretty much voodoo to me ;) Looking at the man-page it will only update the checksums and reset the revision, after updating the version number of a port by the maintainer.

If we first need to update the version number and then run port bump I see no use-case for this when using upt, it appears to me that we have all the required information ourselves already.

satraul commented 5 years ago

If we first need to update the version number and then run port bump I see no use-case for this when using upt, it appears to me that we have all the required information ourselves already.

Hi all. I haven't been following the discussion closely, but it is already planned for me to implement a --livecheck argument for port bump so the version number is updated automatically to the latest version before bumping. Issue here: https://github.com/satraul/macports-base/issues/4

Is this the use case needed? I can prioritize working on this feature if so.

mojca commented 5 years ago

Maybe we could improve "port bump" instead of implementing everything in upt-macports? I'm not sure exactly how "port bump" works, though.

Perhaps, but it's written in Tcl, which is pretty much voodoo to me ;)

We have the GSOC student who implement this as a challenge task during proposal submissions (@satraul) and could improve the tool if needed.

Looking at the man-page it will only update the checksums and reset the revision, after updating the version number of a port by the maintainer.

True.

If we first need to update the version number and then run port bump I see no use-case for this when using upt, it appears to me that we have all the required information ourselves already.

True.

What port bump does is in fact merely replacing checksums, resetting revision (I forgot about that part) and updating version (I thought this was already implemented, but according to @satraul's response maybe not yet, even though it would be doable).

Maybe my message was misunderstood before / I was not clear.

What port bump does is trivial to replicate with upt at this stage, while at the same time port bump has absolutely no chance of knowing what dependencies changed. And I consider the latter (at least notifying the user what dependencies need to be added or removed) as one of the most important goals of the upt project. So all in all I see no reason for upt to depend on port bump, it's easier and far more flexible to implement everything that needs to be done in python.

port bump remains a very useful tool for other ports with no external meta information available. For python/ruby/perl we should aim higher though.

@satraul: yes, updating version would be nice, or support something like port bump maintainer:mojca :), but there is no need to re-prioritise your tasks. Just concentrate on your main task for now and you may do this when you need a short break from fighting with xcode, or once your major milestone is achieved.

What would also be cool as the "last step" for both tools (upt and port bump) would be the ability to automatically open pull requests, but that requires a bit more work, and in any case if that ever gets implemented, it should be shared for both tools.

reneeotten commented 5 years ago

port bump remains a very useful tool for other ports with no external meta information available. For python/ruby/perl we should aim higher though.

Agreed that it's useful for other ports, but as I've been trying to say for a while now it should not be used here. We have the same information (and much more) easily available from upt and should implement a proper solution. But again, it sounds like @Korusuke has possibly found such a solution already using diff-match-patch so let's see how that goes!

We can then figure out which part of the code should go (as default option for example) into upt and what is backend-specific.

I would like to see, at least in the testing phase, the following:

  1. copy the existing Portfile to Portfile.old
  2. generate a Portfile with upt as if it was a new package (--> Portfile.upt)
  3. do the magic of integrating/merging the changes (and possibly adding missing fields) and write that to Portfile
Korusuke commented 5 years ago

Version for perl ports in macports doesn't match that of upstream, if the upstream version is 1.41 then version in macports is 1.410.0. This is consistent for atleast the random 10 ports that I checked, so I guess we should just handle it in upt-macports, or is it something that will be fixed in macports itself ?

Edit: Nope I was wrong, it's weird as in Portfile the version is right but port info --version returns wrong version and that too the returned version is inconsistent(what I have described above is wrong for some ports).

App-rad: version is 1.05 port info gives 1.50.0

So I guess we should fix port info first as this is important for both update as well as recursive feature to work.

What do you think ?

Korusuke commented 5 years ago

Edited a bunch of stuff in previous comment and as the github mail won't contain that so ping :) @reneeotten @mojca @Steap

jmroot commented 5 years ago

Version for perl ports in macports doesn't match that of upstream, if the upstream version is 1.41 then version in macports is 1.410.0.

Version numbers in perl are... interesting. The observation above is only true for upstream modules that use a decimal number for their version. Our version comparison doesn't support such numbers, e.g. 1.10 is considered bigger than 1.2. The perl5.setup proc transforms the version in a way that should match what is done by perl -Mversion -e 'print version->parse("SOMEVERSION")->normal'.

Further reading: https://metacpan.org/pod/version

reneeotten commented 5 years ago

thanks for the explanation @jmroot

In that case we should do the same transformation of the upstream version in upt_macports.py before doing checking whether the correct version is already packaged in MacPorts or if there is a newer version available upstream. @Korusuke let's transform the proc perl5_convert_version in "_resources/port1.0/group/perl5-1.0.tcl" into a Python function.

Korusuke commented 5 years ago

thanks for the explanation @jmroot

In that case we should do the same transformation of the upstream version in upt_macports.py before doing checking whether the correct version is already packaged in MacPorts or if there is a newer version available upstream. @Korusuke let's transform the proc perl5_convert_version in "_resources/port1.0/group/perl5-1.0.tcl" into a Python function.

The check about version checking actually takes place in upt-core and not in upt-macports which is only responsible of returning the current version. So we actually need to apply inverse transformation and return that to upt-core.

For doing this although I was highly skeptic about it working, I replicated the existing proc in python and then tried to invert it which "kind of wokrs" but not always. Consider the following cases: -Upstream version: 1.23 -Parsed version: 1.230.0 -Inverse version(my script): 1.23

-Upstream version: 1.2.3 -Parsed version: 1.2.3 -Inverse version(my script): 1.002003

So basically if the original upstream version has two decimal points then we don't have to apply any transformation.

I am kinda stuck and don't think there's any way to correct this script. But here are potential solution's:

reneeotten commented 5 years ago

@Korusuke I looked a bit more today and I tend to agree with you that reverse engineering the upstream package version is difficult to do correctly. For example, all these "Portfile" versions [1.1, 1.10, 1.100] would give the same "Perl converted" version (v1.100.0) so it's pretty much impossible to get back to the upstream version from port info --version <perl-port>.

Ideally, I prefer not overwrite the needs_requirement() function that is present in upt, unless really necessary. Additionally, I suspect that other backends could have similar issues when comparing Perl versions (especially if a package switches between versions with/without the v).

It seems to me that the recommended way of comparing Perl package versions is to first convert them as described in the link that @jmroot gave above. If that's the case, what about putting that logic in upt? In other words, why don't we convert both the upstream version as well as the version(s) returned by the backend and then do the comparison? That would in my opinion solve the issue, correct? The slightly annoying thing is then that there would be some cpan specific code in there, which might not be ideal... @Steap what do you think?

Steap commented 5 years ago

I am horrified. I really wonder how Perl ended up with something this complicated just to specify a version.

I do not really understand why needs_requirement would have anything to do with this. Can you tell me more about what you had in mind?

reneeotten commented 5 years ago

I do not really understand why needs_requirement would have anything to do with this. Can you tell me more about what you had in mind?

Sure, there might be other/better ways and it's possible that I'm far off here, but.... my thought process this afternoon and how needs_requirement would fit in here is below:

For example, upstream has its version set to “1.04” so this is what upt will get and also what will be in the MacPorts portfile. According to the recommended Perl conversion (and also what port info --version gives) this version number will become 1.40.0. (We skip the v in MacPorts versions, but that's fine because if you run the conversion again it gives the same result with the v).

Now, let’s assume this package in MacPorts becomes outdated and version “1.05” is released and we attempt to package another package that requires this version. Currently, in upt_macports.py in package_version() we return the version that is present in MacPorts (using port info --version <portname>). So we return that we have version “1.40.0” packaged and only version 1.0.5 is required by the thing we are attempting to package. So upt will think everything is good in the packaging-world when executing needs_requirement(), whereas in reality the version we have is not sufficient (this will happen now already with the "recursive feature", but certainly also when deciding a package needs to be updated). It's possible that other backends specify their versions exactly as upstream and as an alternative solution we could just extract the upstream version from the Portfile and return that instead.

However, it seems to me that similar issues can appear as well if requirements and upstream versions are expressed differently. From the link mentioned above: “If you need to compare version numbers, but can't be sure whether they are expressed as numbers, strings, v-strings or version objects, then you should use version.pm to parse them all into objects for comparison.”

So what I am proposing is to add a Python function to upt that does the recommended conversion (i.e., gives the same output as “perl -Mversion -e 'print version->parse()->normal’) both for version specifiers that are returned by the backend (through their package_version() and the requirement specifier in needs_requirement() in upt.

In other words, in pseudo-code something along those lines:

def standardize_perl_version(version):
    “”” return the same output as “perl -Mversion -e 'print version>parse(version)->normal’) “””
    std_perl_ver = <some magic code to do the conversion>
    return std_perl_ver

in the needs_requirement() function:

if frontend == ‘cpan’:
    versions = standardize_perl_version(self.package_versions(req.name))
else:
    versions = self.package_versions(req.name)

and a little further down:

if frontend == ‘cpan’:
    s = SpecifierSet(standardize_perl_version(req.specifier))
else:
    s = SpecifierSet(req.specifier) 
compatible_versions = list(s.filter(versions))

Apologies for the very long post.... but it seems to me this solves the issue for MacPorts and at the same time will work in a backwards-compatible way for all backends. As an additional bonus it should help for all backends when version are expressed differently (as described above). Let me know what you think and I am certainly open for other, simpler solutions ;)

Steap commented 5 years ago

So this is also an issue for recursive packaging, right?

Rather than having a bunch of if/else in needs_requirement, one solution would be to "fix" upt-cpan, and always use "normalized" versions. I'm not sure exactly what other backends expect, so I'm a bit reluctant to make this change right now.

A workaround for macports would be to override need_requirement in this way:

def need_requirement(self, req, phase):
    new_req = ... # modify the requirement to have a "standardized" version" if needed

    super().need_requirement(new_req, phase)

This way you only implement the "magic" conversion (which may or may not be moved to upt-cpan later) and still take advantage of the regular need_requirement version.

What do you think?

reneeotten commented 5 years ago

@Steap long-term it might be indeed worthwhile to move to the normalized/standardized version in upt-cpan but for now that will likely have a too big impact on other backends.

Your suggestions is a very good one, I should have thought about using super()... So we extract the value of the version from req.specifier , do the MacPorts/Perl-normalization on it and put it back (keeping the "<", ">", "==", .... elements intact). For frontends other than cpan, we don't need to do anything. Yeah, that should work I think (except that you're missing the s in the function name --> ``needs_requirement```).

@Korusuke does the discussion above and the suggested solution makes sense to you - if so, can you give this a try. I think we should implement this in a separate PR; if we can resolve this for the already merged "recursive packaging feature", it should also work for the "update feature".