macports / upt-macports

The Universal Packaging Tool (upt) backend for MacPorts
https://framagit.org/upt/upt
BSD 3-Clause "New" or "Revised" License
9 stars 12 forks source link

[WIP] Update Feature #45

Closed Korusuke closed 2 years ago

Korusuke commented 5 years ago

Ready for testing, finally!! Commits from recursive packaging are also visible here since it uses some of its code. The commits may be a bit messed up, if that's the case I'll fix it in the end.

I would say test the code and not review it as of now as the following things are still left to do:

For testing this you will need to use the following:

Example command's: Write file to output: upt package -f pypi -b macports requests-mock -u ports -o ports

here ports is my folder name that contains all the portfiles in their respective folder.

Without write to file: upt package -f pypi -b macports requests-mock -u ports Even when not using output to file we need to supply "ports", will fix this as I missed this is an issue until writing this.

Known Issues:

I hope I didn't miss anything that was in my mind :)

reneeotten commented 5 years ago

As you mentioned yourself as well, the --update should have no arguments (just as for the --recursive flag).

I think I installed the versions you mentioned above, but I get the error below. I haven't looked any further, but can you please try to see first if this happens for you as well?

>>> upt package -f pypi -b macports pytest -u ports
py-pytest
[INFO    ] [Backend] Checking MacPorts tree for port py-pytest
[INFO    ] [Backend] Current MacPorts Version for py-pytest is 4.6.3
Traceback (most recent call last):
  File "/tmp/test-upt/bin/upt", line 11, in <module>
    load_entry_point('upt==0.9', 'console_scripts', 'upt')()
  File "/tmp/test-upt/lib/python3.7/site-packages/upt-0.9-py3.7.egg/upt/upt.py", line 397, in main
  File "/tmp/test-upt/lib/python3.7/site-packages/upt-0.9-py3.7.egg/upt/upt.py", line 327, in package
  File "/tmp/test-upt/lib/python3.7/site-packages/upt_macports/upt_macports.py", line 300, in update_package
    packager.update_package(diff, old_pkg, new_pkg, path, output)
  File "/tmp/test-upt/lib/python3.7/site-packages/upt_macports/upt_macports.py", line 28, in update_package
    print(diff.new_requirements())
  File "/tmp/test-upt/lib/python3.7/site-packages/upt-0.9-py3.7.egg/upt/upt.py", line 229, in new_requirements
AttributeError: 'PyPIPackage' object has no attribute 'requirement'
Korusuke commented 5 years ago

I think I installed the versions you mentioned above, but I get the error below. I haven't looked any further, but can you please try to see first if this happens for you as well?


>>> upt package -f pypi -b macports pytest -u ports

Nope I don't get any error, this "exact" thing is working for me.

Korusuke commented 5 years ago

As you mentioned yourself as well, the --update should have no arguments (just as for the --recursive flag).

So you mean we should directly get the existing portfile from github, right ?

reneeotten commented 5 years ago

So you mean we should directly get the existing portfile from github, right ?

Sorry, you are completely correct - ignore me ;)

Of course we need to specify where the "current" Portfile needs to come from, you did the correct thing! Your comment

Even when not using output to file we need to supply "ports", will fix this as I missed this is an issue until writing this.

is then a little confusing, as this is actually how it should be, correct?

I will give this a try again tomorrow and will try to give more useful comments then ;)

Korusuke commented 5 years ago

I think I installed the versions you mentioned above, but I get the error below. I haven't looked any further, but can you please try to see first if this happens for you as well?

>>> upt package -f pypi -b macports pytest -u ports

Nope I don't get any error, this "exact" thing is working for me.

Sorry, I tried installing everything again and it gives me the same error. I probably pushed the wrong code. Will fix in few min.

Korusuke commented 5 years ago

Will "fix" this tests once it's finalized that this is the right approach to go.

Korusuke commented 5 years ago

Did anyone get a chance to try this out?

reneeotten commented 5 years ago

I did try this out for a little bit and it seems to work okay-ish, for the ports I have tried but I will continue this a bit.

A few points:

  1. specifying the "--output" with a directory doesn't do anything, it will use the directory that is specified for the "update" argument. In a way, this makes sense since you're asking to update the port; but then it should not require an argument after "output" anymore. If we decide to allow the user to write the output to a different directory, then this should be fixed ;)

  2. should this be part of the "package" command in upt or was the idea to make a separate "update" command?

  3. the revision should always be reset to 0 when updating a port, so we'll need to enforce that

  4. I think it's a smart idea to generate a diff between the Portfiles generated by upt for the latest and previously present version. I also think this will work correctly if upt get's the dependencies correct for the two version (in my example for pytest that isn't the case). But that's because the "specifiers" have changed and currently upt-pypi is ignoring those so it thinks that py-more-itertools is a new dependency, which isn't the case and the maintainer had added that one already in the existing Portfile. But that's probably something we cannot completely solve automatically.

  5. I think something we can/should add to the documentation is that our suggested workflow is to first "pacakge" once using upt ("upt package -f -b macports ") to get the default Portfile layout, and then manually incorporate the changes from the "current", maintained Portfile. After that one should be able to use the "update" feature to keep everything in sync. Alternatively, one could use "update" feature but then the first time one should expect to need to do some manual adjustments. I have no problems with that at all!

Known Issues:

  • Checksum swaping is bit wrong when the order of checksums is different then what it is by upt. Yes, ideally the order shouldn't matter for making the diff but right now it does... I also noticed that in our base.template we actually don't follow the order of checksums as listed in the guide; there it is
    checksums     rmd160  ??? \
    sha256  ??? \
    size    ???

so let's change that in our template; pretty much ever (newer) Perl/Python port follows that convention as well. Of course, it would be great if we can get this working correctly irrespective of the order...

Steap commented 5 years ago

1) I would much rather use "--output". It should be able to look for ${output}/${macports-name}/Portfile. Let's not add arguments to the CLI.

2) It would be nice if it could be just "-u/--update": we could remove subcommands and have shorter commands. A common complaint about upt is that there is a lot of typing involved.

3) OK

4) I like the idea, but I wonder how well it is going to work. What I had in mind with the PackageDiff idea was that the backends would do some sed-like magic. The idea of getting a patch and re-applying it is smart, but I'm afraid we're going to end up with conflicts. Can we run tests on a lot of packages and see whether we get failures?

What do you think?

reneeotten commented 5 years ago
1. I would much rather use "--output". It should be able to look for ${output}/${macports-name}/Portfile. Let's not add arguments to the CLI.

2. It would be nice if it could be just "-u/--update": we could remove subcommands and have shorter commands. A common complaint about upt is that there is a lot of typing involved.

I am fine with not adding subcommands ;) So what you suggest is have "-u/--update" as a flag (just as "-r/--recursive") and then use the (directory) argument after "--output" to specify both from where read the current Portfile and write the new one. Correct? That would work, except for the fact that one cannot try the update and have the new Portfile printed to stdout anymore (since it doesn't know where to read the current Portfile from) ...or? If needed we could use port file <portname>, which gives the path to the Portfile used by port.

4. I like the idea, but I wonder how well it is going to work. What I had in mind with the PackageDiff idea was that the backends would do some sed-like magic. The idea of getting a patch and re-applying it is smart, but I'm afraid we're going to end up with conflicts. Can we run tests on a lot of packages and see whether we get failures?

Ideally we would "solve" of all this within the Python code, but if you think the "sed-like magic" might work better we can go with that. Sure, we should be able to test this with a whole collection of Perl/Python/Ruby ports and see how well it performs. @Korusuke why don't you write a small script to randomly select let's say 100 ports from each category and run the "--update" command on them. You can save both the output from upt and look at the generated Portfiles to get an idea of how well this will work in practice.

Steap commented 5 years ago

I am fine with not adding subcommands ;) So what you suggest is have "-u/--update" as a flag (just as "-r/--recursive") and then use the (directory) argument after "--output" to specify both from where read the current Portfile and write the new one. Correct? That would work, except for the fact that one cannot try the update and have the new Portfile printed to stdout anymore (since it doesn't know where to read the current Portfile from) ...or? If needed we could use port file <portname>, which gives the path to the Portfile used by port.

What would be the use case here? You have a port in a Portfile, and you'd like to update it, but you'd rather not touch the Portfile, and print out the update to stdout instead? I don't really understand why you'd want to do this.

@Korusuke why don't you write a small script to randomly select let's say 100 ports from each category and run the "--update" command on them. You can save both the output from upt and look at the generated Portfiles to get an idea of how well this will work in practice.

Maybe start with "git checkout <commit from 2 years ago>". I really think we're going to end up in situations where the patch will not apply :/

reneeotten commented 5 years ago

What would be the use case here? You have a port in a Portfile, and you'd like to update it, but you'd rather not touch the Portfile, and print out the update to stdout instead? I don't really understand why you'd want to do this.

Right, if you phrase it like this... I agree that it's probably not all that useful ;) It would potentially be useful to check if it the update itself works correctly, but as long as we keep a backup of the original portfile around it's indeed not really necessary.

Korusuke commented 5 years ago

To try this out on port versions from 2 years back I used gitpython to extract and the first commit after 05/10/2016 and created a new folder of this 300 ports. (This took wayy long than expected as either gitpython is slow or I did it in some wrong way)

Now only thing left to try this out is to somehow "mod" port info --version to show the version of this ports instead of the macports-tree. Is there any way to do this @reneeotten ?

I guess adding the folder to path might do it...will try this in few min.

Steap commented 5 years ago

Right, if you phrase it like this... I agree that it's probably not all that useful ;) It would potentially be useful to check if it the update itself works correctly, but as long as we keep a backup of the original portfile around it's indeed not really necessary.

My idea was that maintainers would package "foo" by doing:

$ upt package -f pypi -o ~/macports/ foo

This would create ~/macports/py-foo/Portfile.

Then they would upgrade using the exact same command, only adding "-u":

$ upt package -f pypi -o ~/macports -u foo

This would make it easy to remember.

I think usually maintainers use a VCS, so if the upgrade goes wrong, they can easily rollback. What do you think?

To try this out on port versions from 2 years back I used gitpython to extract and the first commit after 05/10/2016 and created a new folder of this 300 ports. (This took wayy long than expected as either gitpython is slow or I did it in some wrong way)

Why didn't you just "git checkout "?

reneeotten commented 5 years ago

Now only thing left to try this out is to somehow "mod" port info --version to show the version of this ports instead of the macports-tree. Is there any way to do this @reneeotten ?

You have to setup your local port repository; see the guide.

reneeotten commented 5 years ago

@Steap I agree with your workflow, let's go for that implementation

Korusuke commented 5 years ago

You have to setup your local port repository; see the guide.

Great that works!,

I ran it on random 100 python ports out of which it failed for few packages cause of key error(to be fixed) in upt-core(update feature PR) for new-requirements.

I have just looked a bit in the generated ports as it's too late and I have college tomorrow, so will have a deeper look tomorrow. From what I gather it works only about 50% of time. but the bad thing is it fails many times to replace version and the other issue as mentioned before is checksum replacing.

I will generate a diff and share it with you tomorrow.

Korusuke commented 5 years ago

Updating the checksum layout made quite a difference....but sometimes it fails on version swap and I am not able to pinpoint the exact reason.

Instead of generating diff, I am just sharing the raw files: Link

And above all is just for pypi, rubygems in my view cannot be really tested same way given the current portfiles....and for cpan the version bug is yet not fixed.

As this is not giving too good results, any ideas to improve it ?

Steap commented 5 years ago

Updating the checksum layout made quite a difference....but sometimes it fails on version swap and I am not able to pinpoint the exact reason.

What is the issue exactly? Does the equivalent of "patch(1)" that is done using diff-match-patch fail?

Korusuke commented 5 years ago

What is the issue exactly? Does the equivalent of "patch(1)" that is done using diff-match-patch fail?

I initially thought that this would be the case but when I tried the whole process manually for few ports that failed using the online tool...It says OK.

Now currently we are generating patch b/w old upt file and old Portfile and applying it on new upt file. There is one more way which would be generating patch b/w old and new upt file and applying that to the old Portfile to generate New Portfile. I tried this out same ports as before and the result is similar but the version swap always works but checksums messes up sometime and yes in this the diff-match-patch says Fail (atleast sometimes, sometimes it says OK and messes up the Portfile more by pasting checksum anywhere). Example port: Portfile:

# -*- coding: utf-8; mode: tcl; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- vim:fenc=utf-8:ft=Portfile:et:sw=4:ts=4:sts=4

PortSystem          1.0
PortGroup           python 1.0

name                py-tmuxp
version             1.5.0a1
categories-append   devel
license             MIT
platforms           darwin
supported_archs     noarch

maintainers         {@egorenar posteo.net:egorenar-dev} openmaintainer

description         tmux session manager.
long_description    ${description}

homepage            https://github.com/tmux-python/tmuxp/
master_sites        pypi:t/${python.rootname}/
distname            ${python.rootname}-${version}
python.versions     27 36 37

if {${name} ne ${subport}} {
    depends_build-append    port:py${python.version}-setuptools
    depends_lib-append      port:py${python.version}-setuptools \
                            port:py${python.version}-colorama \
                            port:py${python.version}-click \
                            port:py${python.version}-kaptan \
                            port:py${python.version}-libtmux

    patchfiles          patch-requirements.diff

    checksums           rmd160  76767f98ff0b84921a5908ec049ef647cee4faa9 \
                        sha256  88b6ece3ff59a0882b5c5bff169cc4c1d688161fe61e5553b0a0802ff64b6da8 \
                        size    67218
    livecheck.type      none
} else {
    livecheck.type  pypi
}

In previous method the version swap fails on this and in new one only indentation of checksum is shifted.

Steap commented 5 years ago

@reneeotten Do you have any idea on how to improve the use of diff-match-patch in this case?

reneeotten commented 5 years ago

@reneeotten Do you have any idea on how to improve the use of diff-match-patch in this case?

Sorry I haven't had the time to really keep up with this - it's not something I am very familiar with and work/life has been busy... (and I first need to get some sleep now). I'm not sure that I fully understand @Korusuke his latest comment either, but will have to look more at this myself in the next few days to see what's going on and if there is a way to improve this.

reneeotten commented 5 years ago

@Korusuke please make sure you push all the code you have locally to the different repositories so we can actually test things in the same way you do.

Right now https://framagit.org/Korusuke/upt/tree/update is likely not up-to-date and stuff fails with TypeError: update_package() got an unexpected keyword argument 'output'. Of course the required changes in this specific case aren't that difficult to spot (line 326 in upt.py should become backend.update_package(diff, old_pkg, upt_pkg, args.output)), but it would be helpful if I don't need to make them myself again. More importantly, I only noticed this because of the failure and I don't know if you have made other modifications...

Korusuke commented 5 years ago

@Korusuke please make sure you push all the code you have locally to the different repositories so we can actually test things in the same way you do.

Sorry about this, I have now pushed the code. It also has few more changes as now only '-u -o ports' is used instead of old '-u ports -o ports'

Korusuke commented 5 years ago

Check this out now, it is much better than before. Now checksums will work for any order and support for md5 and other checksums can also be added for update specifically if needed.

It basically doesn't include checksums while rendering template and so because of which checksums no longer mess up the formatting or anything. Once diff match patch is applied we apply a bunch of replace function to swap in the new checksums and version(if diff-match-patch failed on this).

We can do this directly on the existing portfile without diff match patch but that way we don't get the new/deleted dependencies updation which works most of the time.

Steap commented 5 years ago

So we now have an update process that is half-clever (diff-match-patch based) half-"sed like" (the text replacement part). Also the template is now harder to read, and the whole thing really feels too "hacky". I must say I do not really like this :/

We can do this directly on the existing portfile without diff match patch but that way we don't get the new/deleted dependencies updation which works most of the time.

Users are going to scream if it works "most of the time". We need to list all possible cases, see which ones are handled well, which ones are not, and think about possible solutions.

Korusuke commented 5 years ago

So as it was getting hard to keep track and evaluate different iterations of the update feature and to know how much it is working and where it fails, I ran the major three ideas for all the ports and following are the results:

Out of 1072 py ports in git checkout -2years back, 826 were updateable(new update is available). When updating this 826 ports the upt failed for 200 ports and 626 were updated but it later turns out that few ports from these 626 were actually using github PortGroup but were also available on PyPi and hence upt ran on them. (Example of such ports- `bob-` ports )

So it failed on 200 Ports for either of the following reason:

List of ports which failed to update: Link

So now coming to the ports which did "successfully" update. The diff results for all methods - Link 14 ports failed on port lint.

As Method 3 builds on Method 1 it is better or same everywhere. Method 2 fails at many places so it kinda useless.

Method 3 works "everywhere" for version and checksums as that is ensured by the replace methods. In some files you may find that checksums are not replaced that is cause the existing checksums are from github and not PyPi so they are not replaced.

Other than that it works for most places but I am not able to pinpoint why it fails sometimes...though I won't call it total "failure" as with just a bit of editing the Portfile passes portlint.

I agree partially with @Steap on this that Method 3 is a bit Hacky but it works the best so far.

Any thoughts on how we can improve this or how should we move forward from here?

Steap commented 5 years ago

So as it was getting hard to keep track and evaluate different iterations of the update feature and to know how much it is working and where it fails, I ran the major three ideas for all the ports and following are the results:

So, could you sum up what these methods were?

Can we have an example of such a package?

  • Package not found by backend...now this is intresting as the latest version was found but while retriving the old version upt-pypi fails, for few ports that I checked for it was because it had either wrong version number in Macports or the path for archive was different than usual

If the version number in Macports is wrong, how is the package currently building? Can we have an example of such a package?

Can we have an example of a package for which the archive was different from usual?

List of ports which failed to update: Link

So now coming to the ports which did "successfully" update. The diff results for all methods - Link 14 ports failed on port lint.

As Method 3 builds on Method 1 it is better or same everywhere. Method 2 fails at many places so it kinda useless.

Method 3 works "everywhere" for version and checksums as that is ensured by the replace methods. In some files you may find that checksums are not replaced that is cause the existing checksums are from github and not PyPi so they are not replaced.

Other than that it works for most places but I am not able to pinpoint why it fails sometimes...though I won't call it total "failure" as with just a bit of editing the Portfile passes portlint.

I agree partially with @Steap on this that Method 3 is a bit Hacky but it works the best so far.

Looking at python-aplpy, I am under the impression that while updating the Portfile, you removed a few things that are not related to the update. This is going to be a problem for maintainers, imho. @reneeotten what do you think?

Korusuke commented 5 years ago

So, could you sum up what these methods were?

Errors of 200-Ports: Link

Example of Ports with Archive Error:

Some ports have error of Portfile not found and the common thing for this error is that the portname is different than usual and therefore the folder name is different too....so this is something to be fixed in upt-macports output feature.

If the version number in Macports is wrong, how is the package currently building? Can we have an example of such a package?

I checked 5-7 more ports and they all had their new versions available in PyPi but the old versions were somewhere else and hence the error.

For port biopython there is an error in guess_from_file as the LICENSE file was not found.

Looking at python-aplpy, I am under the impression that while updating the Portfile, you removed a few things that are not related to the update. This is going to be a problem for maintainers, imho. @reneeotten what do you think?

Yes agree, such errors shouldn't occur but seeing that they are rare and anyways maintainers have to check the files once generated they can always rollback to previous version and just take checksums and anything else required from Portfile.upt if the update goes wrong....Ideally this shouldn't be the case but for ports such as py-aplpy which have variants, they are complex and therefore the update fails.

One thing we can do is apply the manual patch before and save the file as that we know will be right except for the new/deleted dependency part and then if there are any changes in dependency only then we apply "diff-match-patch" and save it.

reneeotten commented 5 years ago

Looking at python-aplpy, I am under the impression that while updating the Portfile, you removed a few things that are not related to the update. This is going to be a problem for maintainers, imho. @reneeotten what do you think?

I haven't had the time to look at all these textfiles with failed ports or the differences between the different update methods. You've generated a lot of useful data here, but it's kind of hard to digest and not something that's happening effectively, at least not by me, at midnight...

The port specifically highlighted by @Steap is certainly an issue, our update feature can certainly not be removing half of the Portfile. I am afraid all of this is not heading in the right direction, but it's admittedly hard to say what would be the correct way to go...I am personally not convinced anymore about generating a diff between two upt-generated Portfiles and applying them to the current version.

The current Portfile that is in MacPorts and presumably carefully generated by the maintainer has to be the basis of the new Portfile and we should only swap/update fields in there that we know for sure that have to be changed. So in my mind that should only involve the current Portfile and an upt-generated Portfile of the new version. I have been trying out some things myself, but have not been very successfully yet either; unfortunately the number of hours per day are limited and therefore progress has been slow...

Korusuke commented 5 years ago

Added diff_files for the actual "ground_truth" update of all the ports, if some diff is not there then that would mean that the port was deleted(this is the case for bob-* ports and few others). Link is same as before.

The current Portfile that is in MacPorts and presumably carefully generated by the maintainer has to be the basis of the new Portfile and we should only swap/update fields in there that we know for sure that have to be changed.

I was thinking something similar as after seeing the "ground" diff files, we can see there are many changes that cannot be taken care by upt like change in style of writing maintainer(example) or deletion of livecheck(example).

Also I guess testing the update feature on versions that are 2 years old is not exactly fair as the style of writing checksums have changed a bit from then so maybe that's the reason for checksum update errors.

What we can do is just update version, revision, checksums and maybe also description and homepage. As for it goes for the dependencies the change is dependencies is not that common and we are anyways printing changed dependency(we can also add a comment to Portfile incase there is a change in dependency but not actually add those). If the maintainer see it fit to add/delete those dependencies than it can be done manually.

Steap commented 5 years ago

Also I guess testing the update feature on versions that are 2 years old is not exactly fair as the style of writing checksums have changed a bit from then so maybe that's the reason for checksum update errors.

It's not super fair, but it makes it easy to see the issues that may arise when an update is a tiny bit complex. You really do not want to overwrite comments and manual work of a maintainer.

What we can do is just update version, revision, checksums and maybe also description and homepage. As for it goes for the dependencies the change is dependencies is not that common and we are anyways printing changed dependency(we can also add a comment to Portfile incase there is a change in dependency but not actually add those). If the maintainer see it fit to add/delete those dependencies than it can be done manually.

We probably want version, revision and checksums. This should be as easy as reading the manual of the "re" module :)

Regarding dependencies, there are things that might be done. I guess it's up to @reneeotten, @mojca and all the other MacPorts developers to tell you what they'd consider acceptable.

I am personally not convinced anymore about generating a diff between two upt-generated Portfiles and applying them to the current version.

Yes, it is an elegant idea but I feel like it's not really going to work :/

Korusuke commented 5 years ago

How are we moving forward with this now?...should I generate updates ports using just re and try install on them to see how many fail?

reneeotten commented 5 years ago

How are we moving forward with this now?...should I generate updates ports using just re and try install on them to see how many fail?

It seems the more ambitious goal is to provide a full, automatic update of the Portfile is -at least for now- not feasible. So let's just get to a state where the basics work using re and then we can always try to improve from there! The bare minimum we need is:

  1. update of the version
  2. update of the checksums
  3. re-setting of the revision to 0 (or if not present, adding this line)

Only logging the change in dependency is probably not sufficient, especially if you're doing the update recursively and possibly get many messages on your screen. So I agree that we should add comments to the end of the Portfile for such changes; we can include dependencies there, possibly updated homepage, ....? That would give the maintainer a quick overview of what, according to upt, has changed between the "old" and "new" versions and leave it to him/her to take/ignore these changes.

Of course it's fine to generate updates/diffs for all ports, but please do summarize the results and point to specific issues as it's unlikely that anyone will actually go through all of them.

Steap commented 5 years ago

Since we're switching strategies, could you squash all of this and show us a single, minimal patch?