pbrisbin / aurget

A simple pacman-like interface to the AUR
GNU General Public License v2.0
69 stars 17 forks source link

fix: Replace parsing JSON with awk by jq #81

Closed aragon999 closed 2 years ago

aragon999 commented 2 years ago

apparently the structure of the API response has changed, such that the current awk script produces the wrong result. In general since JSON does not enforces a structure this error could come back, thus instead of fixing the script we replace it by jq which is a new dependency.

Note: Currently my local tests failed but due to the following diffs:

--- test/build.t
+++ test/build.t.err
@@ -3,7 +3,6 @@
 Building the package for aurget

   $ aurget -Sb --builddir "$PWD" --noedit --nodeps --noconfirm --nodiscard aurget
-  warning: * (glob)
   :: Searching AUR...

   Targets (1): aurget-4.6.0-1 
!

But this also happend also without my changes. So I will see, if there is something wrong with my local setup, or there are some changes required.

aragon999 commented 2 years ago

I see, mh, I will look into this, just as a note, the different outputs currently are:

Current Test:

$ cat test/fixtures/curl/647f76afd5f5e99c8676cfb6e7f7ad85/stdout
{"version":5,"type":"multiinfo","resultcount":1,"results":[{"ID":421625,"Name":"3dsconv","PackageBaseID":113012,"PackageBase":"3dsconv","Version":"4.1-0","Description":"Tool to convert Nintendo 3DS CTR Cart Image files (CCI, \".3ds\") to the CTR Importable Archive format (CIA).","URL":"https:\/\/github.com\/ihaveamac\/3dsconv","NumVotes":2,"Popularity":0,"OutOfDate":1527927325,"Maintainer":"yubimusubi","FirstSubmitted":1467258609,"LastModified":1498084672,"URLPath":"\/cgit\/aur.git\/snapshot\/3dsconv.tar.gz","Depends":["python2"],"MakeDepends":["tar"],"OptDepends":["python2-crypto"],"Provides":["3dsconv"],"License":["MIT"],"Keywords":[]}]}

Current API Response:

$ curl --silent --fail 'https://aur.archlinux.org/rpc/?v=5&type=info&arg[]=3dsconv'
{"resultcount":1,"results":[{"Depends":["python-pyaes","python-setuptools"],"Description":"Convert Nintendo 3DS files to the CIA format","FirstSubmitted":1467258609,"ID":671981,"Keywords":[],"LastModified":1575139708,"License":["MIT"],"Maintainer":"adsun","Name":"3dsconv","NumVotes":2,"OutOfDate":null,"PackageBase":"3dsconv","PackageBaseID":113012,"Popularity":0,"URL":"https://github.com/ihaveamac/3dsconv","URLPath":"/cgit/aur.git/snapshot/3dsconv.tar.gz","Version":"4.2-1"}],"type":"multiinfo","version":5}

Diff:

$ diff -u <(curl --silent --fail 'https://aur.archlinux.org/rpc/?v=5&type=info&arg[]=3dsconv'|jq) <(cat test/fixtures/curl/647f76afd5f5e99c8676cfb6e7f7ad85/stdout|jq)
--- /proc/self/fd/14    2022-02-08 11:22:47.830794265 +0100
+++ /proc/self/fd/15    2022-02-08 11:22:47.830794265 +0100
@@ -1,31 +1,39 @@
 {
+  "version": 5,
+  "type": "multiinfo",
   "resultcount": 1,
   "results": [
     {
+      "ID": 421625,
+      "Name": "3dsconv",
+      "PackageBaseID": 113012,
+      "PackageBase": "3dsconv",
+      "Version": "4.1-0",
+      "Description": "Tool to convert Nintendo 3DS CTR Cart Image files (CCI, \".3ds\") to the CTR Importable Archive format (CIA).",
+      "URL": "https://github.com/ihaveamac/3dsconv",
+      "NumVotes": 2,
+      "Popularity": 0,
+      "OutOfDate": 1527927325,
+      "Maintainer": "yubimusubi",
+      "FirstSubmitted": 1467258609,
+      "LastModified": 1498084672,
+      "URLPath": "/cgit/aur.git/snapshot/3dsconv.tar.gz",
       "Depends": [
-        "python-pyaes",
-        "python-setuptools"
+        "python2"
+      ],
+      "MakeDepends": [
+        "tar"
+      ],
+      "OptDepends": [
+        "python2-crypto"
+      ],
+      "Provides": [
+        "3dsconv"
       ],
-      "Description": "Convert Nintendo 3DS files to the CIA format",
-      "FirstSubmitted": 1467258609,
-      "ID": 671981,
-      "Keywords": [],
-      "LastModified": 1575139708,
       "License": [
         "MIT"
       ],
-      "Maintainer": "adsun",
-      "Name": "3dsconv",
-      "NumVotes": 2,
-      "OutOfDate": null,
-      "PackageBase": "3dsconv",
-      "PackageBaseID": 113012,
-      "Popularity": 0,
-      "URL": "https://github.com/ihaveamac/3dsconv",
-      "URLPath": "/cgit/aur.git/snapshot/3dsconv.tar.gz",
-      "Version": "4.2-1"
+      "Keywords": []
     }
-  ],
-  "type": "multiinfo",
-  "version": 5
+  ]
 }
aragon999 commented 2 years ago

I investigated it, and played a bit around with the awk script. Apparently the order of the values changes, such that the matches are not in correct order anymore and multiple entries are generated, see: grafik

If I "correct" the order everything works again. Unfortunately I do not know enough about awk, that I would know how to fix this problem, i.e. reorder the entries.

As a side note, for me jq is now standard software, which I require on most machines I work with, like curl or so. But of course, this is probably only me.

pbrisbin commented 2 years ago

I do not know enough about awk, that I would know how to fix this problem, i.e. reorder the entries

You could also change the order the lines that read this expect. For example,

-   while IFS='|' read -r name pkgbase version description url votes outofdate; do
+   while IFS='|' read -r description votes version pkgbase url name outofdate; do

(Or whatever order it is now)

This happens in a few places, so we'd need to just change them all.

A more robust (but still dependency-free) way would be to have the awk script print a Bash associative-array that is evald.

- /Name":/        { printf "%s",    unstring($0) }
+ /Name":/        { printf "rpc[name]=%s\n",    unstring($0) }

Then you have to do something like,

declare -A rpc
eval $(... | parse_rpc)

But you will be able to use ${rpc[name]} now, and it should work regardless of order.

This feels pretty janky, and I wonder if it'd be a potential security issue, so I don't love it. It would also be a broader structural change to move away from the while IFS=| read approach.


As a side note, for me jq is now standard software, which I require on most machines I work with, like curl or so. But of course, this is probably only me.

This is true of myself and, I assume, any working professional; I use jq probably 50 times a day. I'm just not sure that changes the Arch anti-dependency stigma. But I can be willing to let that go.

If simply re-ordering the read fixes things, let's just do that. If there's more to it, I'll Approve the jq addition. Sound good?

aragon999 commented 2 years ago

No I do not thing one should introduce eval, since in my opinion, this requires much more attention when working with these code parts, in terms of security.

As far as I can tell reordering fixes the problems, see here: https://github.com/aragon999/aurget/commit/2a21e3b74ecd6e4dc3eff2518858c8b9b60af19e I will look into fixing the tests and update the PR accordingly.

But I think the more general issue is, that JSON has no defined order, so the change of the response was no BC, which is correct so far. Maybe when such a thing happens the next time, it would be reasonable to move to jq. Maybe one could even work with jq as an optional dependency :) I do not know how much aurget is used, so I do not know how many people this will affect.

pbrisbin commented 2 years ago

I will look into fixing the tests and update the PR accordingly.

Thanks! Happy to continue the discussion more (now or later), but that'll be a change I can merge and release immediately to get things working while we do.

But I think the more general issue is, that JSON has no defined order

Yes, we'd effectively be leaving that bug open for now. To be fair, that bug has existed as long as aurget has and this is the first time it's mattered. So, maybe it won't matter for another 12 years again.

aragon999 commented 2 years ago

I will look into fixing the tests and update the PR accordingly.

Thanks! Happy to continue the discussion more (now or later), but that'll be a change I can merge and release immediately to get things working while we do.

I currently fail to fix the tests :see_no_evil: Since everything changes, and my system is not in a proper state (e.g. I currently have no pending updates). Do you have any tips, such that the update gets released soon? If you want I can also provide a second PR which you can modify to fix the tests, if you know what to do.

But I think the more general issue is, that JSON has no defined order

Yes, we'd effectively be leaving that bug open for now. To be fair, that bug has existed as long as aurget has and this is the first time it's mattered. So, maybe it won't matter for another 12 years again.

Okay, I can live with that, but I remember that my first contribution was exactly fixing this part of the code (due to some change in gawk).

pbrisbin commented 2 years ago

If you want I can also provide a second PR which you can modify to fix the tests, if you know what to do.

If you don't mind, just push what you have to this branch. I can see the failure on CI an comment as to the cause, or I'll take over the PR and fix it for you.

aragon999 commented 2 years ago

Alright, I just pushed it.

I mean I know what is wrong, the API response for the corresponding curl requests is in the wrong order. But regenerating them is quite painful, and I was not successful. I also thought about modifying the stdout of the existing requests, maybe using jq :) but did not follow that path further, since I was not sure about the hashes of the fixtures.

pbrisbin commented 2 years ago

Huh, GitHub seems to have let me push to this PR. I'll close my other one.

pbrisbin commented 2 years ago

Some additional context,

There were a few bugs resulting in confusing test failures once you stopped using fixtures...

Unfortunately, the result in the tests was a babbling mess of diff and nothing to help point you at those issues.

aragon999 commented 2 years ago

Thanks a lot for this work. Yes I had to fight with some of these issues, as I tried. So I am totally fine that you did it. Thank you.

And sorry for the missed places, I think in the end these were the problems which I did not solve. But yeah the diff of the fixtures was so huge that I did not commit anything of that.

Huh, GitHub seems to have let me push to this PR. I'll close my other one.

Yes I think I can set if the maintainer may modify the PR or not :)

pbrisbin commented 2 years ago

Released as v4.7.6. Thanks!