haiku / haikudepotserver

Haiku Depot Server
Other
15 stars 10 forks source link

Package number inconsistencies #250

Open diversys opened 9 months ago

diversys commented 9 months ago

Currently HaikuPorts x86_64 repo in HDS shows 8069 packages (including _devel, _source and _debug packages). However, HD shows 7996 packages. pkgman also shows the same number.

pkgman search -aD | grep HaikuPorts | wc -l
7996
nielx commented 9 months ago

The web version also indexes the BeSly and FatElk repositories. I think you can see the details here: https://depot.haiku-os.org/#!/repositories?bcguid=bc198-QBID

Does that explain the difference?

diversys commented 9 months ago

Those repos are disabled by default. If I enable them then HDS shows 8136 packages.

andponlin commented 9 months ago

You can get a list of the packages exported from HDS with the following command;

curl -L "https://depot.haiku-os.org/__pkg/all-haikuports_x86_64-en.json.gz" | gzip -d | jq -r '(.items|sort_by(.name)|.[]|[.name] | join(","))'

You can get a list from Haiku with;

pkgman search -aD | grep HaikuPorts | sed -E 's/^HaikuPorts[ ]+([^ ]+)[ ]+.*$/\1/'

In HDS we see 7998 packages and for Haiku 7996. This gives two extra packages from HDS;

It looks like the cause of this is that two package versions in the database are set as is_latest. I can see...

WITH x AS (
  SELECT COUNT(pv.id) AS version_count, p.name
  FROM haikudepot.pkg_version pv JOIN haikudepot.pkg p ON p.id = pv.pkg_id
  WHERE pv.is_latest = true AND pv.architecture_id=2864 GROUP BY p.name
) SELECT * FROM x WHERE version_count > 1;

...there are quite a few cases of this and many date to 2022-04-29 so maybe there was an issue then.

This may be the case of the anomaly -- can't be totally sure. I will try to look into it further soon.

andponlin commented 9 months ago

Looking into this a bit further this evening, I see that the data in HDS is most likely correct.

It seems as if the problem might be coming from the fact that some packages have been issued under both a regular computer architecture such as x86_64 as well as the pseudo-architecture any. This means that there is a latest existing for both architectures and so both versions appear legitimately in queries which looks like a double-up.

Here is a raw database query to check on this;

SELECT
  pv.id AS version_id, pv.create_timestamp AS version_timestamp, p.name AS pkg_name, a.code as arch, pv.active, pv.is_latest, rs.code AS rs_code
FROM haikudepot.pkg_version pv
JOIN haikudepot.architecture a ON a.id = architecture_id
JOIN haikudepot.pkg p ON p.id = pkg_id
JOIN haikudepot.repository_source rs ON rs.id = pv.repository_source_id
WHERE p.name IN ('monsterz', 'pycharm_community_bin') ORDER BY p.name, pv.create_timestamp DESC;

My older obfuscated snapshot shows;

 version_id |    version_timestamp    |       pkg_name        |   arch   | active | is_latest |       rs_code       
------------+-------------------------+-----------------------+----------+--------+-----------+---------------------
     382810 | 2023-08-22 21:02:28.473 | monsterz              | any      | t      | t         | haikuports_x86_gcc2
     378124 | 2023-08-22 21:01:26.211 | monsterz              | any      | t      | t         | haikuports_x86_64
     321530 | 2019-02-02 19:01:20.346 | monsterz              | x86_gcc2 | t      | t         | haikuports_x86_gcc2
     389313 | 2023-11-05 20:18:26.991 | pycharm_community_bin | x86_64   | t      | t         | haikuports_x86_64
     368493 | 2022-04-28 20:20:13.191 | pycharm_community_bin | any      | f      | t         | haikuports_x86_gcc2
     358653 | 2021-05-14 20:17:28.204 | pycharm_community_bin | any      | t      | t         | haikuports_x86_64
     335472 | 2020-05-24 20:17:45.359 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320892 | 2018-12-31 03:01:22.567 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320891 | 2018-12-30 20:01:20.769 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2

(@nielx ; it would be great if you would be able to run this same query and paste the results here to confirm the current state in the production database.)

It is somewhat unclear what is "right" and "wrong" here because this is showing a historical trace of the packages' versions. What I think needs to happen here with monsterz and pycharm_community_bin is...

nielx commented 8 months ago

Output from the production database:

 version_id |    version_timestamp    |       pkg_name        |   arch   | active | is_latest |       rs_code
------------+-------------------------+-----------------------+----------+--------+-----------+---------------------
     389820 | 2023-08-18 09:38:38.676 | monsterz              | any      | t      | t         | haikuports_x86_gcc2
     389817 | 2023-08-18 09:35:55.716 | monsterz              | any      | t      | t         | haikuports_x86_64
     384498 | 2023-04-25 05:15:02.158 | monsterz              | x86_64   | t      | t         | haikuports_x86_64
     384347 | 2023-04-22 08:46:50.118 | monsterz              | x86_64   | t      | f         | haikuports_x86_64
     321530 | 2019-02-02 19:01:20.346 | monsterz              | x86_gcc2 | f      | t         | haikuports_x86_gcc2
     393678 | 2023-10-31 11:29:39.911 | pycharm_community_bin | x86_64   | t      | t         | haikuports_x86_64
     368493 | 2022-04-28 20:20:13.191 | pycharm_community_bin | any      | f      | t         | haikuports_x86_gcc2
     358653 | 2021-05-14 20:17:28.204 | pycharm_community_bin | any      | t      | t         | haikuports_x86_64
     335472 | 2020-05-24 20:17:45.359 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320892 | 2018-12-31 03:01:22.567 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320891 | 2018-12-30 20:01:20.769 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
(11 rows)
andponlin commented 8 months ago

There is an anomaly with the pycharm_community_bin is that it looks like inconsistent packaging between x86_64 and x86gcc2. monsterz looks like it is being packaged under any for some reason. @diversys ; do you have time to communicate with the package owners to clarify what they are doing there?

Maybe for now we could fix the broken packages with;

UPDATE haikudepot.pkg_version SET active = false WHERE id IN (384498, 321530, 358653);

That will make those anomalous versions appear to be deleted.

nielx commented 8 months ago

I have applied the SQL to make them inactive. Let me know if I can be of further assistance.

andponlin commented 8 months ago

Thanks @nielx ; it looks like there is still a discrepancy. I will have to look into at again another time.