qchateau / conan-center-bot

A bot to automatically update conan-center-index
GNU General Public License v3.0

Support sourceforge #61

Open ericLemanissier opened 3 years ago

ericLemanissier commented 3 years ago

There are currently 35 recipes on CCI that are downloaded from sourceforge, but I'm not sure there is a way to programmatically get all the versions available for a sourceforge project, or even the latest one.

samuel-emrys commented 2 years ago

It looks like sourceforge does provide a number of APIs that could be used to achieve this:

  1. http://p.sf.net/sourceforge/api-docs
  2. https://sourceforge.net/p/forge/documentation/API/

Using armadillo as an example, reference (1) provides the /rest/p/{project} endpoint, which yields data of the following format for a project:

{
  "shortname": "arma",
  "name": "Armadillo",
  "_id": "515bd38b34309d2ec14b9b95",
  "url": "https://sourceforge.net/p/arma/",
  "private": false,
  "short_description": "* Fast C++ library for linear algebra (matrix maths) and scientific computing\r\n* Easy to use functions and syntax, deliberately similar to Matlab / Octave\r\n* Uses template meta-programming techniques to increase efficiency\r\n* Provides user-friendly wrappers for OpenBLAS, Intel MKL, LAPACK, ATLAS, ARPACK and SuperLU libraries\r\n* Useful for machine learning, pattern recognition, signal processing, bioinformatics, statistics, finance, etc.\r\n\r\n* Downloads: http://arma.sourceforge.net/download.html\r\n* Documentation: http://arma.sourceforge.net/docs.html\r\n* Bug reports: http://arma.sourceforge.net/faq.html\r\n* Git repo: https://gitlab.com/conradsnicta/armadillo-code\r\n",
  "creation_date": "2008-02-08",
  "summary": "fast C++ library for linear algebra & scientific computing",
  "external_homepage": "http://arma.sourceforge.net",
  "video_url": null,
  "socialnetworks": [
    {
      "accounturl": "",
      "socialnetwork": "Twitter"
    },
    {
      "accounturl": null,
      "socialnetwork": "Facebook"
    }
  ],
  "status": "active",
  "moved_to_url": "",
  "preferred_support_tool": "_url",
  "preferred_support_url": "http://arma.sourceforge.net/faq.html",
  "developers": [
    {
      "username": "conrad_s",
      "name": "Conrad Sanderson",
      "url": "https://sourceforge.net/u/conrad_s/"
    },
    {
      "username": "eddelbuettel",
      "name": "Dirk Eddelbuettel",
      "url": "https://sourceforge.net/u/eddelbuettel/"
    },
    {
      "username": "dmpouzas",
      "name": "Dimitrios Bouzas",
      "url": "https://sourceforge.net/u/dmpouzas/"
    },
    {
      "username": "rcurtin",
      "name": "Ryan Curtin",
      "url": "https://sourceforge.net/u/rcurtin/"
    }
  ],
  "tools": [
    {
      "name": "files-sf",
      "mount_point": "files",
      "url": "/p/arma/files/",
      "icons": {
        "24": "images/downloads_24.png",
        "32": "images/downloads_32.png",
        "48": "images/downloads_48.png"
      },
      "installable": false,
      "tool_label": "Files",
      "mount_label": "Files"
    },
    {
      "name": "reviews",
      "mount_point": "reviews",
      "url": "/p/arma/reviews/",
      "icons": {
        "24": "images/sftheme/24x24/blog_24.png",
        "32": "images/sftheme/32x32/blog_32.png",
        "48": "images/sftheme/48x48/blog_48.png"
      },
      "installable": false,
      "tool_label": "Reviews",
      "mount_label": "Reviews"
    },
    {
      "name": "blog",
      "mount_point": "news",
      "url": "/p/arma/news/",
      "icons": {
        "24": "images/blog_24.png",
        "32": "images/blog_32.png",
        "48": "images/blog_48.png"
      },
      "installable": true,
      "tool_label": "Blog",
      "mount_label": "News"
    },
    {
      "name": "summary",
      "mount_point": "summary",
      "url": "/p/arma/summary/",
      "icons": {
        "24": "images/sftheme/24x24/blog_24.png",
        "32": "images/sftheme/32x32/blog_32.png",
        "48": "images/sftheme/48x48/blog_48.png"
      },
      "installable": false,
      "tool_label": "Summary",
      "mount_label": "Summary",
      "sourceforge_group_id": 217303
    },
    {
      "name": "support",
      "mount_point": "support",
      "url": "/p/arma/support/",
      "icons": {
        "24": "images/sftheme/24x24/blog_24.png",
        "32": "images/sftheme/32x32/blog_32.png",
        "48": "images/sftheme/48x48/blog_48.png"
      },
      "installable": false,
      "tool_label": "Support",
      "mount_label": "Support"
    },
    {
      "name": "link",
      "mount_point": "documentation",
      "url": "/p/arma/documentation/",
      "icons": {
        "24": "images/ext_24.png",
        "32": "images/ext_32.png",
        "48": "images/ext_48.png"
      },
      "installable": true,
      "tool_label": "External Link",
      "mount_label": "Documentation"
    },
    {
      "name": "activity",
      "mount_point": "activity",
      "url": "/p/arma/activity/",
      "icons": {
        "24": "images/admin_24.png",
        "32": "images/admin_32.png",
        "48": "images/admin_48.png"
      },
      "installable": false,
      "tool_label": "Tool",
      "mount_label": "Activity"
    },
    {
      "name": "mailman",
      "mount_point": "mailman",
      "url": "/p/arma/mailman/",
      "icons": {
        "24": "images/forums_24.png",
        "32": "images/forums_32.png",
        "48": "images/forums_48.png"
      },
      "installable": false,
      "tool_label": "Mailing Lists",
      "mount_label": "Mailing Lists"
    },
    {
      "name": "link",
      "mount_point": "gitrepo",
      "url": "/p/arma/gitrepo/",
      "icons": {
        "24": "images/ext_24.png",
        "32": "images/ext_32.png",
        "48": "images/ext_48.png"
      },
      "installable": true,
      "tool_label": "External Link",
      "mount_label": "Git Repository"
    }
  ],
  "labels": [
    ""
  ],
  "categories": {
    "audience": [
      {
        "id": 363,
        "shortname": "informationtechnology",
        "fullname": "Information Technology",
        "fullpath": "Intended Audience :: by Industry or Sector :: Information Technology"
      },
      {
        "id": 367,
        "shortname": "scienceresearch",
        "fullname": "Science/Research",
        "fullpath": "Intended Audience :: by Industry or Sector :: Science/Research"
      },
      {
        "id": 360,
        "shortname": "education",
        "fullname": "Education",
        "fullpath": "Intended Audience :: by Industry or Sector :: Education"
      },
      {
        "id": 536,
        "shortname": "enduser_advanced",
        "fullname": "Advanced End Users",
        "fullpath": "Intended Audience :: by End-User Class :: Advanced End Users"
      },
      {
        "id": 3,
        "shortname": "developers",
        "fullname": "Developers",
        "fullpath": "Intended Audience :: by End-User Class :: Developers"
      },
      {
        "id": 729,
        "shortname": "audienceengineering",
        "fullname": "Engineering",
        "fullpath": "Intended Audience :: by Industry or Sector :: Engineering"
      }
    ],
    "developmentstatus": [
      {
        "id": 11,
        "shortname": "production",
        "fullname": "5 - Production/Stable",
        "fullpath": "Development Status :: 5 - Production/Stable"
      }
    ],
    "environment": [],
    "language": [
      {
        "id": 626,
        "shortname": "matlab",
        "fullname": "MATLAB",
        "fullpath": "Programming Language :: MATLAB"
      },
      {
        "id": 165,
        "shortname": "cpp",
        "fullname": "C++",
        "fullpath": "Programming Language :: C++"
      }
    ],
    "license": [
      {
        "id": 401,
        "shortname": "apache2",
        "fullname": "Apache License V2.0",
        "fullpath": "License :: OSI-Approved Open Source :: Apache License V2.0"
      }
    ],
    "translation": [],
    "os": [
      {
        "id": 427,
        "shortname": "cygwin",
        "fullname": "Cygwin (MS Windows)",
        "fullpath": "Operating System :: Emulation and API Compatibility :: Cygwin (MS Windows)"
      },
      {
        "id": 445,
        "shortname": "mingw_msys",
        "fullname": "MinGW/MSYS (MS Windows)",
        "fullpath": "Operating System :: Emulation and API Compatibility :: MinGW/MSYS (MS Windows)"
      },
      {
        "id": 201,
        "shortname": "linux",
        "fullname": "Linux",
        "fullpath": "Operating System :: Modern (Vendor-Supported) Desktop Operating Systems :: Linux"
      },
      {
        "id": 309,
        "shortname": "macosx",
        "fullname": "OS X",
        "fullpath": "Operating System :: Modern (Vendor-Supported) Desktop Operating Systems :: OS X"
      },
      {
        "id": 436,
        "shortname": "os_portable",
        "fullname": "OS Portable (Source code to work with many OS platforms)",
        "fullpath": "Operating System :: Grouping and Descriptive Categories :: OS Portable (Source code to work with many OS platforms)"
      },
      {
        "id": 200,
        "shortname": "posix",
        "fullname": "All POSIX (Linux/BSD/UNIX-like OSes)",
        "fullpath": "Operating System :: Grouping and Descriptive Categories :: All POSIX (Linux/BSD/UNIX-like OSes)"
      },
      {
        "id": 219,
        "shortname": "winnt",
        "fullname": "32-bit MS Windows (NT/2000/XP)",
        "fullpath": "Operating System :: Grouping and Descriptive Categories :: 32-bit MS Windows (NT/2000/XP)"
      }
    ],
    "database": [],
    "topic": [
      {
        "id": 620,
        "shortname": "algorithms",
        "fullname": "Algorithms",
        "fullpath": "Topic :: Software Development :: Algorithms"
      },
      {
        "id": 98,
        "shortname": "mathematics",
        "fullname": "Mathematics",
        "fullpath": "Topic :: Scientific/Engineering :: Mathematics"
      },
      {
        "id": 802,
        "shortname": "machinelearning",
        "fullname": "Machine Learning",
        "fullpath": "Topic :: Scientific/Engineering :: Artificial Intelligence :: Machine Learning"
      }
    ]
  },
  "icon_url": "https://sourceforge.net/p/arma/icon",
  "screenshots": []
}
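As a minimal sketch, the Files tool could be picked out of a response shaped like the one above as follows. The function name is hypothetical, and matching on a mount_point of "files" is an assumption based only on this one armadillo response; other projects may mount it differently.

```python
from typing import Optional


def files_tool_url(project: dict) -> Optional[str]:
    """Return the relative URL of the tool mounted at "files", if any.

    `project` is the decoded JSON returned by /rest/p/{project}.
    Assumption: the download area is always mounted at "files".
    """
    for tool in project.get("tools", []):
        if tool.get("mount_point") == "files":
            return tool.get("url")
    return None
```

For the armadillo response this returns the "url" value of the files-sf tool; for a project with no such tool it returns None, so a caller can skip it.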

You can see from this that it does provide a link to the project's download page. In the case of armadillo, this is https://sourceforge.net/rest/p/arma/files. Unfortunately, submitting a GET request to this endpoint returns the same content as https://sourceforge.net/projects/arma/files/: it returns HTML rather than JSON, so a scraper would be necessary to extract the file names. Having said this, it does look like this HTML contains a potentially useful JSON structure:

net.sf.files = {
    "foreground": {
        "name": "foreground",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/foreground/download",
        "url": "/projects/arma/files/foreground/",
        "full_path": "foreground",
        "type": "d",
        "link": "",
        "downloads": 7641,
        "sha1": "",
        "md5": "",
        "default": "",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": false,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    },
...
    "armadillo-10.7.3.tar.xz": {
        "name": "armadillo-10.7.3.tar.xz",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/armadillo-10.7.3.tar.xz/download",
        "url": "/projects/arma/files/armadillo-10.7.3.tar.xz/",
        "full_path": "armadillo-10.7.3.tar.xz",
        "type": "f",
        "link": "",
        "downloads": 404,
        "sha1": "5c26e5700cff8061c140469c5cca689e83b2f767",
        "md5": "49b36e13b775c02c9a2ea1d19e074a90",
        "default": "windows,mac,linux,android,bsd,solaris,others",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": true,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    },
    "armadillo-10.7.2.tar.xz": {
        "name": "armadillo-10.7.2.tar.xz",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/armadillo-10.7.2.tar.xz/download",
        "url": "/projects/arma/files/armadillo-10.7.2.tar.xz/",
        "full_path": "armadillo-10.7.2.tar.xz",
        "type": "f",
        "link": "",
        "downloads": 350,
        "sha1": "7735f55f1ff8d35dc3972701c9481a1261a2b5b9",
        "md5": "34928a6e259a17a0b3fa3b65f266434e",
        "default": "",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": true,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    },
    "armadillo-10.7.1.tar.xz": {
        "name": "armadillo-10.7.1.tar.xz",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/armadillo-10.7.1.tar.xz/download",
        "url": "/projects/arma/files/armadillo-10.7.1.tar.xz/",
        "full_path": "armadillo-10.7.1.tar.xz",
        "type": "f",
        "link": "",
        "downloads": 2388,
        "sha1": "87bbcc8607eb25387e79d4f28280ce1074732878",
        "md5": "afce33a52763edcc8c27349acbe8161a",
        "default": "",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": true,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    },
    "armadillo-10.7.0.tar.xz": {
        "name": "armadillo-10.7.0.tar.xz",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/armadillo-10.7.0.tar.xz/download",
        "url": "/projects/arma/files/armadillo-10.7.0.tar.xz/",
        "full_path": "armadillo-10.7.0.tar.xz",
        "type": "f",
        "link": "",
        "downloads": 1895,
        "sha1": "f7048e520c5dd43d1bdec842126049f4c6c32938",
        "md5": "796bd58a0ca4e2de184b616fc248b9d4",
        "default": "",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": true,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    },
...
    "README.md": {
        "name": "README.md",
        "path": "",
        "download_url": "https://sourceforge.net/projects/arma/files/README.md/download",
        "url": "/projects/arma/files/README.md/",
        "full_path": "README.md",
        "type": "f",
        "link": "",
        "downloads": 186,
        "sha1": "81afa147127bb36324ed70825b9c2d38c4a11204",
        "md5": "d8692a4922eb3871b17bcf6966aeb691",
        "default": "",
        "download_label": "",
        "exclude_reports": false,
        "downloadable": true,
        "legacy_release_notes": null,
        "staged": false,
        "stage": 0,
        "staging_days": 3,
        "files_url": "/projects/arma/files/",
        "explicitly_staged": false,
        "authorized": null
    }
}

I'm not sure if this is common to all sourceforge projects, but this net.sf.files variable seems to store all versions provided by a project, or at least it does in the case of armadillo. From here you could try to identify a semver version in each file name and extract it for comparison to identify the latest version. Beyond that, it looks like the entries might be ordered chronologically, so you could look at where the existing recipe's version sits in the JSON; if it's not first, that could be an indicator that it's not the latest version.

Might be fragile, but could be a good place to start.
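A rough sketch of that idea, assuming the page embeds the object as a plain `net.sf.files = {...}` assignment like the one quoted above. The helper names are hypothetical, and the version pattern (two or more dot-separated numbers in the file name) is an assumption that fits armadillo but will not fit every project.

```python
import json
import re

# Hypothetical helpers; only the net.sf.files variable comes from the page.
_ASSIGNMENT = re.compile(r"net\.sf\.files\s*=\s*")
_VERSION = re.compile(r"(\d+(?:\.\d+)+)")


def extract_sf_files(html: str) -> dict:
    """Return the object assigned to net.sf.files, or {} if it is absent."""
    match = _ASSIGNMENT.search(html)
    if match is None:
        return {}
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # so the closing brace does not have to be located by hand.
        files, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return {}
    return files if isinstance(files, dict) else {}


def latest_version(files: dict):
    """Pick the highest dotted numeric version among downloadable files."""
    best = None
    for name, entry in files.items():
        # Skip directories ("type": "d") and anything not downloadable.
        if entry.get("type") != "f" or not entry.get("downloadable"):
            continue
        found = _VERSION.search(name)
        if found is None:
            continue  # e.g. README.md carries no version
        key = tuple(int(part) for part in found.group(1).split("."))
        if best is None or key > best[0]:
            best = (key, found.group(1))
    return None if best is None else best[1]
```

Comparing tuples of integers rather than strings avoids ranking 10.7.3 above 10.7.10, and it does not rely on the entries being in chronological order.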

ericLemanissier commented 2 years ago

Thanks for checking this. For this bot, I don't think it has to be robust. Even if it works for only 10% of the sourceforge projects, it's a net improvement.

qchateau commented 2 years ago

I've had a quick look at this and sadly it does not seem that all (many?) sourceforge projects follow such a pattern.

samuel-emrys commented 2 years ago

For this to be useful, we only need this to work for the majority of the following libraries:

conan-center-index $ grep -rnw ".*sourceforge.*" . | grep "conandata" | cut -d'/' -f3 | uniq
podofo
libuuid
freeglut
aruco
boost
opencore-amr
half
libnova
zint
soxr
irrxml
zlib
argtable2
libdisasm
pcre2
libsmacker
armadillo
libid3tag
wtl
pexports
pcre
mingw-w64
libdc1394
libmikmod
rapidxml
libnoise
gsoap
pthreads4w
qwt
doxygen
ftjam
tinyxml
aaf
gfortran
scons
cunit
librhash
freetype
msys2
tk
mpg123
libsquish
zbar
libmp3lame
giflib
tcl
geographiclib
libmad
matio
nas

I just did a spot check on boost, and in contrast to the above I had to query /p/boost/files/boost rather than /p/boost/files. There was still a net.sf.files object which had version information. Another way to approach this might be to try to strip version information out of the url entry in conandata.yml: it seems like if you can find the level that contains version information, you could try to construct something. I'll have a look at a few more of these projects to see if there are any/many common themes when I get a chance.
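One possible shape for the url-stripping idea, as a sketch only: split the path of a recipe's download URL at the first segment that mentions the known version, so that everything before it is a candidate directory to query. The function name is hypothetical, and accepting an underscore spelling of the version (1_70_0 for 1.70.0) is an assumption about how some projects name their files.

```python
from urllib.parse import urlsplit


def version_level(url: str, version: str):
    """Split the URL path at the first segment that mentions the version.

    Returns (segments before the version, segments from the version on),
    or None when the version does not appear in any path segment.
    """
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Assumption: a version may also be spelled with underscores.
    spellings = {version, version.replace(".", "_")}
    for index, segment in enumerate(segments):
        # Plain substring match: simple, but it can give false positives
        # when a short version string appears inside a longer number.
        if any(spelling in segment for spelling in spellings):
            return segments[:index], segments[index:]
    return None
```

For the armadillo download_url quoted earlier and version 10.7.3, the first element is ["projects", "arma", "files"], which corresponds to the listing page already inspected. A project that nests releases in a subdirectory would yield a deeper prefix, which is the level where a listing would need to be fetched.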