thoth-station / isis-api

API exposing algorithms on top of project2vec
GNU General Public License v3.0

Isis

A service exposing package tags and package categories to Thoth's recommendation engine.

project2vec
###########

The Isis API exposes functionality built on top of project2vec, a description of a package as a vector. The vector consists of the features that a given project provides. These features are aggregated from keywords found in the Python ecosystem and then extracted from project descriptions and other free-text descriptions of a project (such as README files in linked GitHub repositories).
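The idea of turning free text into a feature vector can be sketched as follows. This is a minimal illustration, not the project2vec implementation: the ``FEATURES`` vocabulary, the keyword sets, and the helper names ``tokenize`` and ``project_vector`` are all assumptions made up for this example, and a binary encoding is assumed for simplicity.

```python
import re

# Hypothetical feature vocabulary: each feature is backed by a set of keywords.
# The real keyword aggregation used by project2vec is not shown in this README.
FEATURES = {
    "web": {"http", "web", "server", "rest"},
    "ml": {"learning", "neural", "model", "tensor"},
    "testing": {"test", "mock", "assert"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def project_vector(texts, features=FEATURES):
    """Return a binary vector with one entry per feature, in dict order.

    An entry is 1 when any keyword of that feature occurs in any of the texts.
    """
    tokens = set()
    for text in texts:
        tokens.update(tokenize(text))
    return [1 if tokens & keywords else 0 for keywords in features.values()]


# Texts standing in for a package description and a README excerpt.
vector = project_vector([
    "A micro web framework with a built-in HTTP server.",
    "README: run the test suite with pytest.",
])
print(dict(zip(FEATURES, vector)))  # {'web': 1, 'ml': 0, 'testing': 1}
```

A binary vector keeps the sketch short; a weighted encoding (for example keyword counts) would fit the same shape.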

These vectors form a space in which we can search for similar packages (by computing the distance between vectors) and run feature-based queries, for example by masking the resulting vectors (specifying the features we are interested in).
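Both kinds of query can be illustrated with a small, self-contained sketch. The package names, feature names, and vectors below are invented for the example, and cosine distance is an assumed metric; the README does not state which distance Isis actually uses.

```python
import math

# Hypothetical feature names and package vectors, for illustration only.
FEATURE_NAMES = ["web", "ml", "testing", "cli"]
VECTORS = {
    "pkg-a": [1, 0, 1, 0],
    "pkg-b": [1, 0, 0, 0],
    "pkg-c": [0, 1, 0, 1],
    "pkg-d": [1, 1, 1, 0],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def most_similar(name, vectors=VECTORS, count=2):
    """Return the `count` packages closest to `name`, nearest first."""
    target = vectors[name]
    ranked = sorted(
        (cosine_distance(target, vec), other)
        for other, vec in vectors.items()
        if other != name
    )
    return [other for _, other in ranked[:count]]


def with_features(wanted, vectors=VECTORS, names=FEATURE_NAMES):
    """Return packages that provide every feature in `wanted` (a mask query)."""
    mask = [1 if feature in wanted else 0 for feature in names]
    return sorted(
        pkg
        for pkg, vec in vectors.items()
        if all(value >= bit for value, bit in zip(vec, mask))
    )


print(most_similar("pkg-a"))               # ['pkg-d', 'pkg-b']
print(with_features({"web", "testing"}))   # ['pkg-a', 'pkg-d']
```

The mask query only checks the positions selected by the mask, so packages may provide additional features beyond the requested ones.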

The figure below shows a visualization in TensorBoard after dimensionality reduction using t-SNE. It shows clusters of similar packages as well as a search for similar packages in the ecosystem.

.. figure:: https://raw.githubusercontent.com/thoth-station/isis-api/master/example/tb.gif
   :alt: TensorBoard project2vec visualization
   :align: center

Deployment
##########

The service is built using OpenShift's s2i. On deployment, an init container runs first and downloads the model from Ceph/S3 (the model is created by one of the flows defined in `selinon-worker <https://github.com/thoth-station/selinon-worker>`_).
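The download step performed by the init container might look roughly like the sketch below. It is not the script shipped with isis-api: the environment variable names (``S3_ENDPOINT_URL``, ``MODEL_BUCKET``, ``MODEL_KEY``, ``MODEL_PATH``), the default path, and the function names are assumptions for illustration. The only library calls used are ``boto3.client`` and the S3 client's ``download_file``; credentials are expected to come from the standard AWS environment variables that boto3 reads.

```python
import os


def download_model(client, bucket, key, destination):
    """Fetch one object from an S3-compatible store into a local file."""
    # Make sure the target directory exists before the client writes to it.
    os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
    client.download_file(bucket, key, destination)
    return destination


def main():
    """Entry point an init container could call; all variable names are assumed."""
    import boto3  # third-party; imported here so download_model has no hard dependency

    # Ceph exposes an S3-compatible API, so a custom endpoint URL is passed in.
    client = boto3.client("s3", endpoint_url=os.environ["S3_ENDPOINT_URL"])
    download_model(
        client,
        os.environ["MODEL_BUCKET"],
        os.environ["MODEL_KEY"],
        os.environ.get("MODEL_PATH", "/tmp/model/model.bin"),
    )
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object, without access to a real Ceph or S3 endpoint.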