predict-idlab / pyRDF2Vec

🐍 Python Implementation and Extension of RDF2Vec
https://pyrdf2vec.readthedocs.io/en/latest/
MIT License

Install errors using windows 10 python2.7 #6

Closed roosyay closed 4 years ago

roosyay commented 4 years ago

Installing with Python 3.7/3.8 gave me similar but more complex errors, so I downgraded to try Python 2.7. With 2.7 I get a single error for scikit_learn (see below).

pip install pyrdf2vec
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Collecting pyrdf2vec
  Using cached pyRDF2Vec-0.0.3.tar.gz (425 kB)
Collecting gensim==3.5.0
  Using cached gensim-3.5.0-cp27-cp27m-win_amd64.whl (23.5 MB)
Collecting matplotlib==2.1.1
  Using cached matplotlib-2.1.1-cp27-cp27m-win_amd64.whl (8.4 MB)
Processing c:\users\roos\appdata\local\pip\cache\wheels\68\f8\29\b53346a112a07d30a5a84d53f19aeadaa1a474897c0423af91\networkx-2.2-py2.py3-none-any.whl
Collecting numpy==1.13.3
  Using cached numpy-1.13.3-cp27-none-win_amd64.whl (13.0 MB)
Collecting pandas==0.23.4
  Using cached pandas-0.23.4-cp27-cp27m-win_amd64.whl (7.3 MB)
Processing c:\users\roos\appdata\local\pip\cache\wheels\8d\f6\b7\f5e9501d0f006fc9fd497c930206952856b2191ab5c836cb97\rdflib-4.2.2-cp27-none-any.whl
ERROR: Could not find a version that satisfies the requirement scikit_learn==0.21.2 (from pyrdf2vec) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18rc2, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2)
ERROR: No matching distribution found for scikit_learn==0.21.2 (from pyrdf2vec)

GillesVandewiele commented 4 years ago

Hi,

We do not support Python 2.7 since it has passed its end of life. Could you try installing it with python > 3.5?

GillesVandewiele commented 4 years ago

Also, please try installing all dependencies: python3 -m pip install -r requirements.txt
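Before reinstalling, it can help to confirm which interpreter is actually in use. The snippet below is only a sketch: the 3.5 floor is an assumption taken from the comment above, and `is_supported` is a hypothetical helper, not part of pyRDF2Vec.

```python
import sys

# Minimum version assumed from the maintainer's comment above;
# an illustrative threshold, not an official requirement.
MIN_VERSION = (3, 5)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Show which interpreter is running, so pip can be invoked with the same one.
    print("Python", sys.version.split()[0], "at", sys.executable)
    print("supported:", is_supported())
```

Invoking pip as `python -m pip ...` with that same interpreter avoids installing into a different environment than the one being tested.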

roosyay commented 4 years ago

Using Python 3.7.7 I get the output below. I tried installing freetype and png, but could not get that to work. Installing from requirements.txt gives me the same error for matplotlib.

>pip install pyrdf2vec
Requirement already satisfied: pyrdf2vec in c:\users\roos\environments\embeddings37\lib\site-packages\pyrdf2vec-0.0.3-py3.7.egg (0.0.3)
Requirement already satisfied: gensim==3.5.0 in c:\users\roos\environments\embeddings37\lib\site-packages (from pyrdf2vec) (3.5.0)
Collecting matplotlib==2.1.1
  Using cached matplotlib-2.1.1.tar.gz (36.1 MB)
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\Roos\Environments\Embeddings37\Scripts\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Roos\\AppData\\Local\\Temp\\pip-install-vp1_dumf\\matplotlib\\setup.py'"'"'; __file__='"'"'C:\\Users\\Roos\\AppData\\Local\\Temp\\pip-install-vp1_dumf\\matplotlib\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Roos\AppData\Local\Temp\pip-install-vp1_dumf\matplotlib\pip-egg-info'
         cwd: C:\Users\Roos\AppData\Local\Temp\pip-install-vp1_dumf\matplotlib\
    Complete output (70 lines):
    ============================================================================
    Edit setup.cfg to change the build options

    BUILDING MATPLOTLIB
                matplotlib: yes [2.1.1]
                    python: yes [3.7.7 (tags/v3.7.7:d7c567b08f, Mar 10 2020,
                            10:41:24) [MSC v.1900 64 bit (AMD64)]]
                  platform: yes [win32]

    REQUIRED DEPENDENCIES AND EXTENSIONS
                     numpy: yes [not found. pip may install it below.]
                       six: yes [using six version 1.14.0]
                  dateutil: yes [using dateutil version 2.8.1]
    backports.functools_lru_cache: yes [Not required]
              subprocess32: yes [Not required]
                      pytz: yes [using pytz version 2019.3]
                    cycler: yes [cycler was not found. pip/easy_install may
                            attempt to install it after matplotlib.]
                   tornado: yes [tornado was not found. It is required for the
                            WebAgg backend. pip/easy_install may attempt to
                            install it after matplotlib.]
                 pyparsing: yes [using pyparsing version 2.4.6]
                    libagg: yes [pkg-config information for 'libagg' could not
                            be found. Using local copy.]
                  freetype: no  [The C/C++ header for freetype
                            (freetype2\ft2build.h) could not be found.  You may
                            need to install the development package.]
                       png: no  [The C/C++ header for png (png.h) could not be
                            found.  You may need to install the development
                            package.]
                     qhull: yes [pkg-config information for 'libqhull' could not
                            be found. Using local copy.]

    OPTIONAL SUBPACKAGES
               sample_data: yes [installing]
                  toolkits: yes [installing]
                     tests: no  [skipping due to configuration]
            toolkits_tests: no  [skipping due to configuration]

    OPTIONAL BACKEND EXTENSIONS
                    macosx: no  [Mac OS-X only]
                    qt5agg: no  [PySide2 not found; PyQt5 not found]
                    qt4agg: no  [PySide not found; PyQt4 not found]
                   gtk3agg: no  [Requires pygobject to be installed.]
                 gtk3cairo: no  [Requires cairocffi or pycairo to be installed.]
                    gtkagg: no  [Requires pygtk]
                     tkagg: yes [installing; run-time loading from Python Tcl /
                            Tk]
                     wxagg: no  [requires wxPython]
                       gtk: no  [Requires pygtk]
                       agg: yes [installing]
                     cairo: no  [cairocffi or pycairo not found]
                 windowing: yes [installing]

    OPTIONAL LATEX DEPENDENCIES
                    dvipng: no
               ghostscript: no
                     latex: no
                   pdftops: no

    OPTIONAL PACKAGE DATA
                      dlls: no  [skipping due to configuration]

    ============================================================================
                            * The following required packages can not be built:
                            * freetype, png * Please check http://gnuwin32.sourc
                            * eforge.net/packages/freetype.htm for instructions
                            * to install freetype * Please check http://gnuwin32
                            * .sourceforge.net/packages/libpng.htm for
                            * instructions to install png
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

GillesVandewiele commented 4 years ago

Hmm it's been a while since I used a Windows machine... Perhaps this is related to https://github.com/pydicom/deid/issues/97 ?

Nevertheless, matplotlib is not really a strict dependency; we only use it to visualize a KG (which is often impossible because KGs are too large).
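For illustration, an optional dependency of this kind is commonly guarded with a try/except around the import. This is a generic sketch, not pyRDF2Vec's actual code; `optional_import` and `visualisation_available` are hypothetical names.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# matplotlib is only needed for plotting, so its absence is not fatal.
plt = optional_import("matplotlib.pyplot")


def visualisation_available():
    """Report whether the (hypothetical) plotting feature can be used."""
    return plt is not None
```

With a guard like this, the embedding code can run without matplotlib, and only the plotting path needs to check `visualisation_available()` first.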

Perhaps you could try cloning the repo and only install the dependencies required for this minimal example (numpy, sklearn, pandas and rdflib):

import random
import os
import numpy as np

os.environ['PYTHONHASHSEED'] = '42'
random.seed(42)
np.random.seed(42)

import rdflib
import pandas as pd

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

from converters import rdflib_to_kg
from rdf2vec import RDF2VecTransformer

from walkers import RandomWalker

import warnings
warnings.filterwarnings('ignore')

# Load our train & test instances and labels
test_data = pd.read_csv('../data/MUTAG_test.tsv', sep='\t')
train_data = pd.read_csv('../data/MUTAG_train.tsv', sep='\t')

train_people = [rdflib.URIRef(x) for x in train_data['bond']]
train_labels = train_data['label_mutagenic']

test_people = [rdflib.URIRef(x) for x in test_data['bond']]
test_labels = test_data['label_mutagenic']

all_labels = list(train_labels) + list(test_labels)

# Define the label predicates, all triples with these predicates
# will be excluded from the graph
label_predicates = [
    'http://dl-learner.org/carcinogenesis#isMutagenic'
]

# Convert the rdflib graph to our KnowledgeGraph object
kg = rdflib_to_kg('../data/mutag.owl', label_predicates=label_predicates)

random_walker = RandomWalker(2, float('inf'))

# Create embeddings with random walks
transformer = RDF2VecTransformer(walkers=[random_walker], sg=1)
walk_embeddings = transformer.fit_transform(kg, train_people + test_people)

# Fit model on the walk embeddings
train_embeddings = walk_embeddings[:len(train_people)]
test_embeddings = walk_embeddings[len(train_people):]

rf = RandomForestClassifier(random_state=42, n_estimators=100)
rf.fit(train_embeddings, train_labels)

print('Random Forest:')
print(accuracy_score(test_labels, rf.predict(test_embeddings)))
print(confusion_matrix(test_labels, rf.predict(test_embeddings)))

clf = GridSearchCV(
    SVC(random_state=42),
    {'kernel': ['linear', 'poly', 'rbf'], 'C': [10**i for i in range(-3, 4)]}
)
clf.fit(train_embeddings, train_labels)

print('Support Vector Machine:')
print(accuracy_score(test_labels, clf.predict(test_embeddings)))
print(confusion_matrix(test_labels, clf.predict(test_embeddings)))

roosyay commented 4 years ago

After cloning the repo and installing the packages (numpy, sklearn, pandas and rdflib), I get the following ImportError when running the code you gave me above.

ImportError                               Traceback (most recent call last)
<ipython-input-4-c05727c713ee> in <module>
     15 from sklearn.metrics import confusion_matrix, accuracy_score
     16 
---> 17 from converters import rdflib_to_kg
     18 from rdf2vec import RDF2VecTransformer
     19 

ImportError: cannot import name 'rdflib_to_kg' from 'converters' (c:\users\roos\emb\lib\site-packages\converters\__init__.py)

GillesVandewiele commented 4 years ago

You will have to make sure converters.py and rdf2vec.py can be found on your Python path. For now, you could just run the example from the rdf2vec/ directory, as both files are in there. I will (hopefully) be able to release a better update today.
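The traceback above shows `converters` being resolved from site-packages, which suggests an unrelated package with the same name was picked up instead of the repo's file. A small sketch for checking where a module name resolves, and for putting a directory first on the search path, could look like this. The helper names and the "rdf2vec" path are assumptions for illustration.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def prefer_directory(path):
    """Put `path` at the front of sys.path so its modules win name clashes."""
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)


if __name__ == "__main__":
    # Hypothetical location of the cloned repo's source directory.
    prefer_directory("rdf2vec")
    for name in ("converters", "rdf2vec", "walkers"):
        print(name, "->", module_origin(name))
```

Note that a module already imported in the current session stays cached in `sys.modules`, so in a notebook the kernel needs a restart before a changed search path takes effect.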

GillesVandewiele commented 4 years ago

Could you check if you still have issues with a clean install?

roosyay commented 4 years ago

Works now! Great, thank you :-)