usc-isi-i2 / kgtk

Knowledge Graph Toolkit
https://kgtk.readthedocs.io/en/latest/
MIT License

kgtk graph-embeddings #500

Open szeke opened 2 years ago

szeke commented 2 years ago

Running

%%time
!$kgtk graph-embeddings -i "$TEMP"/item.edges.tsv.gz \
--output_format kgtk \
-o "$OUT"/graph-embeddings.tsv.gz

I get the following error:

INFO:torchbiggraph:Loading entity counts...
INFO:torchbiggraph:Creating workers...
INFO:torchbiggraph:Initializing global model...
Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 10, in <module>
Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 10, in <module>
Traceback (most recent call last):
Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 10, in <module>
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 10, in <module>
    from importlib.metadata import distribution
ModuleNotFoundError: No module named 'importlib.metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from importlib.metadata import distribution
ModuleNotFoundError: No module named 'importlib.metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from importlib.metadata import distribution
ModuleNotFoundError: No module named 'importlib.metadata'
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main

During handling of the above exception, another exception occurred:

  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    from importlib.metadata import distribution
ModuleNotFoundError: No module named 'importlib.metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
    exitcode = _main(fd)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    exitcode = _main(fd)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    exitcode = _main(fd)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 114, in _main
    prepare(preparation_data)
    prepare(preparation_data)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    prepare(preparation_data)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    prepare(preparation_data)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/multiprocessing/spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 263, in run_path
    run_name="__mp_main__")
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 263, in run_path
    run_name="__mp_main__")
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 263, in run_path
    run_name="__mp_main__")
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
    pkg_name=pkg_name, script_name=fname)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 96, in _run_module_code
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 96, in _run_module_code
    pkg_name=pkg_name, script_name=fname)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 96, in _run_module_code
    pkg_name=pkg_name, script_name=fname)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 85, in _run_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 85, in _run_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 85, in _run_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 13, in <module>
    exec(code, run_globals)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 13, in <module>
    from importlib_metadata import distribution
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 15, in <module>
    exec(code, run_globals)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 13, in <module>
    from importlib_metadata import distribution
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 15, in <module>
    exec(code, run_globals)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 13, in <module>
    from importlib_metadata import distribution
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 15, in <module>
    from importlib_metadata import distribution
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 15, in <module>
    from ._compat import (
    from ._compat import (
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/_compat.py", line 8, in <module>
    from ._compat import (
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/_compat.py", line 8, in <module>
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/_compat.py", line 8, in <module>
    from ._compat import (
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/importlib_metadata/_compat.py", line 8, in <module>
    from typing import Protocol
    from typing import Protocol
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/typing.py", line 1359, in <module>
    from typing import Protocol
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/typing.py", line 1359, in <module>
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/typing.py", line 1359, in <module>
    from typing import Protocol
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/typing.py", line 1359, in <module>
Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/kgtk", line 10, in <module>
    from importlib.metadata import distribution
ModuleNotFoundError: No module named 'importlib.metadata'

szeke commented 2 years ago

I tried

pip install importlib-metadata

but the error persists.

saggu commented 2 years ago

@szeke, please post your Python version and the output of pip freeze.

CraigMiloRogers commented 2 years ago

Which version of Python are you using? importlib.metadata requires Python 3.8 or later.
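
For readers hitting the same ModuleNotFoundError, here is a minimal sketch of the import fallback that the traceback above suggests the kgtk entry script attempts (standard library first, then the importlib-metadata backport). The helper name installed_version is hypothetical and used only for illustration; it is not part of KGTK.

```python
import sys

try:
    # Standard-library location, available on Python 3.8 and later.
    from importlib import metadata as importlib_metadata
except ImportError:
    # Backport from PyPI (pip install importlib-metadata) for Python 3.7 and earlier.
    import importlib_metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib_metadata.version(package_name)
    except importlib_metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:3]))
    print("kgtk", installed_version("kgtk"))
```

Running this inside the affected conda environment shows which interpreter is active and whether the package metadata can be read at all.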

szeke commented 2 years ago

I am using Python 3.7.9

(kgtk-env) MacBook-Pro:temp.schwarzenegger pedroszekely$ pip freeze
altair==4.1.0
ansiwrap==0.8.4
anyio==2.0.2
appdirs==1.4.4
appnope==0.1.2
argon2-cffi==20.1.0
async-generator==1.10
attrs==20.3.0
Babel==2.9.0
backcall==0.2.0
beautifulsoup4==4.9.3
black==20.8b1
bleach==3.2.2
blis==0.4.1
cached-property==1.5.2
catalogue==1.0.0
certifi==2020.12.5
cffi @ file:///Users/runner/miniforge3/conda-bld/cffi_1606601143848/work
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cssselect==1.1.0
cycler==0.10.0
cymem==2.0.5
Cython==0.29.21
cytoolz==0.11.0
dask==2021.1.1
dateparser==1.0.0
decorator==4.4.2
defusedxml==0.6.0
demjson==2.2.4
dill==0.3.3
distributed==2021.1.1
distro==1.5.0
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
entrypoints==0.3
et-xmlfile==1.0.1
etk==2.2.3a0
filelock==3.0.12
ftfy==5.8
gensim==3.8.3
h5py==3.1.0
HeapDict==1.0.1
html5lib==1.1
idna==2.10
importlib-metadata==3.4.0
IProgress==0.4
ipykernel==5.4.3
ipython==7.19.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
iso-639==0.4.5
isodate==0.6.0
jdcal==1.4.1
jedi==0.18.0
Jinja2==2.11.2
joblib==1.0.0
json5==0.9.5
jsonpath-ng==1.5.2
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.0
jupyter-server==1.2.2
jupyterlab==3.0.5
jupyterlab-pygments==0.1.2
jupyterlab-server==2.1.2
jupyterlab-widgets==1.0.0
# Editable install with no version control (kgtk==0.7.1)
-e /Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.7/site-packages/kgtk-0.7.1-py3.7.egg
kiwisolver @ file:///Users/runner/miniforge3/conda-bld/kiwisolver_1610099791311/work
langdetect==1.0.8
langid==1.1.6
lml==0.1.0
loguru==0.5.3
lxml==4.6.2
lz4==3.1.3
MarkupSafe==1.1.1
matplotlib @ file:///Users/runner/miniforge3/conda-bld/matplotlib-suite_1610582823860/work
mgzip==0.2.1
mistune==0.8.4
msgpack==1.0.2
msgpack-numpy==0.4.3.2
msgpack-python==0.5.6
multiprocess==0.70.11.1
murmurhash==1.0.5
mypy==0.800
mypy-extensions==0.4.3
nbclassic==0.2.6
nbclient==0.5.1
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.4.3
nltk==3.5
notebook==6.2.0
numpy @ file:///Users/runner/miniforge3/conda-bld/numpy_1610324566095/work
odictliteral==1.0.0
olefile @ file:///home/conda/feedstock_root/build_artifacts/olefile_1602866521163/work
openpyxl==3.0.6
owlrl==5.2.1
packaging==20.8
pandas==1.2.1
pandocfilters==1.4.3
papermill==2.3.1
Parsley==1.3
parso==0.8.1
pathlib==1.0.1
pathspec==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow @ file:///Users/runner/miniforge3/conda-bld/pillow_1610407527390/work
plac==1.1.3
plotly==4.14.3
ply==3.11
preshed==3.0.5
prometheus-client==0.9.0
prompt-toolkit==3.0.13
psutil==5.8.0
ptyprocess==0.7.0
pycairo==1.20.0
pycountry==20.7.3
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1593275161868/work
pyexcel==0.6.6
pyexcel-io==0.6.4
pyexcel-xls==0.6.2
pyexcel-xlsx==0.6.0
Pygments==2.7.4
PyGObject==3.38.0
pygtrie==2.4.2
pyparsing==2.4.7
pyrallel.lib==0.0.9
pyrsistent==0.17.3
pyshacl==0.9.10
python-dateutil==2.8.1
pytz==2020.5
PyYAML==5.4.1
pyzmq==21.0.1
rdflib @ git+https://github.com/RDFLib/rdflib.git@2077524d43a103c3b9bf9fdd009a4942c7fff032
rdflib-jsonld==0.5.0
redis==3.5.3
regex==2020.11.13
requests==2.25.1
retrying==1.3.3
rfc3986==1.4.0
rltk==2.0.0a15
sacremoses==0.0.43
scikit-learn==0.24.1
scipy @ file:///Users/runner/miniforge3/conda-bld/scipy_1609457877771/work
seaborn==0.11.1
Send2Trash==1.5.0
sentence-transformers==0.4.1.2
sentencepiece==0.1.95
sh==1.14.1
shortuuid==1.0.1
simplejson==3.17.2
six @ file:///home/conda/feedstock_root/build_artifacts/six_1590081179328/work
sklearn==0.0
smart-open==4.1.2
sniffio==1.2.0
sortedcontainers==2.3.0
soupsieve==2.1
spacy==2.2.4
SPARQLWrapper==1.8.5
srsly==1.0.5
tabula-py==2.2.0
tblib==1.7.0
tenacity==6.3.1
termcolor==1.1.0
terminado==0.9.2
testpath==0.4.4
texttable==1.6.3
textwrap3==0.9.2
thinc==7.4.0
threadpoolctl==2.1.0
tokenizers==0.9.4
toml==0.10.2
toolz==0.11.1
torch==1.7.1
torchbiggraph==1.0.0
tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1610094698292/work
tqdm==4.56.0
traitlets==5.0.5
transformers==4.2.2
typed-ast==1.4.2
typing==3.7.4.3
typing-extensions==3.7.4.3
tzlocal==2.1
ujson==4.0.2
urllib3==1.26.2
wasabi==0.8.0
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
wrapt==1.12.1
xlrd==1.2.0
xlwt==1.3.0
zict==2.0.0
zipp==3.4.0
zstandard @ file:///Users/runner/miniforge3/conda-bld/zstandard_1611351936558/work

CraigMiloRogers commented 2 years ago

I think you need to upgrade to Python 3.8. That is now the recommended version in the kgtk repository's README.md:

conda create -n kgtk-env python=3.8

szeke commented 2 years ago

Now I am in a KGTK installation nightmare: my notebooks no longer work. During installation I get the following error:

    ERROR: Command errored out with exit status 1:
     command: /Users/pedroszekely/opt/anaconda3/envs/kgtk-env/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/1_/_5pln4v50xxf_3f_8vy081x40000gq/T/pip-install-wo_aeu4l/demjson_7b681d32e677474da0b64f6b52f95264/setup.py'"'"'; __file__='"'"'/private/var/folders/1_/_5pln4v50xxf_3f_8vy081x40000gq/T/pip-install-wo_aeu4l/demjson_7b681d32e677474da0b64f6b52f95264/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/1_/_5pln4v50xxf_3f_8vy081x40000gq/T/pip-pip-egg-info-bm_rouy_
         cwd: /private/var/folders/1_/_5pln4v50xxf_3f_8vy081x40000gq/T/pip-install-wo_aeu4l/demjson_7b681d32e677474da0b64f6b52f95264/
    Complete output (1 lines):
    error in demjson setup command: use_2to3 is invalid.
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/96/67/6db789e2533158963d4af689f961b644ddd9200615b8ce92d6cad695c65a/demjson-2.2.4.tar.gz#sha256=31de2038a0fdd9c4c11f8bf3b13fe77bc2a128307f965c8d5fb4dc6d6f6beb79 (from https://pypi.org/simple/demjson/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

CraigMiloRogers commented 2 years ago

Are you installing in a fresh conda environment?

conda create -n kgtk-env3.8 python=3.8

Or, are you upgrading your existing environment?

CraigMiloRogers commented 2 years ago

Mitchell deHaven said:

I ran into this problem the other day while downloading KGTK. I think you will need to downgrade your setuptools.

Downgrading to setuptools==57.0.0 fixed my issues when setting up a new conda environment a few days ago.
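
Some background on why the downgrade helps: setuptools 58.0.0 removed support for the use_2to3 option, which the demjson 2.2.4 setup script still passes, so any earlier release (such as the 57.0.0 mentioned above) accepts it. A minimal sketch of that check follows; the function name setuptools_supports_use_2to3 is hypothetical and only illustrates the version boundary.

```python
def setuptools_supports_use_2to3(version_string):
    """Return True if this setuptools version still accepts use_2to3.

    Hypothetical helper: support was removed in setuptools 58.0.0,
    so only the major version number needs to be inspected.
    """
    major = int(version_string.split(".")[0])
    return major < 58


if __name__ == "__main__":
    # 57.0.0 is the version reported to work in this thread.
    for candidate in ("57.0.0", "58.0.0"):
        print(candidate, setuptools_supports_use_2to3(candidate))
```

In practice the fix is to pin setuptools to a version below 58 in the environment before installing packages that rely on use_2to3.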

CraigMiloRogers commented 2 years ago

@szeke said:

The suggestion from Mitchell worked. Thanks @Mitchell DeHaven.

CraigMiloRogers commented 2 years ago

I am closing this issue. Issue #502 will carry the more general problems of validating and improving the KGTK installation procedures.

szeke commented 2 years ago

After fixing the installation problems, the embeddings still error out:

INFO:torchbiggraph:Loading entity counts...
INFO:torchbiggraph:Creating workers...
INFO:torchbiggraph:Initializing global model...
INFO:torchbiggraph:Starting epoch 1 / 100, edge path 1 / 1, edge chunk 1 / 1
INFO:torchbiggraph:Edge path: /tmp/output/edges_partitioned
INFO:torchbiggraph:still in queue: 0
INFO:torchbiggraph:Swapping partitioned embeddings None ( 0 , 0 )
INFO:torchbiggraph:( 0 , 0 ): Loading entities
Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/graph_embeddings.py", line 443, in run
    train(config, subprocess_init=subprocess_init)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/torchbiggraph/train.py", line 938, in train
    for _ in train_and_report_stats(config, model, trainer, evaluator, rank, subprocess_init):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/torchbiggraph/train.py", line 782, in train_and_report_stats
    all_stats = get_async_result(future_all_stats, pool)
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/torchbiggraph/util.py", line 243, in get_async_result
    raise RuntimeError(
RuntimeError: A subprocess exited unexpectedly with status -11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/exceptions.py", line 46, in __call__
    return_code = func(*args, **kwargs) or 0
  File "/Users/pedroszekely/opt/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/graph_embeddings.py", line 483, in run
    raise KGTKException(str(e))
kgtk.exceptions.KGTKException: A subprocess exited unexpectedly with status -11
A subprocess exited unexpectedly with status -11
CPU times: user 546 ms, sys: 174 ms, total: 720 ms
Wall time: 48.1 s
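
A note on reading "status -11": assuming the number reported by torchbiggraph is the multiprocessing exit code of the worker, a negative value -N means the process was terminated by signal N. On Linux and macOS, signal 11 is SIGSEGV, which would point to a crash in native code rather than a Python exception. The sketch below decodes such a status; describe_exit_status is a hypothetical helper, not part of KGTK or torchbiggraph.

```python
import signal


def describe_exit_status(status):
    """Translate a multiprocessing-style exit status into a readable label.

    Negative values mean "terminated by signal -status"; non-negative
    values are ordinary exit codes. Hypothetical helper for illustration.
    """
    if status < 0:
        try:
            return signal.Signals(-status).name
        except ValueError:
            return f"unknown signal {-status}"
    return f"exit code {status}"


if __name__ == "__main__":
    print(describe_exit_status(-11))
```

If the label is SIGSEGV, the next places to look are usually the native libraries involved (here torch) and the platform, not the Python traceback.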

CraigMiloRogers commented 2 years ago

@szeke Can you share your input file with me?

CraigMiloRogers commented 2 years ago

I ran:

            -i /data1/rogers/kgtk/gd/kgtk/cache/datasets/wikidata-20210215/data/claims.properties.tsv.gz \
            --output_format kgtk -o graph-embeddings.tsv.gz

and got:

Traceback (most recent call last):
  File "/home/rogers/kgtk/github/kgtk4/kgtk/cli/graph_embeddings.py", line 464, in run
    generate_kgtk_output(entities_output,output_kgtk_file,verbose,very_verbose)
  File "/home/rogers/kgtk/github/kgtk4/kgtk/cli/graph_embeddings.py", line 301, in generate_kgtk_output
    kw.file_out.seek(0)         # set the cursor to the top of the file
  File "/usr/lib64/python3.8/gzip.py", line 376, in seek
    raise OSError('Negative seek in write mode')
OSError: Negative seek in write mode

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rogers/kgtk/github/kgtk4/kgtk/exceptions.py", line 46, in __call__
    return_code = func(*args, **kwargs) or 0
  File "/home/rogers/kgtk/github/kgtk4/kgtk/cli/graph_embeddings.py", line 483, in run
    raise KGTKException(str(e))
kgtk.exceptions.KGTKException: Negative seek in write mode
Negative seek in write mode

This is an error, but I'm guessing it's not the same error.
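
For context, this OSError comes from the Python gzip module itself: a gzip stream opened for writing cannot be rewound, so seek(0) after any data has been written fails. The following minimal sketch reproduces it in isolation, using an in-memory buffer instead of the real output file; the function name and the sample row are made up for illustration.

```python
import gzip
import io


def rewind_error_in_write_mode():
    """Write to a gzip stream, try to seek back to the start, and return
    the resulting error message (or None if no error was raised)."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(b"node1\tlabel\tnode2\n")  # made-up sample row
        try:
            gz.seek(0)  # backward seek is not allowed while writing
        except OSError as error:
            return str(error)
    return None


if __name__ == "__main__":
    print(rewind_error_in_write_mode())
```

This suggests the failure depends on the output being gzip-compressed (the .gz suffix in -o), not on the contents of the input graph.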

CraigMiloRogers commented 2 years ago

The error I saw was documented in issue #274.

CraigMiloRogers commented 2 years ago

@szeke asked me to run graph embeddings on claims.wikibase-item.tsv.gz from the arnold dataset. I ran:

kgtk --debug graph-embeddings -i claims.wikibase-item.tsv.gz --output_format kgtk -o graph-embeddings.tsv.gz

There were no errors, and the output file looked normal.