zeionara / spike

A tool for generating SPARQL queries from natural language utterances using large machine learning models

Would you like to add a `requirements.yml` and description of python version? #1

Open chuanwise opened 10 months ago

chuanwise commented 10 months ago

I cannot use `conda env create -f ./environment.yml` to create the provided environment. I get:

Retrieving notices: ...working... done
Collecting package metadata (repodata.json): failed

UnavailableInvalidChannel: HTTP 404 NOT FOUND for channel simple <http://pypi.douban.com/simple>

The channel is not accessible or is invalid.

You will need to adjust your conda configuration to proceed.
Use `conda config --show channels` to view your configuration's current state,
and use `conda config --show-sources` to view config file locations.

or:

ResolvePackageNotFound: 
- click==8.0.4=py311h06a4308_0
- pyopenssl==23.2.0=py311h06a4308_0
- tqdm==4.65.0=py311h92b7b1e_0
- zstd==1.5.5=hc292b87_0
- libgfortran-ng==11.2.0=h00389a5_1
- brotlipy==0.7.0=py311h5eee18b_1002
- libuuid==1.41.5=h5eee18b_0
- packaging==23.1=py311h06a4308_0
- transformers==4.32.1=py311h06a4308_0
- pytorch==2.2.0.dev20231001=py3.11_cuda12.1_cudnn8.9.2_0
- mkl_fft==1.3.6=py311ha02d727_1
- dill==0.3.6=py311h06a4308_0
- matplotlib-base==3.7.2=py311ha02d727_0
- re2==2022.04.01=h295c915_0
- bzip2==1.0.8=h7b6447c_0
- certifi==2023.7.22=py311h06a4308_0
- markupsafe==2.1.1=py311h5eee18b_0
- openai==0.27.4=py311h06a4308_0
- mkl-service==2.4.0=py311h5eee18b_1
- cffi==1.15.1=py311h5eee18b_3
- libgomp==11.2.0=h1234567_1
- colorama==0.4.6=py311h06a4308_0
- _openmp_mutex==5.1=1_gnu
- ujson==5.4.0=py311h6a678d5_0
- urllib3==1.26.16=py311h06a4308_0
- x264==1!157.20191217=h7b6447c_0
- boost-cpp==1.82.0=hdb19cb5_1
- et_xmlfile==1.1.0=py311h06a4308_0
- mpc==1.1.0=h10f8cd9_1
- frozenlist==1.3.3=py311h5eee18b_0
- libevent==2.1.12=hdbd6064_1
- orc==1.7.4=hb3bc3d3_1
- openh264==2.1.1=h4ff587b_0
- libwebp==1.2.4=h11a3e52_1
- numexpr==2.8.4=py311h65dcdc2_1
- libboost==1.82.0=ha8e66a6_1
- libgcc-ng==11.2.0=h1234567_1
- types-pytz==2022.4.0.0=py311h06a4308_1
- libtasn1==4.19.0=h5eee18b_0
- numpy-base==1.25.2=py311hf175353_0
- mkl==2023.1.0=h213fc3f_46343
- pylint==2.16.2=py311h06a4308_0
- libgfortran5==11.2.0=h1234567_1
- pip==23.2.1=py311h06a4308_0
- python-xxhash==2.0.2=py311h5eee18b_1
- gflags==2.2.2=he6710b0_0
- lame==3.100=h7b6447c_0
- kiwisolver==1.4.4=py311h6a678d5_0
- snappy==1.1.9=h295c915_0
- ncurses==6.4=h6a678d5_0
- freetype==2.12.1=h4a9f257_0
- pycodestyle==2.10.0=py311h06a4308_0
- krb5==1.20.1=h143b758_1
- jedi==0.18.1=py311h06a4308_1
- lcms2==2.12=h3be6417_0
- torchvision==0.17.0.dev20231001=py311_cu121
- yaml==0.2.5=h7b6447c_0
- lazy-object-proxy==1.6.0=py311h5eee18b_0
- libwebp-base==1.2.4=h5eee18b_1
- brotli==1.0.9=h5eee18b_7
- multiprocess==0.70.14=py311h06a4308_0
- jpeg==9e=h5eee18b_1
- huggingface_hub==0.15.1=py311h06a4308_0
- libunistring==0.9.10=h27cfd23_0
- attrs==22.1.0=py311h06a4308_0
- libidn2==2.3.4=h5eee18b_0
- flake8==6.0.0=py311h06a4308_0
- setuptools==68.0.0=py311h06a4308_0
- tomlkit==0.11.1=py311h06a4308_0
- yarl==1.8.1=py311h5eee18b_0
- pysocks==1.7.1=py311h06a4308_0
- abseil-cpp==20211102.0=hd4dd3e8_0
- libffi==3.4.4=h6a678d5_0
- networkx==3.1=py311h06a4308_0
- libnghttp2==1.52.0=h2d74bed_1
- libthrift==0.15.0=h1795dd8_2
- pluggy==1.0.0=py311h06a4308_1
- pytorch-cuda==12.1=ha16c6d3_5
- c-ares==1.19.1=h5eee18b_0
- pydocstyle==6.3.0=py311h06a4308_0
- wheel==0.38.4=py311h06a4308_0
- typing_extensions==4.7.1=py311h06a4308_0
- idna==3.4=py311h06a4308_0
- pillow==9.4.0=py311h6a678d5_0
- pyarrow==11.0.0=py311hd8e8d9b_1
- giflib==5.2.1=h5eee18b_3
- python-lsp-server==1.7.2=py311h06a4308_0
- ld_impl_linux-64==2.38=h1181459_1
- mkl_random==1.2.2=py311ha02d727_1
- tbb==2021.8.0=hdb19cb5_0
- platformdirs==3.10.0=py311h06a4308_0
- rope==1.7.0=py311h06a4308_0
- jinja2==3.1.2=py311h06a4308_0
- async-timeout==4.0.2=py311h06a4308_0
- libstdcxx-ng==11.2.0=h1234567_1
- openssl==3.0.11=h7f8727e_2
- plotly==5.9.0=py311h06a4308_0
- gnutls==3.6.15=he1e5248_0
- gmp==6.2.1=h295c915_3
- docstring-to-markdown==0.11=py311h06a4308_0
- fsspec==2023.9.2=py311h06a4308_0
- contourpy==1.0.5=py311hdb19cb5_0
- multidict==6.0.2=py311h5eee18b_0
- pyparsing==3.0.9=py311h06a4308_0
- ca-certificates==2023.08.22=h06a4308_0
- astroid==2.14.2=py311h06a4308_0
- libbrotlidec==1.0.9=h5eee18b_7
- pytz==2022.7=py311h06a4308_0
- requests==2.31.0=py311h06a4308_0
- sqlite==3.41.2=h5eee18b_0
- gmpy2==2.1.2=py311hc9b5ff0_0
- libjpeg-turbo==2.0.0=h9bf148f_0
- icu==73.1=h6a678d5_0
- numpy==1.25.2=py311h08b1b3b_0
- aws-c-event-stream==0.1.6=h6a678d5_6
- nettle==3.7.3=hbbd107a_1
- tokenizers==0.13.2=py311h22610ee_1
- zipp==3.11.0=py311h06a4308_0
- brotli-bin==1.0.9=h5eee18b_7
- datasets==2.12.0=py311h06a4308_0
- xxhash==0.8.0=h7f8727e_3
- libtiff==4.5.1=h6a678d5_0
- xz==5.4.2=h5eee18b_0
- readline==8.2=h5eee18b_0
- lz4-c==1.9.4=h6a678d5_0
- lerc==3.0=h295c915_0
- libpng==1.6.39=h5eee18b_0
- libssh2==1.10.0=hdbd6064_2
- glog==0.5.0=h2531618_0
- whatthepatch==1.0.2=py311h06a4308_0
- aws-checksums==0.1.11=h5eee18b_2
- filelock==3.9.0=py311h06a4308_0
- sympy==1.11.1=py311h06a4308_0
- python==3.11.4=h955ad1f_0
- libprotobuf==3.20.3=he621ea3_0
- torchtriton==2.1.0+6e4932cda8=py311
- arrow-cpp==11.0.0=h374c478_2
- pyflakes==3.0.1=py311h06a4308_0
- libopus==1.3.1=h7b6447c_0
- zlib==1.2.13=h5eee18b_0
- aws-sdk-cpp==1.8.185=h721c034_1
- intel-openmp==2023.1.0=hdb19cb5_46305
- libev==4.33=h7f8727e_1
- libedit==3.1.20221030=h5eee18b_0
- cryptography==41.0.2=py311h22a60cf_0
- utf8proc==2.6.1=h27cfd23_0
- libvpx==1.7.0=h439df22_0
- wrapt==1.14.1=py311h5eee18b_0
- openpyxl==3.0.10=py311h5eee18b_0
- pytoolconfig==1.2.5=py311h06a4308_1
- grpc-cpp==1.48.2=he1ff14a_1
- bottleneck==1.3.5=py311hbed6279_0
- mpfr==4.0.2=hb69a4c5_1
- libbrotlienc==1.0.9=h5eee18b_7
- pandas-stubs==1.5.3.230203=py311h06a4308_0
- aiohttp==3.8.5=py311h5eee18b_0
- joblib==1.2.0=py311h06a4308_0
- scikit-learn==1.2.2=py311h6a678d5_1
- libcurl==8.2.1=h251f7ec_0
- tenacity==8.2.2=py311h06a4308_0
- libbrotlicommon==1.0.9=h5eee18b_7
- scipy==1.11.1=py311h08b1b3b_0
- libcufile==1.7.2.10=0
- llvm-openmp==14.0.6=h9e868ea_0
- libdeflate==1.17=h5eee18b_0
- pandas==2.0.3=py311ha02d727_0
- importlib-metadata==6.0.0=py311h06a4308_0
- torchaudio==2.2.0.dev20231001=py311_cu121
- tk==8.6.12=h1ccaba5_0
- safetensors==0.3.2=py311hb02cf49_0
- aws-c-common==0.6.8=h5eee18b_1
- ffmpeg==4.2.2=h20bf706_0

I retried many, many times but it still fails : (

zeionara commented 10 months ago

Hi, that is strange, I've just tried to do the same and got the following error:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - pytorch==2.2.0.dev20231001=py3.11_cuda12.1_cudnn8.9.2_0
  - torchvision==0.17.0.dev20231001=py311_cu121
  - torchaudio==2.2.0.dev20231001=py311_cu121

This should be solved by installing pytorch manually:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

At least `click==8.0.4=py311h06a4308_0` resolves successfully:

> conda install click=8.0.4=py311h06a4308_0

Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 22.11.1
  latest version: 23.9.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.9.0

## Package Plan ##

  environment location: /home/zeio/anaconda3/envs/foo

  added / updated specs:
    - click==8.0.4=py311h06a4308_0

The following NEW packages will be INSTALLED:

  click              pkgs/main/linux-64::click-8.0.4-py311h06a4308_0

The Python version is specified in the environment config.

Which version of conda do you use? Do you run the package on Windows or Linux?

chuanwise commented 10 months ago

I used Anaconda 23.9.0 on Windows, and my channel-related settings in `.condarc` are:

channels:
  - defaults
  - conda-forge
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/

I tried to change my .condarc to:

channels:
  - defaults
  - conda-forge

But it doesn't work : (

Did you use some special mirrors or channels in defaults?
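
Maybe I should drop the mirror keys entirely so conda falls back to its stock channels? Something like this (just a guess on my side, after backing up `.condarc`):

```shell
# drop the mirror overrides so conda falls back to its built-in defaults
conda config --remove-key default_channels
conda config --remove-key custom_channels
conda config --remove-key channels

# check the result
conda config --show channels
conda config --show-sources
```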

zeionara commented 10 months ago

No, there are no special settings in my `.condarc` related to channels. If you are using Windows, that may be why the default `environment.yml` doesn't work for you: I generated it on Linux.

In this case I would try to install the dependencies manually one by one, starting with pytorch and openai. There aren't actually a lot of them: the list in `environment.yml` mostly consists of 'secondary' dependencies, which will be installed automatically.
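
For example, something like this might be enough (just a sketch; the exact set of 'primary' packages is my guess, not a verified list):

```shell
# fresh environment with the Python version from environment.yml
conda create -n spike python=3.11
conda activate spike

# pytorch first, via the nightly channel as suggested above
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

# the remaining top-level dependencies; the rest come in transitively
pip install openai transformers datasets click
```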

chuanwise commented 10 months ago

I created the environment in my WSL successfully. Then I ran the command `python -m spike ask 'What kind of bear is best'` and it printed:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 376, in <module>
    main()
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 73, in ask
    answer = responder.ask(question, fresh = fresh, dry_run = dry_run)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/Responder.py", line 92, in ask
    context = OrkgContext(fresh = fresh, graph = self.graph)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/OrkgContext.py", line 80, in __init__
    with open(cache_path, 'wb') as file:
         ^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'assets/cache/orkg-context.pkl'

I downloaded `orkg.pkl` and placed it in `assets`, and I also tried to copy it and rename the copy to `cache/orkg-context.pkl`; then it printed:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 376, in <module>
    main()
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 73, in ask
    answer = responder.ask(question, fresh = fresh, dry_run = dry_run)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/Responder.py", line 101, in ask
    examples, _ = context.cut(question)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/OrkgContext.py", line 86, in cut
    return examples, '\n'.join([
                               ^
  File "/home/chuanwise/develop/projects/spike/spike/OrkgContext.py", line 91, in <listcomp>
    entry.mark == PrefixContextEntry.mark or
    ^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'mark'

I can run `run.sh`, but it does not always succeed. Sometimes it raises an exception:

000. What are the titles and IDs of research papers that include a benchmark for the HoC dataset?
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 378, in <module>
    main()
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/anaconda3/lib/python3.11/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/__main__.py", line 100, in ask
    query, answer = responder.ask(question)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/Responder.py", line 92, in ask
    context = OrkgContext(fresh = fresh, graph = self.graph)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/OrkgContext.py", line 73, in __init__
    context.append(ClassContextEntry.from_binding(binding, prefixes = PREFIXES))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chuanwise/develop/projects/spike/spike/ClassContextEntry.py", line 18, in from_binding
    prefix, name = cut_prefix(binding['class']['value'], prefixes)
                              ~~~~~~~^^^^^^^^^
TypeError: string indices must be integers, not 'str'

chuanwise commented 10 months ago

I changed the code near line 81 of `OrkgContext.py` from:

with open(cache_path, 'wb') as file:
    pkl.dump(context, file)

to:

if os.path.isfile(cache_path):
    with open(cache_path, 'wb') as file:
        pkl.dump(context, file)

Then the program runs. Should this code look like this?
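
Or maybe the intent was to create the missing directory instead? The first traceback suggests `assets/cache` simply does not exist yet, so something like this (my guess, not the author's code) would keep the caching working rather than skipping it:

```python
import os
import pickle as pkl
import tempfile

def save_cache(context, cache_path):
    # hypothetical replacement for the block in OrkgContext.py:
    # create the missing parent directory (e.g. assets/cache)
    # instead of skipping the dump when the file is absent
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    with open(cache_path, 'wb') as file:
        pkl.dump(context, file)

# quick check with a temporary directory standing in for the project root
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, 'assets', 'cache', 'orkg-context.pkl')
    save_cache({'entries': []}, path)
    with open(path, 'rb') as file:
        print(pkl.load(file))  # → {'entries': []}
```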

chuanwise commented 10 months ago

I found that this program treats questions as if they were questions about ORKG. I wanted to port it to other knowledge graphs, so I tried to add the option `-g graph-serialized-by-rdflib`, but found that content under the ORKG namespace still appeared in the generated query. How can it be ported to other knowledge graphs provided as an `rdflib.Graph`?