Closed valecarriero closed 3 years ago
@valecarriero, we are updating the installation instructions to make sure these errors go away. We'll have more updates next week.
@valecarriero we are able to reproduce the problem, installation is broken for us too. @CraigMiloRogers is working to fix it.
@valecarriero We tracked at least part of the problem down. KGTK required ETK (one of our other projects), and ETK required demjson. demjson is unmaintained, and got left behind in the Python 2 to Python 3 migration.
We released a new version of ETK that doesn't use demjson.
I was able to install a fresh kgtk using:
conda create -n kgtk-env python=3.8
conda activate kgtk-env
conda install -c conda-forge graph-tool
pip --no-cache install -U kgtk
python -m spacy download en_core_web_sm
Please give this a try and report your results. We're here to help you. Thanks.
Another thing I did that might help:
pip install etk==2.2.8
This should come before the pip --no-cache install -U kgtk. It is possible that by installing ETK first, one should then use pip install -U kgtk.
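To confirm what actually ended up in the environment after these steps, something like the following may help. It is only a sketch: it uses the standard-library importlib.metadata (available from Python 3.8) to look up the installed versions of kgtk, etk and demjson, and the helper names installed_version and report are made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("kgtk", "etk", "demjson")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Run inside the activated kgtk-env environment.
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If demjson still shows a version, it was probably pulled in by an older ETK release, and installing etk==2.2.8 first as described above is worth trying.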
Thank you! I tried this
conda create -n kgtk-env python=3.8
conda activate kgtk-env
conda install -c conda-forge graph-tool
pip --no-cache install -U kgtk
python -m spacy download en_core_web_sm
and it worked!
Should this command (from here) work
kgtk --debug --timing --progress import-wikidata \
-i wikidata-all-20200504.json.gz \
--node nodefile.tsv \
--edge edgefile.tsv \
--qual qualfile.tsv \
--use-mgzip-for-input True \
--use-mgzip-for-output True \
--use-shm True \
--procs 6 \
--mapper-batch-size 5 \
--max-size-per-mapper-queue 3 \
--single-mapper-queue True \
--collect-results True \
--collect-seperately True \
--collector-batch-size 10 \
--collector-queue-per-proc-size 3 \
--progress-interval 500000 --fail-if-missing False
or not?
This is what I get
kgtk --debug --timing --progress import-wikidata \
> -i /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz \
> --node nodefile.tsv \
> --edge edgefile.tsv \
> --qual qualfile.tsv \
> --use-mgzip-for-input True \
> --use-mgzip-for-output True \
> --use-shm True \
> --procs 6 \
> --mapper-batch-size 5 \
> --max-size-per-mapper-queue 3 \
> --single-mapper-queue True \
> --collect-results True \
> --collect-seperately True\
> --collector-batch-size 10 \
> --collector-queue-per-proc-size 3 \
> --progress-interval 500000 --fail-if-missing False
kgtk import-wikidata version: 2021-02-24T21:11:49.602037+00:00#sgB3FM8zpy/0bbx1RwyRawYnB1spAUBS+FVVQBL8DtJVxXE8mYCTTLr2lHJqbKVe5fBPp+k5iQjTDmJ6GRVf8Q==
Starting main process (pid 22079).
Processing.
Processing wikidata file /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz
Traceback (most recent call last):
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/import_wikidata.py", line 2580, in run
progress_startup(fd=input_f.fileno()) # Start the custom progress monitor.
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli_entry.py", line 70, in progress_startup
_save_progress_command = sh.pv("-d {}:{}".format(pid, fd),
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/sh.py", line 3672, in __getattr__
return self.__env[name]
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/sh.py", line 3457, in __getitem__
raise CommandNotFound(k)
sh.CommandNotFound: pv
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/exceptions.py", line 46, in __call__
return_code = func(*args, **kwargs) or 0
File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/import_wikidata.py", line 3028, in run
raise KGTKException(str(e))
kgtk.exceptions.KGTKException: pv
pv
Timing: elapsed=0:00:00.472186 CPU=0:00:00.329380 ( 69.8%): import-wikidata -i /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz --node nodefile.tsv --edge edgefile.tsv --qual qualfile.tsv --use-mgzip-for-input True --use-mgzip-for-output True --use-shm True --procs 6 --mapper-batch-size 5 --max-size-per-mapper-queue 3 --single-mapper-queue True --collect-results True --collect-seperately True --collector-batch-size 10 --collector-queue-per-proc-size 3 --progress-interval 500000 --fail-if-missing False
Hi @valecarriero, yes, it should work. I used it extensively in the past, successfully. This may be an error introduced by the latest changes. @CraigMiloRogers may know more.
The problem is the --progress option. It expects the pv system command to be installed.
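This also explains why the final error message in the traceback is just "pv". The sketch below mimics the pattern visible in the traceback (an exception caught and re-raised as KGTKException(str(e))) using stand-in classes. None of this is the real sh or KGTK code; the class and function names are placeholders for illustration.

```python
class CommandNotFound(Exception):
    """Stand-in for sh.CommandNotFound (hypothetical, for illustration only)."""


class KGTKException(Exception):
    """Stand-in for kgtk.exceptions.KGTKException (hypothetical)."""


def progress_startup():
    # Mimics sh failing to find the pv executable on PATH.
    raise CommandNotFound("pv")


def run():
    try:
        progress_startup()
    except Exception as e:
        # Only the message text is carried over, so the exception type
        # ("command not found") is lost from the final error message.
        raise KGTKException(str(e))


def final_message():
    """Return the message a caller would see from run()."""
    try:
        run()
    except KGTKException as e:
        return str(e)
    return None
```

Because only str(e) is passed along, the user sees the bare command name rather than a hint that a system command is missing.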
Then we should list it as a requirement :S
I plan to have the code check whether pv exists and ignore --progress if it does not.
I've committed a change to our dev branch such that if the pv command is not available, then --progress will be silently ignored. I'd prefer to give a warning message when --debug is also specified; that will be a new issue.
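For readers curious what such a check could look like, here is a minimal sketch. It is not the code committed to the dev branch: it assumes the standard-library shutil.which is used to look for pv on PATH, and the helper names progress_available and resolve_progress are hypothetical. It also includes the warning under --debug, which at this point is only a stated preference.

```python
import shutil
import sys


def progress_available(command="pv"):
    """Return True if the given command is found on PATH."""
    return shutil.which(command) is not None


def resolve_progress(requested, debug=False, command="pv", stream=sys.stderr):
    """Decide whether progress monitoring should actually be enabled.

    Hypothetical helper, not KGTK's actual implementation.
    """
    if not requested:
        return False
    if progress_available(command):
        return True
    if debug:
        # Warn only in debug mode; otherwise the option is ignored silently.
        print(f"Warning: '{command}' not found; ignoring --progress.", file=stream)
    return False
```

A caller would pass the parsed --progress and --debug values and start the progress monitor only when resolve_progress returns True.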
@valecarriero Either drop the --progress option from the command line, or get the latest code from the KGTK GitHub repository.
I'm closing this issue, assuming that the thumbs-up emoji means the problems have been solved.
Describe the bug: Errors during the local installation
To Reproduce Steps to reproduce the behavior:
ERRORS: 1
2
Desktop (please complete the following information):
Additional context
I also tried to downgrade setuptools as suggested in another issue
pip freeze
kgtk -h
python -m spacy download en_core_web_sm