usc-isi-i2 / kgtk

Knowledge Graph Toolkit
https://kgtk.readthedocs.io/en/latest/
MIT License

`AttributeError: Can't pickle local object 'run.<locals>.MyCollector'` and `UserWarning: resource_tracker` #529

Open valecarriero opened 3 years ago

valecarriero commented 3 years ago

Describe the bug

Running kgtk import-wikidata fails with an error, followed by a warning.

To Reproduce

(kgtk-env) MacBook-Pro-di-Valentina:~ vale$ kgtk  --debug --timing import-wikidata \
>         -i /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz \
>         --node nodefile.tsv \
>         --edge edgefile.tsv \
>         --qual qualfile.tsv \
>         --use-mgzip-for-input True \
>         --use-mgzip-for-output True \
>         --use-shm True \
>         --procs 6 \
>         --mapper-batch-size 5 \
>         --max-size-per-mapper-queue 3 \
>         --single-mapper-queue True \
>         --collect-results True \
>         --collect-seperately True\
>         --collector-batch-size 10 \
>         --collector-queue-per-proc-size 3 \
>         --progress-interval 500000 --fail-if-missing False
kgtk import-wikidata version: 2021-02-24T21:11:49.602037+00:00#sgB3FM8zpy/0bbx1RwyRawYnB1spAUBS+FVVQBL8DtJVxXE8mYCTTLr2lHJqbKVe5fBPp+k5iQjTDmJ6GRVf8Q==
Starting main process (pid 27731).
Processing.
Processing wikidata file /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz
Decompressing (mgzip)
Creating the collector queue.
The collector node queue has been created (maxsize=18).
Creating the node_collector.
Creating the node collector process.
Starting the node collector process.
Traceback (most recent call last):
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/import_wikidata.py", line 2623, in run
    node_collector_p.start()
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'run.<locals>.MyCollector'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/exceptions.py", line 46, in __call__
    return_code = func(*args, **kwargs) or 0
  File "/Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/site-packages/kgtk/cli/import_wikidata.py", line 3028, in run
    raise KGTKException(str(e))
kgtk.exceptions.KGTKException: Can't pickle local object 'run.<locals>.MyCollector'
Can't pickle local object 'run.<locals>.MyCollector'
Timing: elapsed=0:00:00.224073 CPU=0:00:00.220067 ( 98.2%): import-wikidata -i /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz --node nodefile.tsv --edge edgefile.tsv --qual qualfile.tsv --use-mgzip-for-input True --use-mgzip-for-output True --use-shm True --procs 6 --mapper-batch-size 5 --max-size-per-mapper-queue 3 --single-mapper-queue True --collect-results True --collect-seperately True --collector-batch-size 10 --collector-queue-per-proc-size 3 --progress-interval 500000 --fail-if-missing False
(kgtk-env) MacBook-Pro-di-Valentina:~ vale$ /Users/vale/anaconda3/envs/kgtk-env/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 19 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Desktop (please complete the following information): OS: macOS Mojave 10.14.6
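
The first traceback ends in `reduction.dump(process_obj, fp)` inside `popen_spawn_posix.py`, which means the process object was being pickled so that it could be handed to a freshly spawned interpreter. Pickle stores classes by qualified name, so a class defined inside a function body cannot be located again on the other side. The following is a minimal sketch of that failure mode in isolation. It is not KGTK code: the body of `MyCollector` and the `try_pickle` helper are hypothetical stand-ins, and only the names `run` and `MyCollector` are taken from the error message.

```python
import pickle


class ModuleLevelCollector:
    """Defined at module scope, so pickle can find it by qualified name."""

    def __init__(self, batch_size):
        self.batch_size = batch_size


def run():
    # Hypothetical stand-in: a class defined inside a function body,
    # mirroring the 'run.<locals>.MyCollector' name in the traceback.
    class MyCollector:
        def __init__(self, batch_size):
            self.batch_size = batch_size

    return MyCollector(batch_size=10)


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as exc:
        # The exception type differs between Python versions, so catch both.
        return str(exc)


if __name__ == "__main__":
    # The module-level class pickles; the function-local class does not.
    print("module-level:", try_pickle(ModuleLevelCollector(batch_size=10)))
    print("function-local:", try_pickle(run()))
```

Moving such a class to module scope is the usual remedy in general Python code. Whether that is the right change for `import_wikidata.py` is not something this report establishes.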


CraigMiloRogers commented 3 years ago

@valecarriero kgtk import-wikidata is very complex. I haven't seen this error before. Can you share /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz with me on Google Drive?

szeke commented 3 years ago

@valecarriero Import wikidata is complex. We have several releases that we can share with you so you don’t have to do it yourself. We can import the newest version and share with you.

valecarriero commented 3 years ago

> @valecarriero kgtk import-wikidata is very complex. I haven't seen this error before. Can you share /Volumes/LaCie/wikidata_dump_json_29092021/latest-all.json.gz with me on Google Drive?

I'm afraid I don't have enough space to share it! However, I downloaded it on Sept 29th, so I think it should be wikidata-20210927-all.json.gz here: https://dumps.wikimedia.org/wikidatawiki/entities/20210927/

valecarriero commented 3 years ago

> @valecarriero Import wikidata is complex. We have several releases that we can share with you so you don’t have to do it yourself. We can import the newest version and share with you.

It would be very useful to start working on wikidata with kgtk!

dgarijo commented 3 years ago

@valecarriero, would you mind trying a previous version of Wikidata that we have tested? The last one I imported successfully is 20210104, which corresponds to this JSON file: https://drive.google.com/file/d/1c_yqDmM5qsKF64Ix9MSDKuAwnbRAcnjD/view?usp=sharing

If you are eager to test Wikidata out with KGTK, this is the file I produced after importing the previous file: https://drive.google.com/file/d/18VGq56BTOHU7ui_WkfcfzL-0hNAIZB0T/view?usp=sharing

We'll test it out with the newer file in the meantime.

phucty commented 3 years ago

I hit the same problem when running the kgtk import-wikidata script on macOS Monterey 12.0.1.

However, the same script runs without this error on Ubuntu 20.04.3.
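
A likely explanation for the macOS versus Ubuntu difference is the multiprocessing start method. Since Python 3.8 the default on macOS is `spawn`, which pickles the process object (the traceback above goes through `popen_spawn_posix.py`), while Linux at the time defaulted to `fork`, which copies the parent process and pickles nothing. The sketch below only inspects the start methods and shows how a `fork` context could be requested where the platform offers it. The helper names `start_method_report` and `pick_context` are hypothetical, and nothing in this thread confirms that KGTK exposes a way to select the start method.

```python
import multiprocessing as mp


def start_method_report():
    """Return the default start method and the ones this platform offers."""
    return {
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


def pick_context(preferred="fork"):
    """Return a context for `preferred` if available, else the default one.

    Assumption: the caller accepts the trade-offs of `fork`. The Python
    documentation describes `fork` as unsafe on macOS, which is why
    `spawn` became the default there.
    """
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    return mp.get_context()


if __name__ == "__main__":
    print(start_method_report())
    ctx = pick_context("fork")
    # Processes created via ctx.Process(...) use this start method.
    print("selected:", ctx.get_start_method())
```

Comparing the output of this script on the macOS machine and on the Ubuntu machine would show whether the two really differ in their default start method.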

valecarriero commented 2 years ago

Hi, I'm writing here again because I need to work on the most recent version of Wikidata, so I wanted to know whether this issue has been solved. If not (yet), would you be so kind as to share the KGTK files for the latest version of Wikidata with me, as you did for the 20210104 version? Thank you so much.

saggu commented 2 years ago

Hi @valecarriero, I created this notebook and ran it on my Mac laptop: https://github.com/usc-isi-i2/kgtk/blob/master/use-cases/import-wikidata.ipynb

Please give this a try. Meanwhile, a new version of Wikidata, Oct 27, 2021, is here: https://drive.google.com/drive/folders/1wsUsgPWOgOmHAqmS-eg45q9Im_-Ll5CX?usp=sharing