hltcoe / patapsco

Cross language information retrieval pipeline

Competing Java VM instance in jnius between patapsco/pyserini and pyTerrier #23

Closed eugene-yang closed 2 years ago

eugene-yang commented 2 years ago

Both pyserini and pyTerrier use jnius to integrate with their Java code, but jnius does not allow spawning multiple VMs. So after running patapsco once, initializing pyTerrier fails:

import pyterrier as pt
if not pt.started():
    pt.init(tqdm='notebook')

We get

ValueError: VM is already running, can't set classpath/options; VM started at  File "/Users/eyang/miniconda3/envs/patapsco/lib/python3.8/runpy.py", line 194, in _run_module_as_main

And vice versa. I'm not that familiar with jnius, but is there a way to run both in the same VM instead of starting a new one?
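
For context, the failure mode can be reproduced without either library. The sketch below is a simplified stand-in for the guard that raises the error, not the real jnius_config module. The class name, helper functions, and jar names are all hypothetical and only illustrate that classpath entries have to be registered before the VM starts.

```python
# Simplified stand-in for the classpath guard; NOT the real jnius_config.
class FakeJniusConfig:
    def __init__(self):
        self.vm_running = False
        self.classpath = []

    def add_classpath(self, *paths):
        # Mirrors the behaviour seen in the error: no changes once the VM is up.
        if self.vm_running:
            raise ValueError("VM is already running, can't set classpath/options")
        self.classpath.extend(paths)

    def start_vm(self):
        self.vm_running = True


# Hypothetical jar names, used only for illustration.
def add_patapsco_jars(config):
    config.add_classpath("anserini.jar", "psq.jar")


def add_pyterrier_jars(config):
    config.add_classpath("terrier.jar")


def start_vm(config):
    config.start_vm()


def try_order(steps):
    """Run the steps against a fresh config and report what happened."""
    config = FakeJniusConfig()
    try:
        for step in steps:
            step(config)
    except ValueError as err:
        return f"failed: {err}"
    return f"ok: {config.classpath}"


# One library starts the VM before the other has registered its jars.
result_bad = try_order([add_patapsco_jars, start_vm, add_pyterrier_jars])
# Both libraries register their jars first, then a single VM starts.
result_good = try_order([add_patapsco_jars, add_pyterrier_jars, start_vm])
print(result_bad)
print(result_good)
```

In this model the only ordering that works is the one where every jar is on the classpath before anything starts the VM, so any fix has to make both libraries share one VM.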

cash commented 2 years ago

There is probably a way to hack this. It will require adding the path to the jar file to jnius. I'll take a look.

cash commented 2 years ago

Can you try the branch 23-set-classpath-early?

You'll need to import pyterrier and call init() before calling run() on Runner. After Patapsco has run once, the VM is loaded and the pyTerrier jar can no longer be added.

eugene-yang commented 2 years ago

Now it complains that the VM is already running as soon as I import patapsco. It looks like this is caused by the PSQ patch:

import pyterrier as pt
if not pt.started():
    pt.init(tqdm='notebook')

import copy
import random
from pathlib import Path
import pandas as pd

import patapsco
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/f7/741f9g1x03dfyjzqfpgr_w800000gp/T/ipykernel_89121/3537804280.py in <module>
      5 import pandas as pd
      6 
----> 7 import patapsco

~/Documents/Repositories/patapsco/patapsco/__init__.py in <module>
     10 
     11 # TODO remove
---> 12 from .psq_setup import configure_classpath_psq

~/Documents/Repositories/patapsco/patapsco/psq_setup.py in <module>
     33 pyserini.setup.configure_classpath = skip_setting_classpath
     34 
---> 35 configure_classpath_psq()

~/Documents/Repositories/patapsco/patapsco/psq_setup.py in configure_classpath_psq()
     17 
     18     latest = max(paths, key=os.path.getctime)
---> 19     jnius_config.add_classpath(latest)
     20     psq_path = (Path(__file__).parent / 'resources' / 'jars').glob('psq*.jar')
     21     if not psq_path:

~/miniconda3/envs/patapsco/lib/python3.8/site-packages/jnius_config.py in add_classpath(*path)
     55     Replaces any existing classpath, overriding the CLASSPATH environment variable.
     56     """
---> 57     check_vm_running()
     58     global classpath
     59     if classpath is None:

~/miniconda3/envs/patapsco/lib/python3.8/site-packages/jnius_config.py in check_vm_running()
     18     """Raises a ValueError if the VM is already running."""
     19     if vm_running:
---> 20         raise ValueError("VM is already running, can't set classpath/options; VM started at" + vm_started_at)
     21 
     22 

ValueError: VM is already running, can't set classpath/options; VM started at  File "/Users/eyang/miniconda3/envs/patapsco/lib/python3.8/runpy.py", line 194, in _run_module_as_main

cash commented 2 years ago

Import patapsco first and then call init() on pyterrier.

eugene-yang commented 2 years ago

Cool it works :)
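
Putting the thread together, the order that worked can be sketched as follows. This assumes the 23-set-classpath-early branch of patapsco and the pyterrier calls used earlier in this issue (pt.started() and pt.init()). The wrapper function name is hypothetical, and the import guard is only there so the snippet degrades gracefully where the packages are not installed.

```python
def init_in_working_order():
    """Import patapsco first, then initialize pyTerrier (order from this thread)."""
    try:
        # Importing patapsco registers its jars on the jnius classpath
        # (see configure_classpath_psq in the traceback above).
        import patapsco  # noqa: F401
        import pyterrier as pt
    except ImportError:
        # Packages not available in this environment; nothing to initialize.
        return False

    # pyTerrier starts the VM here, after patapsco has set up the classpath.
    if not pt.started():
        pt.init(tqdm='notebook')
    return True


initialized = init_in_working_order()
```

After this returns True, run() can be called on Runner, which matches the earlier advice to have pyTerrier initialized before Patapsco runs.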