khuangaf / CONCRETE

Official implementation of "CONCRETE: Improving Cross-lingual Fact Checking with Cross-lingual Retrieval" (COLING'22)
Apache License 2.0

ImportError: cannot import name 'read_xict_samples_from_json_files' from 'dpr.utils.data_utils' #1

Closed: rahulmool closed this issue 1 year ago

rahulmool commented 1 year ago

When I run run_xict.sh, it shows the following error. It seems to me that dpr is not included in this repository.

Traceback (most recent call last):
  File "run_xict.py", line 32, in <module>
    from dpr.utils.data_utils import ShardedDataIterator, read_xict_samples_from_json_files, Tensorizer
ImportError: cannot import name 'read_xict_samples_from_json_files' from 'dpr.utils.data_utils' (/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/dpr/utils/data_uti$
/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 40046) of binary: /scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/bin/python
Traceback (most recent call last):
  File "/home/apps/DL-CondaPy3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/apps/DL-CondaPy3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/run.py", line 713, in run
    )(*cmd_args)
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/scratch/22cs60r72/InformationRetrival/copy/CORA/mDPR/mool/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

khuangaf commented 1 year ago

Please download the relevant scripts from the CORA repo. Thank you.

rahulmool commented 1 year ago

I did that, but the problem is in dpr.utils.data_utils: it has no function named read_xict_samples_from_json_files.
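
One way to confirm which copy of dpr Python is importing, and whether that copy defines the function, is a quick check like the sketch below. The helper name find_symbol is made up for illustration; it relies only on importlib and hasattr from the standard library.

```python
import importlib
from typing import Optional, Tuple


def find_symbol(module_name: str, attr: str) -> Tuple[Optional[str], bool]:
    """Return (path of the imported module, whether it defines `attr`).

    `find_symbol` is a hypothetical helper for this thread, not part of dpr.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself could not be imported at all.
        return None, False
    return getattr(module, "__file__", None), hasattr(module, attr)


if __name__ == "__main__":
    # Shows which data_utils.py is picked up and whether the function exists.
    print(find_symbol("dpr.utils.data_utils", "read_xict_samples_from_json_files"))
```

In the traceback above the module is loaded from site-packages, so a locally edited copy of data_utils.py would only take effect if it is the one that gets imported.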

khuangaf commented 1 year ago

You are right. Here is the function:

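# Imports and logger needed by the function; omit any that data_utils.py already defines.
import json
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def read_xict_samples_from_json_files(paths: List[str], upsample_rates: Optional[List[int]] = None) -> List:
    """Read samples from files with one JSON value per line, repeating each file by its up-sample rate."""
    results = []
    if upsample_rates is None:
        # Default: use every file exactly once.
        upsample_rates = [1] * len(paths)

    assert len(upsample_rates) == len(paths), "up-sample rates parameter doesn't match input files amount"

    for path, rate in zip(paths, upsample_rates):
        with open(path, 'r', encoding='utf-8') as f:
            logger.info('Reading file %s', path)
            # Parse each line separately (not the whole file as one JSON document).
            data = [json.loads(line) for line in f]
            # Repeat this file's samples `rate` times before aggregating.
            results.extend(data * int(rate))
            logger.info('Aggregated data size: %d', len(results))
    return results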

I will update the README later.
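
For anyone who wants to sanity-check the function before wiring it into run_xict.py, here is a minimal sketch of how the up-sampling behaves. It uses a condensed copy of the function (logging omitted) so that it runs on its own; the file names and the "question" field are invented for the example and are not taken from the CONCRETE or CORA data.

```python
import json
import os
import tempfile
from typing import List, Optional


def read_xict_samples_from_json_files(paths: List[str],
                                      upsample_rates: Optional[List[int]] = None) -> List:
    # Condensed copy of the function from this thread, without logging.
    results = []
    if upsample_rates is None:
        upsample_rates = [1] * len(paths)
    assert len(upsample_rates) == len(paths)
    for path, rate in zip(paths, upsample_rates):
        with open(path, "r", encoding="utf-8") as f:
            data = [json.loads(line) for line in f]
        results.extend(data * int(rate))
    return results


def write_jsonl(path: str, rows: List[dict]) -> None:
    # Hypothetical helper: one JSON object per line.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file names and contents, for illustration only.
    big = os.path.join(tmp, "high_resource.jsonl")
    small = os.path.join(tmp, "low_resource.jsonl")
    write_jsonl(big, [{"question": "a"}, {"question": "b"}])
    write_jsonl(small, [{"question": "c"}])

    # Keep the larger file as-is and repeat the smaller one three times.
    samples = read_xict_samples_from_json_files([big, small], upsample_rates=[1, 3])

print(len(samples))  # 2 * 1 + 1 * 3 = 5
```

The rates are applied per file, in the same order as the paths, so a smaller file can be repeated to balance it against a larger one.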