yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Local file paths being interpreted as URLs? #108

Closed: mr-martian closed this issue 1 year ago

mr-martian commented 1 year ago

I have the following command, which is intended to train a model from scratch and save it to hbo.model:

python -m supar.cmds.biaffine_dep train \
    -b -d 0 -c baseline.ini -f tag \
    --path "$lang.model" \
    --train "$data_dir/$lang.train.conllu" \
    --dev "$data_dir/$lang.dev.conllu" \
    --test "$data_dir/$lang.test.conllu" \
    --embed ''

However, when I run it on a fresh install from pip, I get the following traceback:

Traceback (most recent call last):     
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/biaffine_dep.py", line 47, in <module>
    main()
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/biaffine_dep.py", line 43, in main
    parse(parser)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/cmd.py", line 29, in parse
    parser.train(**args)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/parsers/dep.py", line 62, in train
    return super().train(**Config().update(locals()))
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/parsers/parser.py", line 41, in train
    train = Dataset(self.transform, args.train, **args).build(batch_size, buckets, True, dist.is_initialized())
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/utils/data.py", line 81, in build
    self.buckets = dict(zip(*kmeans([len(s.transformed[fields[0].name]) for s in self], n_buckets)))
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/utils/fn.py", line 69, in kmeans
    dists, y = torch.abs_(x.unsqueeze(-1) - c).min(-1)
IndexError: min(): Expected reduction dim 1 to have non-zero size.
Downloading: hbo.model to /N/u/dangswan/Carbonate/.cache/supar/hbo.model
Traceback (most recent call last):
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/biaffine_dep.py", line 47, in <module>
    main()
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/biaffine_dep.py", line 43, in main
    parse(parser)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/cmds/cmd.py", line 34, in parse
    parser = Parser.load(**args)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/parsers/dep.py", line 152, in load
    return super().load(path, reload, src, **kwargs)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/parsers/parser.py", line 194, in load
    state = torch.load(path if os.path.exists(path) else download(supar.MODEL[src].get(path, path), reload=reload))
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/supar/utils/fn.py", line 167, in download
    torch.hub.download_url_to_file(url, path, progress=True)
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/site-packages/torch/hub.py", line 592, in download_url_to_file
    req = Request(url, headers={"User-Agent": "torch.hub"})
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/urllib/request.py", line 322, in __init__
    self.full_url = url
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/urllib/request.py", line 348, in full_url
    self._parse()
  File "/N/u/dangswan/Carbonate/.conda/envs/error-analysis/lib/python3.10/urllib/request.py", line 377, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'hbo.model'

Am I doing something wrong with the command?

yzhangcs commented 1 year ago

@mr-martian Hi, did you modify the code after installation? This shouldn't happen when training a model with -b (--build) specified, which creates new model files from scratch.

mr-martian commented 1 year ago

I did

module load anaconda
conda create -n error-analysis
source activate error-analysis
conda install pip
pip install torch transformers supar udapi

and then ran the posted command. I do have a modified copy of the package, but it's in a separate virtual environment.

yzhangcs commented 1 year ago

@mr-martian Can you install the package from source and check again?

pip install -U git+https://github.com/yzhangcs/parser

I'm still not sure what exactly happened. Could you also walk through the entire prediction process, making predictions with the pretrained model -p biaffine-dep-en?
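
For reference, such a walkthrough might look like the following, a minimal sketch based on supar's documented Python API (the example sentence is just an illustration):

from supar import Parser

# Load the pretrained English biaffine dependency parser; since no local
# file with this name exists, supar resolves it to a registered checkpoint
# URL and downloads it to the cache.
parser = Parser.load('biaffine-dep-en')

# Parse a raw sentence; lang='en' selects the English tokenizer.
dataset = parser.predict('I saw Sarah with a telescope.', lang='en',
                         prob=True, verbose=False)
print(dataset[0])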

mr-martian commented 1 year ago

So it turns out that the download attempt actually comes from this command:

python -m supar.cmds.biaffine_dep predict \
    --tree -d 0 -c baseline.ini \
    --path "$lang.model" \
    --data "$data_dir/$lang.test.conllu" \
    --pred "$data_dir/$lang.pred.conllu"

The first traceback (ending in IndexError: min(): Expected reduction dim 1 to have non-zero size.) is the training command crashing on ... something.

This explains the downloading: by the time the prediction command runs, the model file doesn't exist.

yzhangcs commented 1 year ago

@mr-martian Yeah, if the local file does not exist, the parser regards the path as a URL and tries to download it from the remote server.
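
That fallback is visible in the parser.py line shown in the traceback above (torch.load(path) if the path exists, otherwise download(...)). A defensive check in a driver script, sketched below with a hypothetical path, would fail fast with a clearer message:

import os
import sys

# Hypothetical guard before invoking supar: a missing local checkpoint
# would otherwise be treated as a URL and die with
# "ValueError: unknown url type".
model_path = 'hbo.model'
if not os.path.exists(model_path):
    sys.exit(f"checkpoint {model_path!r} not found locally; "
             "supar would try to download it as a URL and fail")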

mr-martian commented 1 year ago

Do you know what would be causing that first error though?

yzhangcs commented 1 year ago

@mr-martian Sorry, I don't know yet. The exception above is expected during prediction, since the model indeed does not exist, but it is weird during training, as the parser should create new files when -b is specified. If you wish, you can share a Colab project or your running files with me so that I can reproduce the errors.

mr-martian commented 1 year ago

As it turns out, there was a bug in my corpus-splitting script that sometimes generated empty dev and test files. After I verified more carefully that the split files contained what they should, training ran just fine.
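
In hindsight, the empty splits also explain the first traceback: supar buckets sentences by length with a k-means step (supar/utils/fn.py above), and with zero sentences the min() reduction has nothing to operate on. A minimal sketch of the failure, assuming both tensors end up empty:

import torch

# With an empty dataset there are no sentence lengths and no centroids,
# so the reduction over the centroid dimension has zero size.
lengths = torch.tensor([])    # sentence lengths from an empty .conllu file
centroids = torch.tensor([])  # hence no cluster centroids either
try:
    dists, y = torch.abs_(lengths.unsqueeze(-1) - centroids).min(-1)
except (IndexError, RuntimeError) as e:
    print(e)  # min(): Expected reduction dim 1 to have non-zero size.

A simple pre-flight check that each split file exists and is non-empty would surface this before training starts.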