lean-dojo / ReProver

Retrieval-Augmented Theorem Provers for Lean
https://leandojo.org
MIT License
229 stars 53 forks source link

Cannot locate premise #44

Closed zoryzhang closed 8 months ago

zoryzhang commented 8 months ago

I'm not entirely sure whether this behavior is expected. The error level seems pretty high.

To reproduce:

data = RetrievalDataModule(
    model_name="kaiyuy/leandojo-lean4-retriever-byt5-small",
    data_path= "/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/novel_premises/",
    corpus_path= "/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/corpus.jsonl",
    num_negatives= 3,
    num_in_file_negatives= 1,
    batch_size= 8,
    eval_batch_size= 64,
    max_seq_len= 1024,
    num_workers= 4,
)
data.setup("fit")

Error message:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-03-10 15:14:59.411 | INFO     | common:__init__:200 - Building the corpus from [/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/corpus.jsonl](https://file+.vscode-resource.vscode-cdn.net/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/corpus.jsonl)
2024-03-10 15:15:20.407 | INFO     | retrieval.datamodule:_load_data:46 - Loading data from [/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/novel_premises/train.json](https://file+.vscode-resource.vscode-cdn.net/Users/zory/Desktop/work/AI2Reason/analogyTP/ReProver/data/leandojo_benchmark_4/novel_premises/train.json)
  2%|▏         | 1800/98514 [00:00<00:05, 17993.11it/s]2024-03-10 15:15:40.746 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
2024-03-10 15:15:40.747 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
2024-03-10 15:15:40.797 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': "ProbabilityTheory.condCdf'_def", 'def_path': 'Mathlib/Probability/Kernel/CondCdf.lean', 'def_pos': [1, 1], 'def_end_pos': [1, 1]}
2024-03-10 15:15:40.798 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': "ProbabilityTheory.condCdf'_def", 'def_path': 'Mathlib/Probability/Kernel/CondCdf.lean', 'def_pos': [1, 1], 'def_end_pos': [1, 1]}
  4%|▎         | 3600/98514 [00:00<00:07, 12019.15it/s]2024-03-10 15:15:40.990 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'add_nonneg', 'def_path': 'Mathlib/Algebra/Order/Monoid/Lemmas.lean', 'def_pos': [1092, 24], 'def_end_pos': [1092, 34]}
  5%|▍         | 4915/98514 [00:00<00:09, 9793.59it/s] 2024-03-10 15:15:41.018 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
2024-03-10 15:15:41.177 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonpos', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [665, 24], 'def_end_pos': [665, 34]}
  6%|▌         | 5973/98514 [00:00<00:11, 8030.84it/s]2024-03-10 15:15:41.261 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'AddMonoidHomClass.isometry_of_norm', 'def_path': 'Mathlib/Analysis/Normed/Group/Basic.lean', 'def_pos': [894, 12], 'def_end_pos': [894, 23]}
2024-03-10 15:15:41.263 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'AddMonoidHomClass.isometry_of_norm', 'def_path': 'Mathlib/Analysis/Normed/Group/Basic.lean', 'def_pos': [894, 12], 'def_end_pos': [894, 23]}
2024-03-10 15:15:41.268 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_pos', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [677, 24], 'def_end_pos': [677, 31]}
2024-03-10 15:15:41.295 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonpos', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [665, 24], 'def_end_pos': [665, 34]}
2024-03-10 15:15:41.299 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
2024-03-10 15:15:41.300 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
2024-03-10 15:15:41.301 | WARNING  | common:get_all_pos_premises:352 - Cannot locate premise: {'full_name': 'neg_nonneg', 'def_path': 'Mathlib/Algebra/Order/Group/Defs.lean', 'def_pos': [671, 24], 'def_end_pos': [671, 34]}
yangky11 commented 8 months ago

How many Cannot locate premise did you get? It's expected up to a certain number. I'll create a thread in GitHub Discussions to explain.

zoryzhang commented 8 months ago

I appreciate your help on explanation. While running on fit seems to go wild, the propotion of this error turns out as small as only <20 tactics for val (<100 premises).

yangky11 commented 8 months ago

Please see https://github.com/lean-dojo/ReProver/discussions/45.