Handling of in-file negatives in ReProver

Hi, thanks for open-sourcing ReProver and for providing a very helpful guide on training the model. I noticed that in-file negatives are critical to ReProver's performance, as explained in the paper, and I'm trying to understand how they are implemented.

As far as I understand, given an example ex, the for-loop linked below iterates over all premises defined in the file that contains the context, i.e., all premises in ex["context"].path.

https://github.com/lean-dojo/ReProver/blob/0dbb82e3507cb8303dbb550bdb96bfbbb37e1ced/retrieval/datamodule.py#L105-L112

Here is what I find confusing: when ex["pos_premise"].path != ex["context"].path (i.e., when the positive premise is imported from another file rather than defined or proved in the same file), every premise the loop iterates over is added to premises_outside_file instead of premises_in_file. This seems counter-intuitive, because these premises actually come from the same file as the context. Could you please explain this a bit more?
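To make the behavior I'm describing concrete, here is a minimal Python sketch of the classification as I read it. It is not the actual code in datamodule.py: Premise, Context, split_negatives, and the premises_by_file lookup are hypothetical stand-ins, skipping the positive premise is my own assumption, and anything else the real loop does is ignored.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Premise:
    # Hypothetical stand-in for a premise: its name and the file defining it.
    full_name: str
    path: str


@dataclass(frozen=True)
class Context:
    # Hypothetical stand-in for ex["context"]; only the file path matters here.
    path: str


def split_negatives(
    ex: Dict[str, Any], premises_by_file: Dict[str, List[Premise]]
) -> Tuple[List[Premise], List[Premise]]:
    """Bucket the premises of the context's file as described in the question.

    premises_by_file is a hypothetical lookup from a file path to the
    premises defined in that file.
    """
    context = ex["context"]
    pos_premise = ex["pos_premise"]
    premises_in_file: List[Premise] = []
    premises_outside_file: List[Premise] = []

    # Iterate over every premise defined in the context's file.
    for p in premises_by_file.get(context.path, []):
        if p == pos_premise:
            continue  # assumption: the positive premise is never a negative
        # The bucket depends on where the *positive* premise lives,
        # not on where p itself is defined.
        if pos_premise.path == context.path:
            premises_in_file.append(p)
        else:
            premises_outside_file.append(p)

    return premises_in_file, premises_outside_file


# The case in question: the context is in A.lean, the positive premise is imported from B.lean.
corpus = {
    "A.lean": [Premise("a1", "A.lean"), Premise("a2", "A.lean")],
    "B.lean": [Premise("b1", "B.lean")],
}
ex = {"context": Context("A.lean"), "pos_premise": Premise("b1", "B.lean")}
inside, outside = split_negatives(ex, corpus)
print(inside)   # []
print(outside)  # [Premise(full_name='a1', path='A.lean'), Premise(full_name='a2', path='A.lean')]
```

In this example, a1 and a2 are defined in the same file as the context, yet both land in premises_outside_file because b1 comes from another file. That is the part I'd like to understand.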

lean-dojo/ReProver issue #46