msprev / fzf-bibtex

a BibTeX source for fzf
BSD 3-Clause "New" or "Revised" License

Add crossref support #5

Closed. cao closed this 5 years ago.

cao commented 5 years ago
msprev commented 5 years ago

This is interesting. I like the idea of supporting crossrefs.

I'm not keen, though, on having a single cache for all BibTeX files. I'd like to keep one cache file per BibTeX file, to minimise the number of times a cache needs to be refreshed.

cao commented 5 years ago

You would need to parse all BibTeX files again though, because crossrefs might have changed, so entries from BibTeX file A might be impacted by a change to file B.
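A minimal sketch of why a change to file B can alter entries from file A. It assumes each file has already been parsed into a plain dict of entries; the function name `resolve_crossrefs` and the sample entries are hypothetical and are not taken from fzf-bibtex. Only one level of crossref is followed.

```python
def resolve_crossrefs(entries):
    """Return copies of entries with missing fields inherited from their crossref parent."""
    resolved = {}
    for key, fields in entries.items():
        merged = dict(fields)
        parent = entries.get(fields.get("crossref", ""))
        if parent is not None:
            # Fields already set on the child take precedence over the parent's.
            for name, value in parent.items():
                merged.setdefault(name, value)
        resolved[key] = merged
    return resolved


# Hypothetical parsed contents of A.bib and B.bib.
a_bib = {"smith2020": {"title": "A Chapter", "crossref": "book2020"}}
b_bib = {"book2020": {"booktitle": "A Book", "year": "2020"}}

before = resolve_crossrefs({**a_bib, **b_bib})

# Editing only B.bib changes the resolved entry that lives in A.bib.
b_bib["book2020"]["year"] = "2021"
after = resolve_crossrefs({**a_bib, **b_bib})
```

Here `smith2020` is defined in A.bib, yet its resolved `year` differs between `before` and `after`, so a cache built from A.bib alone would be stale.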

msprev commented 5 years ago

That is right, but I would still like to minimise the number of cache refreshes. Can you confirm the behaviour change:

Old behaviour:

Running on [A.bib, B.bib] produces 2 caches, X, Y. Running on [A.bib] produces 1 cache, X. Running on [B.bib] produces 1 cache, Y.


New behaviour:

Running on [A.bib, B.bib] produces 1 cache, X. Running on [A.bib] produces 1 cache, Y. Running on [B.bib] produces 1 cache, Z.

Where X != Y != Z. Right?

My concern is that if someone is running on [huge A.bib, small B.bib], they will have a performance penalty on changes to small B.bib, as the joint cache (X) will need to be refreshed. Under the current behaviour, they don't (only the small per-file cache for B.bib is refreshed).
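The two behaviours can be illustrated by how a cache might be named. This is only a sketch under assumptions: the thread does not say how fzf-bibtex actually derives cache file names, so hashing the paths, and the helper names `old_cache_names` and `new_cache_name`, are hypothetical.

```python
import hashlib


def _digest(text):
    # Short, stable identifier derived from the given text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def old_cache_names(bib_paths):
    """Old behaviour: one cache per .bib file, independent of the other files."""
    return {path: _digest(path) for path in bib_paths}


def new_cache_name(bib_paths):
    """New behaviour: one cache for the whole set of .bib files."""
    return _digest("\n".join(sorted(bib_paths)))


old_both = old_cache_names(["A.bib", "B.bib"])
old_b = old_cache_names(["B.bib"])

x = new_cache_name(["A.bib", "B.bib"])
y = new_cache_name(["A.bib"])
z = new_cache_name(["B.bib"])
```

Under the old scheme the cache for B.bib is the same whether or not A.bib is also given, so a change to B.bib only invalidates that one cache. Under the new scheme `x`, `y` and `z` are three distinct caches, and any change to either file invalidates `x`.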

cao commented 5 years ago

This is correct, though it is a fundamental problem when using crossrefs.

I'm not sure what the average file sizes for BibTeX are that the tool is trying to tackle, but a local test runs in around 0.03s on a 2.9GHz i7 MBP (on battery) for a set of 10 BibTeX files that are 500KB total and crossref heavy.

Do you have a test case where this would be problematic?

Overall, the only solutions I can think of are tracking crossref dependencies and refreshing only the necessary caches, or making crossref resolution optional and relying on the old cache behavior if crossrefs are disabled. The former is a fairly complex endeavor requiring additional parsing of BibTeX and new data structures to track dependencies, and it does not seem reasonable. The latter could make sense if it's a performance problem in practice, though I'm not sure the increased complexity of two caching behaviors makes sense for potentially saving 100ms or so only when the cache needs to be refreshed.
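For illustration, the first option (tracking crossref dependencies) could look roughly like the sketch below. It assumes every file is already parsed into a dict of entries, follows only direct crossrefs, and ignores transitive chains and removed references; `files_to_refresh` and the sample data are hypothetical, not part of fzf-bibtex.

```python
def files_to_refresh(files, changed):
    """Return the files whose per-file caches are stale after `changed` is edited.

    `files` maps a file name to its parsed entries ({key: fields}).
    """
    # Map every entry key to the file that defines it.
    owner = {key: name for name, entries in files.items() for key in entries}
    stale = {changed}
    for name, entries in files.items():
        for fields in entries.values():
            # A file is stale if one of its entries crossrefs into the changed file.
            if owner.get(fields.get("crossref")) == changed:
                stale.add(name)
    return stale


# Hypothetical parsed files: A.bib references an entry defined in B.bib.
files = {
    "A.bib": {"smith2020": {"title": "A Chapter", "crossref": "book2020"}},
    "B.bib": {"book2020": {"booktitle": "A Book", "year": "2020"}},
    "C.bib": {"jones2019": {"title": "Unrelated"}},
}

after_b_change = files_to_refresh(files, "B.bib")
after_a_change = files_to_refresh(files, "A.bib")
```

Even this simplified version needs every file parsed up front to build the `owner` map, which is part of the extra complexity described above.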

msprev commented 5 years ago

I'm going to merge your PR. There are upsides and downsides to this which I wanted noted, but it looks likely to be overall a benefit by allowing crossrefs across .bib files.