vim-pandoc / vim-pandoc-legacy

[UNSUPPORTED/use vim-pandoc/vim-pandoc] vim bundle for pandoc users

More sophisticated citekey completion #13

Closed dsanson closed 13 years ago

dsanson commented 13 years ago

Vim allows completion functions to return a list of dictionaries, rather than just a list of words. The simplest format is

{'word': 'blah', 'menu': 'this is blah'}

So, for our purposes, it might be:

{'word': 'geach1972', 'menu': 'Geach, Logic Matters'}

where the value of 'word' is the citekey that will be inserted, and the value of 'menu' is what shows up in the popup menu.
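A minimal sketch of building that list on the Python side, assuming the bibliography has already been reduced to (citekey, description) pairs. The function name `make_completion_items` is hypothetical and is not taken from the plugin:

```python
def make_completion_items(entries):
    """Turn (citekey, description) pairs into Vim-style completion dictionaries."""
    items = []
    for citekey, description in entries:
        # 'word' is the text inserted into the buffer; 'menu' is the
        # extra text shown next to it in the popup menu.
        items.append({'word': citekey, 'menu': description})
    return items


items = make_completion_items([('geach1972', 'Geach, Logic Matters')])
print(items)
```

How the list gets handed back to the vimscript completion function is a separate question that this sketch does not cover.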

In order to implement this, I need smarter parsing of the supported bibliography files. This can either be implemented directly via regexes, or we can lean on existing parsers, if they are available. The advantage of using regexes is that it is light-weight---we avoid introducing new dependencies---and they probably work fine.

Another option---inspired by vim.latex-box's use of latex+bibtex to solve a similar problem---would be to collect matching keys and then use pandoc to generate a plaintext bibliography and then parse that. The trouble is that the usual CSL styles don't include the citekey. It might not be too hard to generate a custom CSL file for this purpose. But the process is probably too slow for something like completion.

dsanson commented 13 years ago

For bibtex, it looks like I might be able to implement this using the bibtex-ruby gem. I can also make it search not just for citekeys but for any keywords, so that

@rabbit

will suggest any articles containing the word 'rabbit' in any field. The parsing is easy. I'm still struggling to get results from ruby back into vimscript.
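The keyword search itself can be sketched in a few lines once the entries are parsed. This is an illustration in Python rather than the bibtex-ruby code mentioned above; it assumes entries are held as a plain dictionary mapping citekey to a dictionary of fields, and both `search_entries` and the sample data are invented:

```python
def search_entries(entries, query):
    """Return the citekeys of entries whose key or any field contains query."""
    query = query.lower()
    matches = []
    for citekey, fields in entries.items():
        # Search the citekey together with every field value.
        haystack = [citekey] + list(fields.values())
        if any(query in text.lower() for text in haystack):
            matches.append(citekey)
    return sorted(matches)


# Invented sample entries, for illustration only.
sample = {
    'smith2001': {'author': 'Smith, A.', 'title': 'Notes on the Rabbit'},
    'geach1972': {'author': 'Geach, Peter', 'title': 'Logic Matters'},
}
print(search_entries(sample, 'rabbit'))
```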

fmoralesc commented 13 years ago

Check the changes I've made on the multibibs branch: I'm supporting the dictionary format for completion.

I've rewritten Pandoc_bibkey in python, but that was just because I'm more used to it; it can be rewritten in ruby again if needed. (Actually, the key scanning was much cleaner in ruby. Damn python's lack of variable-width regex lookbehinds!)

EDIT: I found a way to simplify the regexes. The python code should be much clearer now.

dsanson commented 13 years ago

Grrr! You are pythonifying everything. I'm much more comfortable in ruby. Of course, you're also making everything better, so I can't really complain ;-)

fmoralesc commented 13 years ago

Sorry for that! ;) But you were rubyfying everything first!

fmoralesc commented 13 years ago

BTW: what do you think is best for sorting the completion items: sorting by key or by title?

fmoralesc commented 13 years ago

I think I've tested the new features enough. If there are no comments, I'll merge the multibibs branch into master.

dsanson commented 13 years ago

Not working for me. When I try to complete a key, vim hangs for several seconds, then dumps a load of error messages.

As for sorting: I would support sorting by citekey, not title. Typically when I want to enter a citation, I know the author for certain, know the title more or less, and may or may not remember the year (and my citekeys are authoryearxx). And if we sort by title, then we have to worry about whether or not to ignore leading articles like 'the' and 'an' and the like.
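Sorting by citekey is a one-liner over the completion dictionaries. A small sketch, assuming the items use the 'word'/'menu' format; the helper name `sort_by_citekey` and the choice to ignore case are assumptions, and the sample items are only illustrative:

```python
def sort_by_citekey(items):
    """Sort completion dictionaries by their 'word' (the citekey), ignoring case."""
    return sorted(items, key=lambda item: item['word'].lower())


items = [
    {'word': 'lewis1986', 'menu': 'On the Plurality of Worlds'},
    {'word': 'Geach1972', 'menu': 'Logic Matters'},
    {'word': 'lewis1973', 'menu': 'Counterfactuals'},
]
print([item['word'] for item in sort_by_citekey(items)])
```

With authoryear keys this groups an author's entries together in year order, and no special handling of leading articles is needed.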

fmoralesc commented 13 years ago

OK about the sorting part.

That error seems to be a problem with pybtex. What version do you have? I have 0.15 here.

fmoralesc commented 13 years ago

I pushed some changes, so the plugin will resort to the fallback procedure if pybtex fails. Can you check if it solves the problem?
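The fallback idea can be sketched as a plain try/except wrapper. This is not the plugin's code: `complete_with_fallback`, `strict_parser` and `regex_scanner` are hypothetical names, and the two stand-in functions exist only to show the control flow:

```python
def complete_with_fallback(primary, fallback, bibfile, query):
    """Try the richer parser first; on any failure use the simple one."""
    try:
        return primary(bibfile, query)
    except Exception:
        # A malformed .bib file or a parser bug should degrade the
        # completion, not break it.
        return fallback(bibfile, query)


# Hypothetical stand-ins for the two strategies, for illustration only.
def strict_parser(bibfile, query):
    raise ValueError('could not parse ' + bibfile)


def regex_scanner(bibfile, query):
    return [{'word': query + '1972'}]


result = complete_with_fallback(strict_parser, regex_scanner, 'refs.bib', 'geach')
print(result)
```

A wrapper like this only hides errors. If the primary parser is slow before it fails, the user still waits for it on every completion.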

dsanson commented 13 years ago

pybtex 0.15. I just installed it via pip, and haven't tried to use it aside from this. So I'll test that and get back to you.

As for the changes: yes, it no longer throws a bunch of errors. But it still hangs (for about 7 seconds) before it gives up and provides the fallback completion. And it does this every time. (By contrast, when I didn't have pybtex installed at all, on the previous version, it immediately completed via the regex.)

dsanson commented 13 years ago

Turns out I had a problem with my bibtex file: an unquoted journal title. So now it works, but it is slow. 7 seconds before the matches pop up. To be fair, I have a little over 1000 items in my database, but this is too slow to be useable.

I noticed that you are putting every matched bibfile into b:pandoc_bibfiles. And you are putting every bibfile in the working directory in there too. I don't think this is the right way to go. For one thing, it will only slow things down more (though my 7 seconds is after manually setting b:pandoc_bibfiles to only include the one file). But also, it gives me too little control. If I have a bib file with the same file name in my directory, that's probably the file I want. If I don't have a bibfile with the same file name in the document directory, but I do have some other bibfile in that directory, that's probably the file I want. And if I've put a different bibfile at .pandoc/default.bib, that probably means I want to use it instead of the one in my texmf folder.
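The preference order described above can be sketched as a search that stops at the first level that yields anything. This is only an illustration under assumptions: it handles `.bib` files only, leaves the texmf lookup out, and `pick_bibfiles` and its `default_bib` parameter are hypothetical names:

```python
import glob
import os


def pick_bibfiles(document_path, default_bib=None):
    """Return the bibliography files for a document, most specific match first."""
    directory = os.path.dirname(os.path.abspath(document_path))
    stem = os.path.splitext(os.path.basename(document_path))[0]

    # 1. A .bib file named after the document wins outright.
    same_name = os.path.join(directory, stem + '.bib')
    if os.path.isfile(same_name):
        return [same_name]

    # 2. Otherwise use whatever .bib files sit next to the document.
    local = sorted(glob.glob(os.path.join(glob.escape(directory), '*.bib')))
    if local:
        return local

    # 3. Otherwise fall back to a user-wide default, if one is given.
    if default_bib and os.path.isfile(default_bib):
        return [default_bib]
    return []
```

For example, `pick_bibfiles('paper.markdown', os.path.expanduser('~/.pandoc/default.bib'))` would return only `paper.bib` when that file exists beside the document.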

I tried completions just against the sample bibtex file from the link above, and they didn't all work. I've gisted a modified version of that file here. It looks to me that the problem is that you aren't matching citekeys that start with an uppercase letter, like

Zurek:1993
Primes

The others seem to work.
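One way to avoid this kind of case sensitivity is to take everything between the opening brace and the first comma as the key, without assuming anything about its first character. A rough sketch, not the plugin's actual pattern; `CITEKEY_RE`, `scan_citekeys` and the sample entries (apart from the two keys quoted above) are made up for illustration:

```python
import re

# Entry type, opening brace, then the key: any run of characters that is
# not a comma, whitespace or a closing brace.
CITEKEY_RE = re.compile(r'^\s*@\w+\s*\{\s*([^,\s}]+)\s*,', re.MULTILINE)


def scan_citekeys(bibtex_text):
    """Return every citekey found in bibtex_text, in file order."""
    return CITEKEY_RE.findall(bibtex_text)


sample = """
@article{Zurek:1993,
  title = {An example title},
}
@book{Primes,
  title = {Another example},
}
@misc{geach1972,
  title = {Logic Matters},
}
"""
print(scan_citekeys(sample))
```

A pattern this loose will also pick up the first token of `@string` or `@preamble` blocks if they happen to contain a comma, so it trades some precision for simplicity.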

Also, can we get 'menu': ', '? Ideally, that would just be the last name of the first author...</p> <p>But these details may not matter if there isn't a way to speed things up.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I removed the pybtex dependent code and modified the procedure. Are you testing over those changes? There was a bug where the procedure only matched entries where the Title <em>tag</em> was uppercase. So</p> <pre><code>@Artitle{Bounjour, Title = {In defense of pure reason}, ... }</code></pre> <p>would be matched, but not </p> <pre><code>@Article{Bonjour, title = {...}, ... }</code></pre> <p>That is probably the reason 'menu' didn't show up in the completion. Currently, it is the title.</p> <p>I didn't modify much the code that detects the bibliographies, so it doesn't stop once it has found a suitable bibliography in a certain path. I'll check it out so it behaves better. I think you're right, except that if the user has several bibfiles in the working directory, we can assume that he wants to use them all for the current document.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>except that if the user has several bibfiles in the working directory, we can assume that he wants to use them all for the current document.</p> </blockquote> <p>I don't think so. 
Consider something like</p> <pre><code>papers on_what_there_is.markdown on_what_there_is.bib two_dogmas.markdown two_dogmas.bib</code></pre> <p>It seems pretty clear which bibs go with which files. But if none of the bibfiles match the filename, e.g.,</p> <pre><code>something.markdown anotherthing.markdown epistemology.bib metaphysics.bib misc.bib</code></pre> <p>then you are right: they should all be used. I'm not sure what to think, though, about someone who happens to have json file in the same directory, but it isn't a bibfile....</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Hm... I think the procedure should match:</p> <p>1) any <code>*.bib</code>,<code>*.mods</code>,<code>*.json</code>,<code>*ris</code> files named as the current file. If succesful, stop. 2) any <code>*.bib</code>,<code>*.mods</code>,<code>*.json</code>,<code>*.ris</code> in the current folder. If sucessful, stop. 3) any file named default.{bib,ris,mods,json} in the local pandoc data folder. If succesful, stop 4) any bibliography file in texmf.</p> <p>We should give the option to exclude some files if wanted. For example, I would forbid vim-pandoc to search for bibliographies in texmf.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>About the JSON issue: there is no <em>quick</em> way to determine whether a .json file is a bibliography, really. We could create some parser to determine if it is structured as a bibliography, but that seems to be overkill. 
Besides, what use case do you imagine where someone has .json files in the current folder for something else than this?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Okay. I guess I was a commit behind. I checked out the latest code in the multibibs branch, and pybtex is now gone.</p> <p>Still no titles on my 1000+ entry bibtex file. But it works using a file that just contains two entries copied from that bibtex file. I'll see if I can isolate the problem.</p> <blockquote> <p>I didn't modify much the code that detects the bibliographies, so it doesn't stop once it has found a suitable bibliography in a certain path. </p> </blockquote> <p>Right. The old code was written so the last detected bibliography would override all the others.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Okay. This was obvious enough. My bibtex file didn't meet the test:</p> <pre><code> if len(scanned_titles) == len(scanned_labels):</code></pre> <p>When I commented that bit out, everything worked great.</p> <p>Why are you testing for that? Not every bibtex entry needs to have a title. Some have a booktitle instead. But some might have no title at all. Note that a similar issue arises for authors: not every entry will have an author. Some will have an editor, but some will have neither. 
This is why it would be so much easier if we could use pybtex or citeproc-hs to do the heavy lifting for us....</p> <p>So in my perfect world, the 'menu' portion of the completion would return</p> <pre><code>Name, Title</code></pre> <p>where <code>Name</code> is the last name of the first author, or, if that doesn't exist, last name of the first editor, or, if that doesn't exist, no name is returned; and <code>Title</code> is the title (perhaps the first n words of the title for some n), or, if that doesn't exist, is <code>Booktitle</code>, or, if that doesn't exist, is the year; or, if that doesn't exist, is empty.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I'm testing for that because otherwise the titles will be misaligned. In the ideal world we could depend on pybtex, but it's choking on your system.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>Hm... I think the procedure should match:</p> <p>1) any <em>.bib,</em>.mods,_.json,<em>ris files named as the current file. If succesful, stop. 2) any </em>.bib,<em>.mods,</em>.json,_.ris in the current folder. If sucessful, stop. 3) any file named default.{bib,ris,mods,json} in the local pandoc data folder. If succesful, stop 4) any bibliography file in texmf.</p> <p>We should give the option to exclude some files if wanted. 
For example, I would forbid vim-pandoc to search for bibliographies in texmf.</p> </blockquote> <p>Sounds fine to me.</p> <blockquote> <p>About the JSON issue: there is no quick way to determine whether a .json file is a bibliography, really.</p> </blockquote> <p>Agreed. I don't think we should try to do this.</p> <blockquote> <p>Besides, what use case do you imagine where someone has .json files in the current folder for something else than this?</p> </blockquote> <p>Well, JSON can be used for lots of things. I use Jekyll, and that means I have markdown files sharing folders with YAML files that are used to configure how jekyll behaves. It doesn't seem a stretch that someone might have a JSON file in the same folder as a markdown folder. (Perhaps it is a bit more of a stretch to imagine this in the case of a working draft of an academic paper, but I do use citation completion sometimes for webpages too.) </p> <p>Also, pandoc can output JSON. So someone might be working on foo.markdown and have a pandoc-generated json copy at foo.json, if they had some target that took advantage of pandoc's json output.</p> <p>Not that I can see anything we can do about this.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>In the ideal world we could depend on pybtex, but it's choking on your system.</p> </blockquote> <p>Its not choking anymore, just slow. Is it fast on your system? Have you tested it against a large bibtex file?</p> <blockquote> <p>I'm testing for that because otherwise the titles will be misaligned. </p> </blockquote> <p>I see. I hadn't looked closely at how you were doing it. 
I don't think it can be done this way, because we can't expect every entry to have a title.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I don't have large bibtex files around, sadly (part of the problem?)</p> <blockquote> <p>I don't think it can be done this way, because we can't expect every entry to have a title.</p> </blockquote> <p>In that case, we can't expect any of the more powerful completions to be reliable as they are now, and we should drop them.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>My bibtex file is available <a href="https://github.com/dsanson/sanson.bib">here</a> if you want to play around with it. I just tested it with the minimal version (sanson-min.bib) after reverting to 9a3a999. It takes maybe 5 seconds to complete a citation here on a two year old MacBook Pro running Lion.</p> <p>If you want a real monster, you could try <a href="https://github.com/kjhealy/bib/blob/master/Philosophy.bib">this</a>.</p> <p><a href="https://gist.github.com/1201607">Here</a> is a rudimentary ruby script that returns vim dictionary style results using bibtex-ruby. 
It is not blazing fast either--maybe 2 or 3 seconds.</p> <p>I suppose we could offer powerful completions based upon pybtex along with an option for turning them off.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>But what about MODS, RIS and JSON files? The implementations we have are naive.</p> <p>While we research a way to handle this, I have dropped the complex completions, so we can merge the multibibs branch without bringing those issues into master.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>That seems right: multibibs support is clearly distinct from making the completion function smarter.</p> <p>I created a "smartbibs" branch that is on commit 9a3a999 --- the last commit before you removed pybtex. This is the one that works for me, but is slow. We'll have to merge in upstream changes eventually, if we decide to use it.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I think there's no need of keeping that branch separate, since we have the history of changes. 
It's likely that there will be changes in the completion code anyway, so merging that old code with whatever we have when we go back to this probably won't be smooth.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>More thoughts about this.</p> <h2>bibtool solution</h2> <p><a href="http://www.gerd-neugebauer.de/software/TeX/BibTool/bibtool.pdf">bibtool</a> is very fast, and can extract a set of bibtex entries based upon a regex, e.g.,</p> <pre><code>bibtool -X "geach" big.bib -o small.bib</code></pre> <p>or, if you just want to search the citekeys,</p> <pre><code>bibtool -- 'select{$key "geach"}' big.bib -o small.bib</code></pre> <p>or to search selected fields,</p> <pre><code>bibtool -- 'select{title booktitle author editor $key "geach"}' big.bib -o small.bib</code></pre> <p>It has a bunch of other options that allow several input files, control sorting, detect duplicates, etc.</p> <p>So we could use bibtool to get a small.bib file that contains exactly the entries we want to offer for completion, and then use pybtex or bibtex-ruby to parse that file for key, author, title.</p> <h2>sloppy regex solution</h2> <p>Parsing bibtex properly with regexes requires recursive matching of paired brackets. But a much cruder strategy is to just look for lines that start with "@", and assume that everything between "@"s is a single entry. This isn't quite right (there can be stuff between bibtex entries that bibtex is supposed to ignore), but it might be close enough. 
Once we have an array of chunks of text between "@"s that match a given regex, we ought to be able to use regexes to search for citekey, title, author, booktitle, and editor within those chunks.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I'm in favor of the sloppy regex solution. I've tested the approach (<a href="https://gist.github.com/1203698">https://gist.github.com/1203698</a>), and it is much faster than what we had: for sanson.bib, parsing the file takes around 0.08 seconds. For philosophy.bib, it takes around 1.1 seconds.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I think we should only retrieve titles for the value of <code>menu</code>. First, because the less regex searches we make, the faster it goes. The procedure I have (see the gist) takes ~0.01 seconds to traverse "sanson.bib" when the query is "lew" (which gives 68 results). Second, because if the ids are formatted in authoryear format, author info is redundant. Third, because I think that editor information is (sadly) never something one needs to know.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Neat! 
</p> <p>For comparison purposes, here is a test of the bibtool approach, providing author (or editor), title (or booktitle) (and doing some work to clean up titles):</p> <p><a href="https://gist.github.com/1203906">https://gist.github.com/1203906</a></p> <p>On my system, running this inside of <code>time</code> on 'lew' gets me:</p> <pre><code>real 0m0.423s user 0m0.338s sys 0m0.071s</code></pre> <p>While running yours on 'lew' gets me:</p> <pre><code>real 0m0.084s user 0m0.059s sys 0m0.020s</code></pre> <p>So yes, the sloppy method is faster. But the bibtool method is pretty fast too.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Oops. Just realized that your version parses the bibfile twice. So those numbers above are about twice what they should be.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>bibtool's contribution to the time:</p> <pre><code>real 0m0.077s user 0m0.071s sys 0m0.005s</code></pre> <p>In fact, if I put an exit command in the ruby script right after the <code>require 'bibtex'</code> line, I get</p> <pre><code>real 0m0.272s user 0m0.199s sys 0m0.057s</code></pre> <p>So almost all of the time taken by the script is taken up loading the gem.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <blockquote> <p>if the ids are formatted in 
authoryear format, author info is redundant. </p> </blockquote> <p>True. But if not?</p> <blockquote> <p>Third, because I think that editor information is (sadly) never something one needs to know.</p> </blockquote> <p>Editor matters when you are citing a collection (rather than something in a collection), or a specific edition of a classic. In the first case, presumably if your cite keys are author:year, the citekey will be editor:year. The second case isn't something we'd be supporting anyway, since in that case, editor would be trumped by the author.</p> <p>These choices make little difference to total time on the bibtool/bibtex-ruby or bibtool/pybtex approach, but I can see that they matter on the regex approach.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>One thing I like about the regex approach here: no external dependencies.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>More data points. (Note that the name for my test script is bibvim, and I've modified it from the gist so that it just outputs the number of entries, rather than a string representation of them.)</p> <p>Running against sanson.bib:</p> <pre><code>$ time bibvim lew 78 real 0m0.410s user 0m0.337s sys 0m0.070s</code></pre> <p>Note that I am getting 78 hits. Presumably that's because I'm searching for citekey, author, title, editor, booktitle. 
I played around with these options to bibtool, but it made no appreciable difference to the total time taken by the script.</p> <p>Running against philosophy.bib</p> <pre><code>$ time bibvim lew 107 real 0m0.716s user 0m0.634s sys 0m0.078s</code></pre> <p>My sense is that this approach will scale well (though really, do we need to worry about bibfiles any bigger than this?) The main performance penalty is loading the gem. After that, things are quite speedy.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Actually, my script runs the procedure thrice. The correct output of time for sanson.bib is:</p> <pre><code>real 0.06 user 0.05 sys 0.01</code></pre> <p>and </p> <pre><code>real 0.10 user 0.08 sys 0.02</code></pre> <p>for Philosophy.bib</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Fast indeed. Am I right that you are only searching for matches in citekeys?</p> <p>I'm convinced that we should go your way. If shortcomings arise, we can always revisit the bibtool/bibtex-ruby or bibtool/pybtex solution.</p> <p>Looks like a similar process should work fine for RIS (split by <code>/^ER -/</code>) and MODS (split by <code>/<mods>/</code>. 
Not sure about JSON.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Yes, I am only searching for matches in citekeys.</p> <p>I'm working on searching for matches in other tags too. I'm trying to plug bibtool into the regex mini parser now.</p> <p>I'm sure a similar process can work for RIS (I made a parser for it yesterday night). For MODS, I would prefer to use a proper XML parser. For JSON we should use a parser too; python's is very fast.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I plugged the regex procedure with bibtool: <a href="https://gist.github.com/1204775">https://gist.github.com/1204775</a></p> <p>For sanson.bib:</p> <pre><code>real 0.20 user 0.17 sys 0.02</code></pre> <p>For Philosophy.bib:</p> <pre><code>real 0.70 user 0.66 sys 0.04</code></pre> <p>This is searching for the query on $key, title, booktitle, author and editor.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>You might want to check this out: <a href="http://www.youtube.com/watch?v=0ux6koT-U_U">http://www.youtube.com/watch?v=0ux6koT-U_U</a> (preferably in 720 and full screen).</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img 
src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>That is pretty sweet, sir. I am impressed. </p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I pushed the new completions code into new-completions, and deleted the smartbibs branch. The changes include some file reorganization (the methods that handle bibliographic suggestions are now in autoload/pandocbib.vim). It needs some cleaning (there's some code duplication), but it works. I have experimental support for using bibtool too.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Good stuff.</p> <p>With g:pandoc_use_bibtool set, I get errors. 
For example, working against sanson.bib, <code>@lew<C-X><C-O></code> gets me:</p> <pre><code> Error detected while processing function pandoc#Pandoc_Complete..pandocbib#Pando cBibSuggestions: line 30: Traceback (most recent call last): Error detected while processing function pandoc#Pandoc_Complete..pandocbib#Pando cBibSuggestions: line 30: File "<string>", line 20, in <module> Error detected while processing function pandoc#Pandoc_Complete..pandocbib#Pando cBibSuggestions: line 30: File "<string>", line 89, in pandoc_get_bibtool_suggestions Error detected while processing function pandoc#Pandoc_Complete..pandocbib#Pando cBibSuggestions: line 30: IndexError: no such group Error detected while processing function pandoc#Pandoc_Complete: line 22: E706: Variable type mismatch for: suggestions</code></pre> <p>The same occurs for <code>@l</code>, <code>@le</code> ... <code>@lewis</code>. But <code>@lewis1</code> suddenly works. Likewise, <code>@h</code> ... <code>@hinc</code> get the error, but <code>@hinch</code> works...</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I confirm. For the bibtool code I mostly copied what I had in the standalone script, so it is essentialy a stub.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>I just fixed that issue on commit 69c5e77f5d7730f35651c05c9ae6602ef265dfed.</p> <p>I'm having a problem where it won't retrieve some names and titles. 
What's weird about it is that the code for that is the same as the one I'm using in the non-bibtool-based method, which works fine.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>By the way, I am coming to agree that we shouldn't show the author in the 'menu' part. I like it when it is just the last name of the first author, but when it becomes the full name of multiple authors, it's too much. And I don't think you should be trying to parse bibtex name fields to find the last name of the first author (ugh).</p> <p>Do you have examples of cases in which names and titles aren't working for you?</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>In sanson.bib, for example, "cohen2005" retrieves neither the name nor the title. There are many examples, even if you just try to complete after the @.</p> <p>And yes, names are complicated for those reasons.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>The common thread is that these are all cases where the author or the title, once processed by bibtool, has a linebreak within it. 
For example,</p> <pre><code>bibtool -- 'select{title booktitle author editor $key "cohen"}' ~/.pandoc/default.bib</code></pre> <p>Gets you (trimming away all the Bibdesk crap):</p> <pre><code>@Book{ cohen2005, author = {Cohen, S. Marc and Curd, Patricia and Reeve, C. D. C. and Cohen, S Marc}, date-added = {2008-02-12 23:20:43 -0500}, date-modified = {2010-11-03 17:19:28 -0400}, edition = {3rd}, isbn = {0872207692}, pages = {958}, publisher = {Hackett Publishing}, title = {Readings in Ancient Greek Philosophy: From Thales to Aristotle}, year = {2005}, bdsk-url-1 = {http://books.google.com/books?id=XVHj_gwk39QC} }</code></pre> <p>Here both title and author contain a linebreak. copleston1953 has an author but no title. Testing it, we see that</p> <pre><code>bibtool -- 'select{title booktitle author editor $key "copleston1953"}' ~/.pandoc/default.bib</code></pre> <p>gives (again trimming the crap):</p> <pre><code>@Book{ copleston1953, address = {New York}, author = {Copleston, Frederick}, date-added = {2007-11-28 20:49:38 -0500}, date-modified = {2010-11-03 17:19:28 -0400}, number = {1}, publisher = {Newman}, series = {A History of Philosophy}, title = {Late Mediaeval and Renaissance Philosophy: Ockham to the Speculative Mystics}, volume = {3}, year = {1953}, bdsk-file-1 = 
{YnBsaXN0MDDUAQIDBAUIJidUJHRvcFgkb2JqZWN0c1gkdmVyc2lvblkkYXJjaGl2ZXLRBgdUcm9vdIABqAkKFRYXGyIjVSRudWxs0wsMDQ4RFFpOUy5vYmplY3RzV05TLmtleXNWJGNsYXNzog8QgASABqISE4ACgAOAB1lhbGlhc0RhdGFccmVsYXRpdmVQYXRo0hgNGRpXTlMuZGF0YU8RAaAAAAAAAaAAAgAACU1hY2ludG9zaAAAAAAAAAAAAAAAAAAAAAAAAMeJGgxIKwAAADKayRJjb3BsZXN0b24xOTUzLmh0bWwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB2r/wqRp2gAAAAAAAAAAAAEAAgAACSAAAAAAAAAAAAAAAAAAAAADYmliAAAQAAgAAMeJYFwAAAARAAgAAMKkohoAAAABABQAMprJADKavgAFujEABbokAACRoQACAENNYWNpbnRvc2g6VXNlcnM6AGRhdmlkOgBEb2N1bWVudHM6AERyb3Bib3g6AGJpYjoAY29wbGVzdG9uMTk1My5odG1sAAAOACYAEgBjAG8AcABsAGUAcwB0AG8AbgAxADkANQAzAC4AaAB0AG0AbAAPABQACQBNAGEAYwBpAG4AdABvAHMAaAASADRVc2Vycy9kYXZpZC9Eb2N1bWVudHMvRHJvcGJveC9iaWIvY29wbGVzdG9uMTk1My5odG1sABMAAS8AABUAAgAM//8AAIAF0hwdHh9YJGNsYXNzZXNaJGNsYXNzbmFtZaMfICFdTlNNdXRhYmxlRGF0YVZOU0RhdGFYTlNPYmplY3RfEBJjb3BsZXN0b24xOTUzLmh0bWzSHB0kJaIlIVxOU0RpY3Rpb25hcnkSAAGGoF8QD05TS2V5ZWRBcmNoaXZlcgAIABEAFgAfACgAMgA1ADoAPABFAEsAUgBdAGUAbABvAHEAcwB2AHgAegB8AIYAkwCYAKACRAJGAksCVAJfAmMCcQJ4AoEClgKbAp4CqwKwAAAAAAAAAgEAAAAAAAAAKAAAAAAAAAAAAAAAAAAAAsI=} , bdsk-url-1 = {http://books.google.com/books?id=m3ItKgAACAAJ} }</code></pre> <p>So I'm pretty sure it's the line breaks that are causing you trouble. You could match the whole title and remove the linebreaks. Or you could just match to the end of the line. This would in effect give us a quick and easy way to truncate titles to a reasonable length, assuming bibtool does this in a consistent way.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/dsanson"><img src="https://avatars.githubusercontent.com/u/14331?v=4" />dsanson</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>There are various settings that can be used to fine-tune bibtool's output. See p. 
22 of <a href="http://www.gerd-neugebauer.de/software/TeX/BibTool/bibtool.pdf">the manual</a>.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>Hm... I need to fix the regex, then.</p> <p>I think we should truncate titles in postprocessing, because a naive method won't be ideal. Think of the case where the titles of two entries (let's say, "dude1955a" and "dude1955b") are very long but differ only in the last words ("part 1", "part 2") (I'm sure we've all seen titles like that IRL). I would prefer removing text in the middle, but that might look wrong. Detecting those problematic cases could slow us down (we have plenty of room, I think, though).</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/fmoralesc"><img src="https://avatars.githubusercontent.com/u/221465?v=4" />fmoralesc</a> commented <strong> 13 years ago</strong> </div> <div class="markdown-body"> <p>We could also try to stop bibtool from breaking lines when it reaches <code>print.line.length</code> (in practice, by setting it to a very large number). 
However, that won't help if we are not using bibtool, where we can find this problem too.</p> </div> </div> <script src="https://cdn.jsdelivr.net/npm/jquery@3.5.1/dist/jquery.min.js"></script> <script src="/githubissues/assets/js.js"></script> <script src="/githubissues/assets/markdown.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/highlight.min.js"></script> <script src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.4.0/build/languages/go.min.js"></script> <script> hljs.highlightAll(); </script> </body> </html>