jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Latex export with citations fails #223

Open nlgranger opened 8 years ago

nlgranger commented 8 years ago

I have tried to use the support for citations as shown here, but it does not work. Long story short: the bibliography file is not generated and is not available in the build directory.

By default, nbconvert calls pdflatex three times but does not call bibtex so the bibliography is not created. I have modified the converter command to fix this:

~/.jupyter/jupyter_nbconvert_config.py

c.PDFExporter.latex_command = ['latexmk', '-bibtex', '-pdf', '{filename}']
c.PDFExporter.latex_count = 1
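
For comparison, the classic manual sequence that latexmk automates is pdflatex, bibtex, then pdflatex twice more. Below is a minimal Python sketch of that sequence. The helper names `bibtex_build_commands` and `run_build` are hypothetical and not part of nbconvert; this only illustrates the order of the calls.

```python
import subprocess


def bibtex_build_commands(basename):
    """Return the classic pdflatex/bibtex command sequence for `basename`.

    Hypothetical helper for illustration only; it is not part of nbconvert.
    """
    tex = basename + '.tex'
    return [
        ['pdflatex', tex],     # first pass: records the cited keys in the .aux file
        ['bibtex', basename],  # reads the .aux file and writes the .bbl file
        ['pdflatex', tex],     # second pass: pulls the .bbl into the document
        ['pdflatex', tex],     # third pass: resolves the citation labels
    ]


def run_build(basename, cwd=None):
    """Run each command in turn, stopping at the first failure."""
    for command in bibtex_build_commands(basename):
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_build('Test_citation')` in the directory that holds both the .tex and the .bib file would run the four steps in order, assuming pdflatex and bibtex are on the PATH.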

Minimal working example:

Test_citation.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Test citation: <cite data-cite=\"cappe_use_2005\">[Cappé 2005]</cite>."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

citation.tplx

((*- extends 'article.tplx' -*))

((* block bibliography *))
\bibliographystyle{apalike}
\bibliography{bibliography}
((* endblock bibliography *))

bibliography.bib

@inproceedings{cappe_use_2005,
    title = {On the use of particle filtering for maximum likelihood parameter estimation},
    url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7078114},
    urldate = {2016-01-06},
    booktitle = {Signal {Processing} {Conference}, 2005 13th {European}},
    publisher = {IEEE},
    author = {Cappé, Olivier and Moulines, Eric},
    year = {2005},
    keywords = {Maximum likelyhood, Parameter estimation, Particle filter},
    pages = {1--4},
    file = {Cappé and Moulines (2005) - On the use of particle filtering for maximum likel.pdf:/home/granger/boulot/these/papers/Statistics/Cappé and Moulines (2005) - On the use of particle filtering for maximum likel.pdf:application/pdf}
}

If I run the latex commands manually, the pdf is correct:

ipython3 nbconvert --debug --to latex Test_citation.ipynb --template citations.tplx
latexmk -bibtex -pdf Test_citation.tex

However, if I convert directly to PDF, it fails because the LaTeX commands are run in a temporary directory where the bibliography is missing:

ipython3 nbconvert --debug --to pdf Test_citation.ipynb --template citations.tplx
...
Latexmk: Failed to find one or more bibliography files 
  bibliography.bib
...

Suggested fix or improvement:

Regards,

takluyver commented 8 years ago

How standard is latexmk? From a glance at the man page, it looks like using that is a smarter way of building Latex than the 'run it a few times and cross our fingers' approach. But can we rely on users with Latex having latexmk as well?

nlgranger commented 8 years ago

I'm afraid it is usually not part of the standard texlive installation (checked Debian and Fedora), although it is always provided as a package. On Arch Linux, it is provided by texlive-core. (edit: accidentally clicked close instead of comment, sorry)

nlgranger commented 8 years ago

Instead of relying on hair-tearing dependency tracking, it would be easier (probably) to add the bibliography at the end of the notebook itself and keep the calls to the standard pdflatex. Apparently, latex supports inlining files in the main document: http://tex.stackexchange.com/questions/140360/inline-bibliography

Maybe nbconvert could provide a nicer wrapper around this?
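
One LaTeX mechanism for inlining a file is the `filecontents` environment, which writes its body to a named file when the document is compiled. Below is a minimal sketch of such a wrapper, assuming the generated .tex source and the .bib contents are both available as strings. The function name `inline_bibliography` is hypothetical and not an nbconvert API. Note that this only makes the .bib file appear in the build directory; BibTeX would still have to run to produce the bibliography.

```python
def inline_bibliography(tex_source, bib_source, bib_name='bibliography.bib'):
    """Prepend a filecontents block so LaTeX writes the .bib file itself.

    Hypothetical helper for illustration only. The block is placed before
    the document class declaration, where filecontents is accepted.
    """
    block = (
        '\\begin{filecontents}{' + bib_name + '}\n'
        + bib_source.rstrip('\n') + '\n'
        + '\\end{filecontents}\n'
    )
    return block + tex_source
```

The returned string can be written out in place of the original .tex file before the LaTeX command is run.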

takluyver commented 8 years ago

This is more general - references to external images also break with --to pdf, because they're not in the working directory where Latex is running. :-(
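
One conceivable workaround is to copy the referenced files into the temporary build directory before Latex runs. Below is a minimal sketch under that assumption. `stage_resources` is a hypothetical helper, not how nbconvert works, and it assumes the list of referenced relative paths is already known.

```python
import os
import shutil


def stage_resources(resource_paths, source_dir, build_dir):
    """Copy files referenced by the document into the build directory.

    Hypothetical helper, not nbconvert's implementation. Relative paths are
    preserved so that references in the .tex file keep resolving.
    """
    copied = []
    for rel_path in resource_paths:
        src = os.path.join(source_dir, rel_path)
        if not os.path.isfile(src):
            continue  # leave missing files for Latex to report
        dest = os.path.join(build_dir, rel_path)
        os.makedirs(os.path.dirname(dest) or '.', exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel_path)
    return copied
```

For the example above, `stage_resources(['bibliography.bib'], notebook_dir, build_dir)` would place the bibliography next to the generated .tex file, where `notebook_dir` and `build_dir` are placeholders for the two directories.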

takluyver commented 8 years ago

I think we probably have to switch back to running latex in the working directory, and then trying to clean up after it.

In principle, we could do something neat with an overlay filesystem, so latex can see the files it needs to, but anything it tries to write is transparently diverted to another location. But there's no way that's going to be reliable across different platforms.

fperez commented 8 years ago

I think the cleanup approach isn't so bad: while Latex generates a lot of junk, if I remember correctly the files are all consistently named.

Ages ago I wrote this (crude) little cleantex script that I carry around in my personal ~/usr/bin:

#! /bin/sh
# script to clean up a directory of auxiliary files made by tex
file=$1
/bin/rm -f ${file}.dvi ${file}.log ${file}.aux ${file}.ps ${file}.idx
/bin/rm -f ${file}.*~ ${file}.lof ${file}.toc  ${file}.lot ${file}.nav
/bin/rm -f ${file}.snm ${file}.out ${file}.bbl ${file}.blg ${file}.vrb
/bin/rm -rf ${file}_files

A cleaner, Python version of that, validated with a check for such files existing before the conversion run (so we leave temporaries in place if they were there before for some reason), might just do the trick...

karldw commented 8 years ago

Thank you all so much for working on LaTeX integration! I'm very grateful it's an option in nbconvert.

It's worth mentioning that there are a bunch of other LaTeX extensions that can pop up. This TeX Stack Exchange question lists many of them: http://tex.stackexchange.com/q/17845

Here's some code that begins to follow @fperez's suggestion:


#!/usr/bin/env python3

import os
from sys import argv
from subprocess import call

# Copied from http://tex.stackexchange.com/q/17845 but without 'pdf' and 'tmp'
latex_extensions = {'acn', 'acr', 'alg', 'aux', 'bbl', 'blg', 'dvi',
'fdb_latexmk', 'glg', 'glo', 'gls', 'idx', 'ilg', 'ind', 'ist', 'lof', 'log',
'lot', 'maf', 'mp', 'mtc', 'mtc1', 'nav', 'nlo', 'out', 'pdfsync', 'snm',
'synctex.gz', 'toc', 'top'}

def find_latex_byproducts(file_base):
    potential_byproducts = [file_base + '.' + ext for ext in latex_extensions]
    byproducts = [f for f in potential_byproducts if os.path.isfile(f)]
    return set(byproducts)

def find_new_byproducts(file_base, existing_byproducts):
    current_byproducts = find_latex_byproducts(file_base)
    return current_byproducts - existing_byproducts

def try_to_clean(file_set):
    for f in file_set:
        try:
            os.remove(f)
            print(f, 'cleaned')
        except PermissionError:
            pass

def remove_tex_extension(filename):
    if filename.endswith('.tex'):
        return filename[: -4]
    else:
        return filename

if __name__ == '__main__':

    # This assert should be something else.
    assert len(argv) == 2

    # Detect the tex file
    filename = argv[1]
    assert os.path.isfile(filename)
    file_base = remove_tex_extension(filename)
    pre_files = find_latex_byproducts(file_base)
    # Compile LaTeX with the existing process.
    call(['touch', file_base + '.log'])

    new_crud = find_new_byproducts(file_base, pre_files)

    try_to_clean(new_crud)

smcateer commented 7 years ago

Made some comments about this issue here before seeing this.

Would be good to see this assigned. (I'd hop in, but am nowhere near expert enough!)

ischoegl commented 6 years ago

I started looking into nbconvert, specifically in combination with citations, which appears to be tremendously useful (thanks!). It looks like I stumbled into the same issue reported above with a recent anaconda installation (anaconda3-5.2.0).

$ jupyter --version
4.4.0
$ jupyter nbconvert --version
5.3.1

Using the reference nbconvert example LifeCycleTools, and issuing

$ jupyter nbconvert --debug --config ipython_nbconvert_config.py 

produces

...
This is BibTeX, Version 0.99d (TeX Live 2017)
The top-level auxiliary file: notebook.aux
The style file: unsrt.bst
I couldn't open database file ipython.bib
---line 32 of file notebook.aux
 : \bibdata{ipython
 :                 }
I'm skipping whatever remains of this command
I found no database files---while reading file notebook.aux
...

I tried setting temporary environment variables, but to no avail - the file ipython.bib most definitely exists within the given folder, but latex appears to be running in a separate folder. Here is the summary of jupyter's paths on my (Fedora 28) installation ...

$ jupyter --paths
config:
    /home/ischg/.jupyter
    /home/ischg/.pyenv/versions/anaconda3-5.2.0/etc/jupyter
    /usr/local/etc/jupyter
    /etc/jupyter
data:
    /home/ischg/.local/share/jupyter
    /home/ischg/.pyenv/versions/anaconda3-5.2.0/share/jupyter
    /usr/local/share/jupyter
    /usr/share/jupyter
runtime:
    /run/user/1000/jupyter

Any help would be appreciated.

chrisjsewell commented 5 years ago

@ischg shameless plug, but I have actually 'solved' this issue with ipypublish :) In particular, I've implemented a LatexDocLinks preprocessor to resolve the relative locations of files.

mgeier commented 5 years ago

While we are in shameless-plug-mode: you can also use Sphinx with my little nbsphinx extension, which also supports LaTeX/PDF output and BibTeX citations: https://nbsphinx.readthedocs.io/en/0.4.2/markdown-cells.html#Citations (link to PDF: https://media.readthedocs.org/pdf/nbsphinx/0.4.2/nbsphinx.pdf#subsection.3.2).

ischoegl commented 5 years ago

Thanks to both of you for the suggestions. I appreciate it!