sagemathinc / cocalc

CoCalc: Collaborative Calculation in the Cloud
https://CoCalc.com

Serious loss of data with bibtex code formatter. #8019

Open mforbes opened 3 days ago

mforbes commented 3 days ago

Describe the bug

Running the Format Source Code command on CoCalc causes serious loss of data in bibtex files.

I recommend the following:

  1. That Format Source Code be disabled by default unless the formatter has had a review and is found to be safe.
  2. That the available code formatters be documented on https://doc.cocalc.com/.
  3. Provide a way to disable auto-formatting for specific projects.
  4. Provide a way to customize the formatters (harder... but probably doable if they were clearly documented since they probably use some config files.)

Of course, one can use Time Travel to recover, but in our case it was quite painful because of many intermediate changes.

To Reproduce

  1. Create and open the following file called test.bib on CoCalc.
%% This BibTeX bibliography file was created using BibDesk.
%% https://bibdesk.sourceforge.io/
%% Saved with string encoding Unicode (UTF-8) 

@set{myset,
    entryset = {Beringer:2024}}

@article{Beringer:2024,
    archiveprefix = {arXiv},
    author = {Beringer, Lukas and Steinhuber, Mathias and Diego Urbina, Juan and Richter, Klaus and Tomsovic, Steven},
    doi = {10.1088/1367-2630/ad5752},
    eprint = {2401.17744},
    issn = {1367-2630},
    journal = njp,
    month = jul,
    number = {7},
    pages = {073002},
    primaryclass = {cond-mat.quant-gas},
    publisher = {IOP Publishing},
    title = {Controlling many-body quantum chaos: {Bose}-{Hubbard} systems},
    url = {http://dx.doi.org/10.1088/1367-2630/ad5752},
    volume = {26},
    year = {2024}}
  2. Run Format/Format Source Code. The file becomes:
@set{myset,
}

@article{Beringer:2024,
  author      = {Beringer, Lukas and Steinhuber, Mathias and Diego Urbina, Juan and Richter, Klaus and Tomsovic, Steven},
  publisher   = {IOP Publishing},
  url         = {http://dx.doi.org/10.1088/1367-2630/ad5752},
  date        = {2024-07},
  doi         = {10.1088/1367-2630/ad5752},
  eprint      = {2401.17744},
  eprintclass = {cond-mat.quant-gas},
  eprinttype  = {arXiv},
  issn        = {1367-2630},
  number      = {7},
  pages       = {073002},
  title       = {Controlling many-body quantum chaos: {Bose}-{Hubbard} systems},
  volume      = {26},
}

Note that:

  1. Comments are lost.
  2. Entryset data is completely lost.
  3. Macros are lost - in this case, the journal field is just dropped. (It is a macro that is defined in another file).
  4. Field names are changed from bibtex-compatible names like journal, year, etc. and arXiv-compatible names like archiveprefix to biblatex-compliant names: journaltitle, date, eprinttype, etc. This might be fine for people using biblatex, but will break submissions to journals that still require bibtex.

Expected behavior

Only inconsequential formatting changes, such as whitespace.
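
One possible way to state this expectation as a check, offered only as a sketch: treat any run of whitespace as equivalent to a single space and require the two texts to match otherwise. The function name `only_whitespace_changed` is hypothetical. The check is deliberately strict, so a formatter that adds trailing commas or reorders fields would also fail it.

```python
def only_whitespace_changed(before: str, after: str) -> bool:
    """True if the texts are equal once every whitespace run is collapsed."""
    # str.split() with no arguments splits on any run of whitespace,
    # so re-joining with single spaces normalizes indentation and alignment.
    return " ".join(before.split()) == " ".join(after.split())
```

Changes to indentation or to the alignment of the `=` signs pass this check, while a dropped comment or field does not.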

williamstein commented 3 days ago

Closing related issue: https://github.com/sagemathinc/cocalc/issues/4215

I think we should just get rid of all latex-related formatting. I don't think there exist any good formatters.

The formatters for code (python, javascript, etc.) are very robust these days. The formatters for tex-related things are very bad.