dieghernan commented 2 years ago

Hi

Following #28 find here a proposed roadmap for improving the package. This is to be discussed:

Phase 1: Current state

Add test ideally based in snapshot (testthat >= 3):

Tests

[x] Move to testthat > = 3
[x] Improve current testing, including test for checking already closed issues.
[x] Add separated test for advanced features on bibtex (crossref, use of STRING)

Phase 2: Alternative solution
[x] Implement and Test alternative solution on separate branch

Phase 3: Agree on changes

[x] Decide whether the changes should be applied, iterate
[x] Check reverse dependencies {revdep}
[ ] Merge on main
[ ] Bump to a major version
[ ] CRAN release

Nice to have

[ ] Lint {lint}
[ ] Cleanup code (https://devguide.ropensci.org/building.html) with {styler}
[ ] Use Roxygen + md {roxygen2md}
[ ] Recreate NEWS.md from https://github.com/ropensci/bibtex/blob/master/inst/NEWS.Rd
[ ] I would probably rearrange code on separate files, to make it easier to maintain and debug. Now everything is in bibtex.R

Happy to get feedback on this

dieghernan commented 2 years ago

More on non-so-common features

http://www.bibtex.org/Format/
- https://maverick.inria.fr/~Xavier.Decoret/resources/xdkbibtex/bibtex_summary.html
- http://tug.ctan.org/info/bibtex/tamethebeast/ttb_en.pdf

coatless commented 2 years ago

@dieghernan I think one huge component that is missing from Phase 2 is addressing the underlying memory leaks. Or is this what "alternative solutions" means?

I should note that work was initially started by @matthewwiese in #40.

matthewwiese commented 2 years ago

I did start work on fixing the rchk warnings some months back and made good progress, although I never updated the PR because I wasn’t able to eliminate possible protection stack imbalance completely. A lot of the warnings in #33 can be mitigated by avoiding UNPROTECT_PTR(s) in favor of respecting the stack via UNPROTECT(n), in exchange for slightly more verbose logic in the grammar section and elsewhere. Despite these improvements, protection stack imbalance warnings would persist. My initial hunch was that it was merely a false positive and that rchk was getting "confused" by the generated parser - from rchk USAGE:

There are still some false alarms and bailouts for code that is still too complicated, and indeed there are not-useful reports for functions that have protection stack imbalance by design.

However, my changes still didn’t solve problems like #42, which suggests something more fundamental. I can update the PR with what I have if it'll be of benefit, but unfortunately I don't have the time to devote the proper attention to this.

I agree that any project to improve the package is moot unless the memory bugs are addressed.

dieghernan commented 2 years ago

Thanks both for your comments. For clarity, what I am talking about in Phase 2 is to rewrite the package using base R rather to fix the current one, i.e. dropping the C code.

This is kind of risky, so it is important to have a full battery of tests covering the most extreme cases.

I am aware that Phase 2 may not be satisfactory, but even in the case you decide not to adopt it (it would be fine for me) I think it is worth to try

matthewwiese commented 2 years ago

For what it's worth, I think rewriting in R is a good idea. It eases future maintenance and allows existing users of the package to keep on working with only a minor performance penalty. If for some reason the parsing of BibTeX files is a bottleneck, one could always switch to rbibutils which appears to roll a hand-written C parser.

I don't know whether you're a student @dieghernan, but this could make for a great GSoC project if @coatless is up for it.

coatless commented 1 year ago

Closed as #47 is now in and the package is back on CRAN.

ropensci / bibtex

Proposal: Improving the package #45

Phase 1: Current state

Tests

Phase 2: Alternative solution

Phase 3: Agree on changes

Nice to have