jeanetteclark closed this PR 1 month ago.
Thank you @jeanetteclark. I will need to think about it, but I certainly won't just close the PR and dismiss it! :wink:
I think a better way to do this may be to create an ancillary function (say, `fix_bib`, or something) that takes a non-standard `bib` file and reformats it in the way you suggest. That way, users would be encouraged to use a standardised `bib` file (one field per line, each ending in the customary comma followed by a line break), but we would also support a wider set of formats through the utility function. What do you think?
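A minimal sketch of what such a helper might look like, in R since bib2df is an R package. `fix_bib()` is only the hypothetical name floated above, not an existing bib2df function. The sketch assumes the input is a character vector of lines (e.g. from `readLines()`) and reflows each entry so that every field sits on its own line, splitting only at commas that are at brace depth 1 and outside a `"..."`-quoted value. It does not try to cover every BibTeX corner case (for example, stray text between entries is simply glued onto the next entry's first line).

```r
# Hypothetical helper: fix_bib() is NOT an existing bib2df function.
# Reflows BibTeX so each field ends up on its own line, splitting only at
# commas at brace depth 1 that are not inside a "..."-quoted value.
fix_bib <- function(lines) {
  chars <- strsplit(paste(lines, collapse = " "), "")[[1]]
  out <- character(0)  # reformatted output lines
  current <- ""        # text of the line currently being built
  depth <- 0           # brace depth (1 = inside an entry, between fields)
  in_quote <- FALSE    # TRUE while inside a field-level "..." value
  for (ch in chars) {
    if (ch == "\"" && depth == 1) in_quote <- !in_quote
    if (ch == "{") depth <- depth + 1
    if (ch == "}") {
      depth <- depth - 1
      if (depth == 0) {
        # Closing brace of the entry: emit the last field, then "}" alone.
        if (nzchar(trimws(current))) out <- c(out, trimws(current))
        out <- c(out, "}")
        current <- ""
        next
      }
    }
    if (ch == "," && depth == 1 && !in_quote) {
      # Field separator: finish the current line, keeping the comma.
      out <- c(out, paste0(trimws(current), ","))
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  # Keep any trailing text after the last entry rather than dropping it.
  if (nzchar(trimws(current))) out <- c(out, trimws(current))
  out
}
```

Usage would be along the lines of `writeLines(fix_bib(readLines("messy.bib")), "clean.bib")` (file names are placeholders), after which the cleaned file could go through `bib2df` as usual.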
[UPDATE]: After browsing through your code, I think some of it is very elegant and perhaps more efficient than the original... (I thought the original was pretty clever, though I should say I did not write it; I only came into this project quite late as a makeshift maintainer, after I too made a change to fix a bug affecting my own project.)
What about giving the user an option to add fields that then get appended to the `other_allowed_fields` variable? For example, could you add an extra argument to `bib2df`, say `extra_fields`, which the user can specify as a character vector (the names of the extra fields they want to include)?
Say I call `bib2df(..., extra_fields = c("project", "scopus"))` (assume that some of my `bib` files have a field `project`, where I specify the project a given paper relates to, and a field `scopus`, with the URL of my Scopus page, for CV building or something similar).
Now, if the helpers added these to `other_allowed_fields`, everything should work. I think the key is to make sure the main function and the helpers communicate, so that `other_allowed_fields` is updated with the user-selected fields.
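A rough sketch of the `extra_fields` idea in R. Everything here is illustrative: `bib2df_sketch()`, `parse_fields()` and this short `standard_fields` list are hypothetical stand-ins, not bib2df's real internals, and `other_allowed_fields` is only modelled on the variable name mentioned above. Entries are assumed to already be named character vectors (field name to value).

```r
# Hypothetical sketch: bib2df_sketch(), parse_fields() and this
# `standard_fields` list are illustrative, not bib2df's real internals.
standard_fields <- c("author", "title", "year", "journal", "booktitle", "pages")

parse_fields <- function(entry, allowed) {
  # `entry` is a named character vector (field name -> value) for one record.
  keep <- tolower(names(entry)) %in% tolower(allowed)
  if (any(!keep)) {
    message("Ignoring non-allowed field(s): ",
            paste(names(entry)[!keep], collapse = ", "))
  }
  entry[keep]
}

bib2df_sketch <- function(entries, extra_fields = NULL) {
  # Merge user-supplied fields into the allowed list once, then hand the
  # merged vector to every helper so they all agree on what is allowed.
  other_allowed_fields <- unique(c(standard_fields, tolower(extra_fields)))
  lapply(entries, parse_fields, allowed = other_allowed_fields)
}
```

Passing the merged vector to the helpers as an explicit argument, rather than having each helper read a fixed global list, is one way to keep the main function and the helpers in sync.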
I've created a new branch, `devel`, where I've forked your PR. If you make further changes, could you push to that branch, so we can test without breaking `main`?
Thanks!
Thanks for having a look! Your solution actually sounds very nice; I think it gives us a good middle ground between being able to parse documents with unusual formatting and allowing any field, as long as the user provides it. Thanks for moving us over to `devel`; I'm very happy to switch to a new branch given the scope of the changes.
This is in response to #56 and is a pretty significant refactor in the parsing. I definitely did not think things would go as far as they did, but I needed a solution to my problem so here we are.
I've essentially changed things so that each entry is collapsed onto a single line, and then key-value pairs are parsed according to a list of allowed keys (the standard BibTeX fields plus others I've been seeing in the wild). I think this approach might be more robust overall, but I'm really not sure how flexible we should be about the allowed keys. At least all of the tests pass. I don't expect this PR to be merged as is: at a minimum we should emit a message when a non-allowed key is found, saying which key it is, and in general that section could probably be handled more robustly.
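To make the "single line, then split on allowed keys" idea concrete, here is a small R sketch. It is not the PR's actual code: `collapse_entry()` and `parse_entry()` are hypothetical names, and the sketch assumes each entry ends with its closing brace and that no value contains the literal text `, <allowed key> =`. Because field boundaries are located only at `, <allowed key> =`, commas inside values are harmless, but an unknown key gets swallowed by the preceding value, which is exactly why a message for non-allowed keys is needed.

```r
# Sketch of the "one line per entry, then split on allowed keys" idea.
# collapse_entry() and parse_entry() are illustrative names, not the PR's code.
collapse_entry <- function(lines) {
  # Push a multi-line entry onto a single line.
  trimws(paste(trimws(lines), collapse = " "))
}

parse_entry <- function(line, allowed_keys) {
  # Field boundaries are ", <allowed key> =", so commas inside values are ignored.
  pattern <- paste0(",\\s*(", paste(allowed_keys, collapse = "|"), ")\\s*=")
  m <- gregexpr(pattern, line, ignore.case = TRUE, perl = TRUE)[[1]]
  if (m[1] == -1) return(character(0))
  starts <- as.integer(m)
  key_start <- as.integer(attr(m, "capture.start")[, 1])
  key_len <- as.integer(attr(m, "capture.length")[, 1])
  keys <- tolower(substring(line, key_start, key_start + key_len - 1))
  # Each value runs to the next allowed key, or up to the entry's closing brace.
  value_start <- starts + as.integer(attr(m, "match.length"))
  value_end <- c(starts[-1] - 1, nchar(line) - 1)
  values <- trimws(sub(",\\s*$", "", substring(line, value_start, value_end)))
  # A non-allowed key ends up inside the preceding value, so flag it.
  unknown <- grepl(",\\s*[[:alpha:]][[:alnum:]_-]*\\s*=", values, perl = TRUE)
  if (any(unknown)) {
    message("Possible non-allowed field after: ",
            paste(keys[unknown], collapse = ", "))
  }
  # Drop one layer of surrounding braces or quotes.
  setNames(sub('^[{"](.*)[}"]$', "\\1", values), keys)
}
```

For instance, `parse_entry(collapse_entry(c("@Article{key1,", "title = {Commas, inside},", "year = {2020}", "}")), c("title", "year"))` should give `c(title = "Commas, inside", year = "2020")`.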
I opened the PR to see if there was interest in pursuing this change; I won't be offended if it is closed outright. Like I said, I just need to get my project working again, and I could potentially clean this up for merging if there is interest.