Closed j-steinbach closed 3 years ago
Hi, thank you for your request. That's definitely not too much and should be relatively easy to achieve. I will look into it over the weekend.
It's certainly "scrapper", in the sense that it harvests the scraps left over after AnyStyle scrapes a PDF and takes them to the Org Roam junkyard.
Awesome! And yes, that sounds exactly like what I am doing - putting stuff in the junkyard and then building a rocket! :rocket:
It seems to be done. Please check pull request #136, or pull the develop branch. If it's working for you, I'll merge it into master.
Also note that, depending on the PDF page layout, the order of the references will quite often get messed up. This is especially true for two-column PDFs. There's not much ORB can do about it.
You are very fast!
Unfortunately, I seem to have run into a small hiccup on my side...
Based on my (small) understanding of Doom Emacs, doing either
(package! org-roam-bibtex
:recipe
(:host github
:repo "org-roam/org-roam-bibtex"
:branch "feature-134"))
or the same recipe with :branch "develop", and then syncing/building/upgrading Doom should do the job, but now I can't find any of orb-pdf-scrapper-sort-references, orb-pdf-scrapper-export-fields, or orb-pdf-scrapper-refsection-headings.
It says
orb-pdf-scrapper-sort-references is a variable without a source file.
Do you perhaps have an idea?
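One way to narrow this down is to check which copy of the library Emacs actually loads. The snippet below is only a sketch: it assumes the Scrapper is provided by a feature named orb-pdf-scrapper (inferred from the variable prefix, not confirmed in this thread) and otherwise uses only the built-in functions locate-library, require and symbol-file.

```elisp
;; Evaluate each form with M-: or in the *scratch* buffer.

;; Path of the file that would be loaded; nil means it is not on `load-path'.
;; ASSUMPTION: the feature/file name "orb-pdf-scrapper" is a guess.
(locate-library "orb-pdf-scrapper")

;; Load the library (signals an error if it cannot be found).
(require 'orb-pdf-scrapper)

;; File that defined the option, or nil if no loaded file defines it,
;; which would match the "variable without a source file" message.
(symbol-file 'orb-pdf-scrapper-sort-references 'defvar)
```

If the path points at an old build, or symbol-file still returns nil after loading, the branch was probably not picked up by the package manager, and the variable is only bound because it was set in the user config.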
Ok, it appears to work. I reinstalled Doom and brute-forced my way through it. The problem probably came from me using native comp Emacs. But who knows.
Anyways, I tested it with three different PDFs and everything is in a list, in the same order as the "text mode" buffer.
There are two more things I noticed during testing. (They might warrant their own feature requests)
Is there a way to also save/export the "Text mode" and "BibTeX mode" buffers?
Would it be possible to also put them under a heading in the origin file, similar to how it puts the "Org mode" references under a heading "References (retrieved by ORB...)"?
Secondly, in the "BibTeX mode" buffer, I have a reference like this:
@misc{gartner2013,
citation-number = {6},
author = {Gartner},
date = {2013},
title = {Gartner’s 2013 hype cycle for emerging technologies}
}
I change it to
@misc{gartner2013gartner,
citation-number = {6},
author = {Gartner},
date = {2013},
title = {Gartner’s 2013 hype cycle for emerging technologies}
}
and expect it to turn into - cite:gartner2013gartner, but in the generated "Org buffer" it turns into - cite:gartner2013 again.
Sorry for the trouble :)
> The problem probably came from me using native comp Emacs.
Quite likely. I have little experience with native comp though. So should I merge the branch into master?
> There are two more things I noticed during testing. (They might warrant their own feature requests)
Indeed, these should be separate feature requests, which you are welcome to file.
> Is there a way to also save/export the "Text mode" and "BibTeX mode" buffers?
As a temporary solution, navigate to orb--temp-dir and locate your files there. The directory persists until Emacs is restarted.
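As a rough illustration of that workaround, the directory can be opened in Dired straight from Emacs. This sketch assumes that orb--temp-dir holds the directory path as a string, as the reply implies; only built-in functions are used.

```elisp
;; `orb--temp-dir' is the internal variable named above. The double dash
;; marks it as private, so it may change without notice.
;; ASSUMPTION: its value is a directory path string.
(if (and (boundp 'orb--temp-dir)
         (stringp orb--temp-dir)
         (file-directory-p orb--temp-dir))
    (dired orb--temp-dir)   ; browse the files and copy what you need
  (message "orb--temp-dir is not set or does not exist yet"))
```

Since the directory only lasts until Emacs is restarted, anything worth keeping should be copied elsewhere from the Dired buffer.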
> Secondly, in the "BibTeX mode" buffer, I have a reference like this ... and expect it to turn into - cite:gartner2013gartner
If you have manually edited the BibTeX buffer, don't press y at the prompt that offers to generate keys before proceeding to Org mode. I typically press C-c C-u to automatically generate all the keys, then go through the buffer and manually edit a few of them, perhaps also automatically re-generating several others with C-u C-c C-u. Then, after pressing C-c C-c, I type n at the prompt and proceed to Org mode. The keys are exactly what I want them to be, so I don't need to generate them again. I know this is counterintuitive, so you are welcome to suggest improvements to this workflow in a feature request.
> Sorry for the trouble :)
Not at all :) ORB PDF Scrapper is still quite raw. It basically works, but the user experience could be improved a lot. Since I know how it works, I'm fine with it, but others will likely run into all kinds of issues. For example, I wanted to implement a transparent save/export mechanism, but since I don't need it very often, I didn't bother. So if you are willing to contribute ideas and participate in testing, I'll be glad to implement them.
From my side you can close/merge.
I tested with a few different PDFs. Each of them resulted in a single list filled with - cite:key references. As far as I could see, they kept the order they had in the "Text mode" buffer. Note: I did not test whether the old sorting still works.
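For anyone wanting to compare the two behaviours, the option discussed in this thread could be toggled from a Doom config roughly as follows. This is a sketch under assumptions: the thread does not state the option's accepted values, so the meaning of nil and non-nil here is a guess based on the variable name.

```elisp
;; In Doom's config.el. `after!' defers the setting until the package loads.
;; ASSUMPTION: nil keeps the order found in the PDF and a non-nil value
;; restores the old grouping. Check the docstring (C-h v) to confirm.
(after! org-roam-bibtex
  (setq orb-pdf-scrapper-sort-references nil))
```

Switching the value and re-running the Scrapper on the same PDF would be a simple way to test whether the old sorting still works.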
It seems that I have been pressing y at that prompt.
I will create a few feature requests for you :)
Great!
As mentioned in https://github.com/org-roam/org-roam-bibtex/pull/44, the PDF Scrapper extracts keys from PDFs and then sorts them into in-roam, in-bib, valid and invalid. I would like it to not sort the keys, but instead simply keep the structure of the list of extracted references.
I looked through the available variables in Emacs but didn't see anything like orb-pdf-scrapper-sort t, so I guess it always sorts at the moment.
In "text mode" it shows me the extracted list in the form
But after "org mode" it sorts and turns the references into
I would like it to just keep it as
Why?
I extract my PDF annotations with org-noter. This results in text in the form
Then I replace all mentions of [1] with the corresponding key (with a macro or manually; often I also create an org-roam note).
For this I need to know that [1] references coolguy2020fun. If the extracted references get sorted, I have to manually "unsort" them.
Ideally, the PDF Scrapper would return me the references in the form
But I don't want to ask for too much :)
Overall, the PDF Scrapper is an awesome and very helpful feature, thank you very much for creating it!
Is it scrapper or scraper?