retorquere / zotero-better-bibtex

Make Zotero effective for us LaTeX holdouts
https://retorque.re/zotero-better-bibtex/
MIT License

Export only clean entries to bibtex file #1826

Open fab6 opened 3 years ago

fab6 commented 3 years ago

Hi,

I am using Zotero with quite a large database, and not all entries have all the information needed for BibTeX. For example, I get quite a few messages like:

% == BibLateX quality report for 00IntroScan:
% Unexpected field 'publisher'
% Exactly one of 'date' / 'year' must be present
% ? Title looks like it was stored in lower-case in Zotero

Is there an option to export only clean and correct entries? Or is it possible to mark entries for export? Then I could select the clean ones manually for export...

Thank you in advance! Fab

fab6 commented 3 years ago

It seems that the support log is not sent... nothing happens on macOS.

Support log ID:

retorquere commented 3 years ago

That's #1821; a fix will be out within the hour.

retorquere commented 3 years ago

The quality report is generated at the very end on the resulting bib(la)tex, and the transformation from Zotero -> bib(la)tex is complex enough that you can't realistically predict the QR.

Marking the entries that have a QR... that is an interesting idea. Tricky though -- with auto-exports on, there's the risk of an export loop, as tagging counts as an item-change in Zotero, and plugins are not told about the nature of a change.

retorquere commented 3 years ago

Hmm... I have an idea that might work.... #1821 needs to go out first though.

fab6 commented 3 years ago

OK, thank you very much for your reply!

qqobb commented 3 years ago

I've used saved searches and collections to prevent exporting items without a date or creators. (But I haven't used quality reports much.) It would be great if one could prevent exports, e.g., through a BBT postscript!

@retorquere: Here are some thoughts about this issue.

Working with quality reports

I'm not sure if it's technically possible, but changing Zotero items (i.e., tagging) through an export translator might not be a good solution. Maybe this could be separate from the exporting, e.g., "Tools" -> "Scan BibTeX file for quality reports...". This would add #QualityReport tags to Zotero items with a report in the BIB file and remove the tag if there isn't a report for a given citation key.

(For the tag, you could introduce a new preference with name extensions.zotero.translators.better-bibtex.qualityReportTag and default value #QualityReport.)

In the export dialog, you could add an option "Exclude items with #QualityReport tag". Alternatively, one could check for the tag in a BBT postscript.
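
The tag bookkeeping described above can be sketched in Python. This is only an illustration of the proposed "scan" step, not something BBT provides: the function name sync_quality_tags, the shape of item_tags, and the choice to return two lists are all assumptions, and actually applying the changes in Zotero is left out.

```python
def sync_quality_tags(item_tags, keys_with_report, tag="#QualityReport"):
    """Work out which items need the quality-report tag added or removed.

    item_tags:        dict mapping citation key -> set of tags on that item
                      (hypothetical representation of the Zotero library)
    keys_with_report: citation keys that have a quality report in the BIB file
    Returns (to_tag, to_untag) as sorted lists of citation keys.
    """
    # Items that currently carry the tag.
    tagged = {key for key, tags in item_tags.items() if tag in tags}
    # Ignore report keys that do not correspond to a known item.
    flagged = set(keys_with_report) & set(item_tags)
    to_tag = sorted(flagged - tagged)      # report present, tag missing
    to_untag = sorted(tagged - flagged)    # tag present, report gone
    return to_tag, to_untag
```

Only items whose tag state is out of date end up in either list, so items that are already correct would not need to be modified at all.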

Postscript that prevents an export?

A reference.delete() function, which could be used in BBT postscripts, might be useful. You would export zero characters for a given item if that function is called. Currently, you can use the following script to sort out items without a date or creators:

if (Translator.BetterTeX) {
    if (!item.date || (Object.keys(item.creators).length === 0)) {
        reference.referencetype = 'wontcite';
    }
}

With a delete() (or dump()?) function, you could abort an item translation and prevent meaningless exports. You might not want to work on items without date or creators. (In my library, these are items I would never cite.) You could then add #QualityReport tags to items that you might cite but are flawed.

retorquere commented 3 years ago

> Working with quality reports
>
> I'm not sure if it's technically possible, but changing Zotero items (i.e., tagging) through an export translator might not be a good solution. Maybe this could be separate from the exporting, e.g., "Tools" -> "Scan BibTeX file for quality reports...". This would add #QualityReport tags to Zotero items with a report in the BIB file and remove the tag if there isn't a report for a given citation key.

I'd have to change the quality reports for that; the QR sits as free-standing text below the entry and isn't structurally "tied" to the reference. It's not set up to be machine-readable.
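
To illustrate what tying the report to an entry could look like from the outside, here is a minimal Python sketch that reads the free-standing comment lines back into a mapping from citation key to messages. It assumes the layout shown in the sample at the top of this thread (a header line ending in the key and a colon, followed by % lines). The function name and the acceptance of a "BibTeX" header variant are assumptions, and the real report format may differ.

```python
import re

# Header layout copied from the sample in this thread, including the
# "BibLateX" spelling. Accepting a "BibTeX" variant is an assumption.
HEADER = re.compile(
    r"^%\s*==\s*Bib(?:La)?TeX quality report for\s+(?P<key>\S+?):\s*$",
    re.IGNORECASE,
)


def parse_quality_reports(bib_text):
    """Map each citation key to the list of messages in its quality report."""
    reports = {}
    current = None  # key of the report we are currently inside, if any
    for line in bib_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group("key")
            reports[current] = []
        elif current is not None and line.startswith("%"):
            # Strip the leading "%" and surrounding whitespace from the message.
            reports[current].append(line.lstrip("%").strip())
        else:
            # Any non-comment line ends the current report.
            current = None
    return reports
```

Because it depends on text that was not designed to be parsed, a sketch like this would break as soon as the wording of the header changes.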

> (For the tag, you could introduce a new preference with name extensions.zotero.translators.better-bibtex.qualityReportTag and default value #QualityReport.)
>
> In the export dialog, you could add an option "Exclude items with #QualityReport tag".

I'm not too keen on this. The workflow seems too hard-coded to me and that usually means I get a lot of follow-up requests.

> Alternatively, one could check for the tag in a BBT postscript.

That could in principle just be done on the item itself; I could mark it as having a QR... but the postscript runs before the QR is generated.

> Postscript that prevents an export?

That's already possible: you can return { write: false } in the postscript and the entry won't be written to the output. (You can also return { cache: false } to allow the write without storing the entry in the cache, but that's rarely desirable.)

fab6 commented 3 years ago

Hi,

this script could be an alternative... it's slow, it uses some stuff from other tools, expects a bibtex file called references.bib and is probably buggy:

import re
import shutil
import os
#================================================================================
def deleteEntry(path, key):
    # Read the whole file so the multi-line regex can span an entire entry.
    with open(path, 'r') as f:
        content = f.read()

    # Match from "@type{key," up to the closing brace at the start of a line.
    # re.escape keeps keys containing regex metacharacters (e.g. "+" or ".") literal.
    pattern = re.compile(r"^@\w+\{" + re.escape(key) + r",.*?^\}", re.S | re.M)
    content_modified = re.sub(pattern, "", content)

    print(100 * "-")
    # Write the file back without the matched entry.
    with open(path, 'w') as f:
        f.write(content_modified)

#================================================================================
itemStart = "@"
qualityReport = "% == BibLateX"
comment = "%"
tag_found = True
itemStartL = []
qualityReportL = []
commentL = []
lineNr = 0
# itemB = False
# qualityB = False
#================================================================================
with open('references.bib') as in_file:
    for line in in_file:
        #--------------------------------------------------------------------------------
        if qualityReport in line:
            print (100*"-")
            print ("Found Quality")
            print (line)
            extractItem = line.split("% == BibLateX quality report for")
            print (extractItem)
            print (extractItem[1])
            print (extractItem[1].split(":"))
            print (extractItem[1].split(":")[0])
            item4delete = extractItem[1].split(":")[0]
            qualityReportL.append(item4delete)
        lineNr += 1
        # if lineNr > 30:
        #     break
#================================================================================
print (qualityReportL)
pathCopy = "clean_references.bib" 
shutil.copy("references.bib", "references_copy.bib")

#================================================================================
file1 = open('references_copy.bib', 'r')
file2 = open('clean_references.bib', 'w')
for line in file1.readlines():
    if not (line.startswith('%')):
        print(line)
        file2.write(line)
file2.close()
file1.close()

#================================================================================
for i, entry in enumerate(qualityReportL):
    print (100*"=")
    print ("i: ", i)
    print ("max: ", len(qualityReportL))
    print (entry.split(" ")[1])
    short = entry.split(" ")[1]
    deleteEntry("clean_references.bib", short)

#================================================================================
with open('clean_references.bib') as infile, open('clean_references.bib_copy', 'w') as outfile:
    for line in infile:
        if not line.strip(): continue  # skip the empty line
        outfile.write(line)  # non-empty line. Write it to output
shutil.move("clean_references.bib_copy", "clean_references.bib")
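
A more compact variant of the same approach, offered as a sketch rather than a drop-in replacement: it works on the file contents in memory, escapes the citation key before building the regex, and handles each step in one pass. Like the script above, it assumes every entry ends with a closing brace at the start of a line and that report headers look like the sample in this thread. strip_reported_entries is a hypothetical name.

```python
import re

# Header text as it appears in the sample earlier in this thread.
REPORT_KEY = re.compile(r"^% == BibLateX quality report for (\S+?):\s*$", re.M)


def strip_reported_entries(bib_text):
    """Return bib_text without entries that have a quality report.

    Comment lines (starting with "%") and blank lines are dropped as well,
    mirroring what the longer script does.
    """
    # 1. Collect the citation keys named in quality-report headers.
    keys = set(REPORT_KEY.findall(bib_text))

    # 2. Drop all comment lines, which includes the reports themselves.
    text = "\n".join(
        line for line in bib_text.splitlines() if not line.startswith("%")
    )

    # 3. Remove each reported entry, from "@type{key," to the closing brace
    #    at the start of a line. re.escape keeps the key literal.
    for key in keys:
        entry = re.compile(
            r"^@\w+\{" + re.escape(key) + r",.*?^\}[ \t]*\n?", re.S | re.M
        )
        text = entry.sub("", text)

    # 4. Drop blank lines and end with a single newline if anything is left.
    kept = [line for line in text.splitlines() if line.strip()]
    return "\n".join(kept) + ("\n" if kept else "")
```

To use it on a file, read references.bib into a string, pass it through the function, and write the result to clean_references.bib.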

fab6 commented 3 years ago

just found this

bibtool '--select{keywords "CLEAN"}' references.bib -o testclean.bib

which is probably better and extracts the items with a manual CLEAN tag given in zotero... just fyi

retorquere commented 3 years ago

I'm adding the QR as a tab to the item pane, but there are a few high-prio items that need attention first.

fab6 commented 3 years ago

All fine, I just thought that it might help someone... maybe this was not the right place.

retorquere commented 3 years ago

No problem, these solutions are not at odds with each other. I just meant to say that, from my end, this is what I'm planning. I'm not even sure it's possible at all, since the Zotero item pane is very resistant to plugin changes.