jlegewie / zotfile

Zotero plugin to manage your attachments: automatically rename, move, and attach PDFs (or other files) to Zotero items, sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.), and extract PDF annotations.

Regex capturing not working as one would expect (Failed to compose renaming scheme) #668

Open gr4nt3d opened 9 months ago

gr4nt3d commented 9 months ago
I wanted to rename files according to the following scheme:

| "original data" | "more data" | > rename |
| --- | --- | --- |
| first middle surname of authors | title | surname_f(m)_title |
| Author McAuthentic | How to waste time with regex | mcauthentic_a_how_to_waste... (depends on cutoff) |
| Adam Adams; Bob Bobsen | Our-Very-Important Book | adams_a_bobsen_b_our_very_important_book |
| ... | ... | ... the middle names are optional and I already gave up on |
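To make the target scheme concrete, here is a minimal Python sketch of the intended transformation, written outside ZotFile. The function name `target_name` and the `(first_name, surname)` input shape are hypothetical, chosen only for illustration; middle names are ignored and no length cutoff is applied.

```python
import re

def target_name(authors, title):
    """Build surname_initial pairs followed by the slugged title.

    `authors` is a list of (first_name, surname) tuples; this input shape
    is an assumption made for the example, not something ZotFile provides.
    """
    parts = []
    for first, surname in authors:
        parts.append(surname.lower())
        if first:  # the first-name initial is optional
            parts.append(first[0].lower())
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return "_".join(parts + [slug])

print(target_name([("Adam", "Adams"), ("Bob", "Bobsen")], "Our-Very-Important Book"))
# adams_a_bobsen_b_our_very_important_book
```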

I tried to get information from the website, but the description of how users can define wildcards did not cover all the necessary information. Googling and a code search were unsuccessful. ChatGPT helped a little but did not get the problem. According to regex101.com, my regex should do the job (compare below), but I am too inexperienced to rule out a simple error on my part. In any case, Zotero / ZotFile does not accept it: it either just gives me the full authorLastG that I used or does not capture all the necessary info.

- Zotero-Version: 6.0.33 (homebrew)
- ZotFile-Version: 5.1.2
- OS: macOS

Ideal fix: Please provide other means to make such formats easily achievable.


My test case for the regex capturing: I am aware that the test case is not sufficient, but it should still represent a relevant MWE. If it works, a sequence of regex and append operations, and finally replace and toLowerCase, would be enough (hence it must work in sequence). In the end, the information from `authorLastG` (e.g. `_surname,_firstname_surname2,_firstname2_m.`) should be extracted, and the format would look something like this: `{%1_}{%t}`

My regex:

```re
([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?
```

ChatGPT's try:

```re
([A-Za-z]+)\s*,\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?(?:\s*,\s*([A-Za-z]+)\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?)?
```

Test:

```json
{
  "2": {
    "field": "authorLastG",
    "operations": [
      {
        "function": "exec",
        "regex": "([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?(?:([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?)?",
        "group": 2
      },
      {
        "function": "toLowerCase"
      }
    ]
  }
}
```
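For anyone who wants to reproduce the regex101 result locally, the following sketch runs my regex on its own, outside ZotFile. It uses Python's `re` module rather than the JavaScript engine ZotFile runs on (an assumption: for this particular pattern the two should behave the same), and the helper name `capture_groups` is hypothetical.

```python
import re

# The pattern from the issue, copied unchanged (split over two lines for readability).
PATTERN = re.compile(
    r"([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?"
    r"(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?"
)

def capture_groups(author_last_g):
    """Return the four capture groups of the first match, or None if nothing matches."""
    match = PATTERN.search(author_last_g)
    return match.groups() if match else None

print(capture_groups("sutton,_richard_s._barto,_andrew_g._"))
# ('sutton', 'r', 'barto', 'a')
print(capture_groups("li,_yuxi_"))
# ('li', 'y', None, None)
```

In a standalone engine, group 2 (the one selected by `"group": 2`) is the first author's initial. This only shows what the pattern itself captures; it says nothing about how ZotFile evaluates the `exec` operation.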
Some authorLastG test-cases for regex101:

```txt
brunton,_steven_l._kutz,_j._nathan_
bishop,_christopher_m._
sutton,_richard_s._barto,_andrew_g._
cai,_qingpeng_et_al_
konda,_vijay_tsitsiklis,_john_
konda,_vijay_tsitsiklis,_john
li,_yuxi_
lillicrap,_timothy_p._et_al_
mcauthentic,__
tsitsiklis,_john_n_van_roy,_benjamin_
sutton,_richard_s._et_al
sutton,_richard_s._barto,_andrew_g.
mitchell,_thomas
szepesvari,_c.
```
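As a cross-check for these test cases, here is a rough sketch of a different technique from the single regex above: a short per-author pattern applied repeatedly with `findall`, in Python and outside ZotFile. Whether ZotFile's wildcard operations can express repeated matching is not something this sketch assumes. The name `surname_initials` is hypothetical, and the pattern assumes lowercase input like the cases listed.

```python
import re

# One surname followed by ",_" and an optional first-name initial.
AUTHOR = re.compile(r"([a-z]+),_([a-z])?")

def surname_initials(author_last_g):
    """Join every surname/initial pair found in the string with underscores."""
    parts = []
    for surname, initial in AUTHOR.findall(author_last_g):
        parts.append(surname)
        if initial:  # empty string when no first name follows the comma
            parts.append(initial)
    return "_".join(parts)

for case in ["brunton,_steven_l._kutz,_j._nathan_", "cai,_qingpeng_et_al_", "mcauthentic,__"]:
    print(case, "->", surname_initials(case))
# brunton,_steven_l._kutz,_j._nathan_ -> brunton_s_kutz_j
# cai,_qingpeng_et_al_ -> cai_q
# mcauthentic,__ -> mcauthentic
```

Multi-word surnames are a known gap: `tsitsiklis,_john_n_van_roy,_benjamin_` yields `tsitsiklis_j_roy_b`, because only the letters directly before the comma are treated as the surname.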