Zotero plugin to manage your attachments: automatically rename, move, and attach PDFs (or other files) to Zotero items, sync PDFs from your Zotero library to your (mobile) PDF reader (e.g. an iPad, Android tablet, etc.), and extract PDF annotations.
Regex capturing not working as one would expect (Failed to compose renaming scheme) #668
I wanted to rename files according to the following scheme:

| "original data" | "more data" | > | rename |
|---|---|---|---|
| first middle surname of authors | title | > | surname_f(m)_title |
| Author McAuthentic | How to waste time with regex | > | mcauthentic_a_how_to_waste... (depends on cutoff) |
| Adam Adams; Bob Bobsen | Our-Very-Important Book | > | adams_a_bobsen_b_our_very_important_book |
| ... | ... | | ... |
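To pin down the target format independently of ZotFile, here is a minimal Python sketch of the scheme in the table. The function name `rename_stem`, the `max_len` cutoff parameter, and the choice to append middle initials directly after the first initial are assumptions made for illustration; none of them come from ZotFile.

```python
import re
from typing import Optional


def rename_stem(authors: str, title: str, max_len: Optional[int] = None) -> str:
    """Build 'surname_f(m)_title' from 'First Middle Surname; ...' and a title."""
    parts = []
    for author in authors.split(";"):
        names = author.split()
        if not names:
            continue
        # Assumption: the last whitespace-separated token is the surname.
        parts.append(names[-1])
        # Assumption: first and middle initials are concatenated, e.g. "rs".
        initials = "".join(name[0] for name in names[:-1])
        if initials:
            parts.append(initials)
    # Assumption: any run of non-alphanumeric characters in the title becomes "_".
    parts.append(re.sub(r"[^0-9A-Za-z]+", "_", title).strip("_"))
    stem = "_".join(parts).lower()
    return stem if max_len is None else stem[:max_len]


print(rename_stem("Adam Adams; Bob Bobsen", "Our-Very-Important Book"))
print(rename_stem("Author McAuthentic", "How to waste time with regex", max_len=26))
```

With these assumptions the two calls print `adams_a_bobsen_b_our_very_important_book` and `mcauthentic_a_how_to_waste`, matching the rows above.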
the middle names are optional and I already gave up on
I tried to get information from the website, but the description of how users can define wildcards did not cover everything I needed. Googling and a code search were unsuccessful. ChatGPT helped a little but did not solve the problem. According to regex101.com my regex should do the job (see below), but I am too inexperienced to rule out a simple error on my part. In any case, Zotero / ZotFile does not accept it: it either gives me the full authorLastG that I used or doesn't capture all the necessary info.
Ideal fix: Please provide other means to make such formats easily achievable.
My test-case for the regex capturing:
I am aware that the test case is not sufficient, but it should still serve as a relevant MWE. If it works, a sequence of regex captures and appends, followed by a replace and toLowerCase, would be enough (so the operations must work in sequence). In the end, the information from `authorLastG` (e.g. `_surname,_firstname_surname2,_firstname2_m.`) should be extracted, and the format would look something like this: `{%1_}{%t}`
My regex:
```re
([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?
```
ChatGPT's attempt:
```re
([A-Za-z]+)\s*,\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?(?:\s*,\s*([A-Za-z]+)\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?)?
```
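The two patterns can be compared outside ZotFile. The sketch below uses Python's `re` module as a stand-in for the JavaScript regex engine (an assumption: both engines should treat these particular patterns the same way, but this says nothing about how ZotFile itself applies them). The sample string is one of the underscore-delimited `authorLastG` values from this issue.

```python
import re

# Both patterns are copied verbatim from the issue.
USER_PATTERN = re.compile(
    r"([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?"
    r"(?:([a-zA-Z]+)\,_([a-zA-Z])?(?:[a-zA-Z_\.]+?(?=[a-zA-Z]+\,|$))?)?"
)
CHATGPT_PATTERN = re.compile(
    r"([A-Za-z]+)\s*,\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?"
    r"(?:\s*,\s*([A-Za-z]+)\s*([A-Za-z])?(?:[A-Za-z_\.]+?(?=[A-Za-z]+\s*,|$))?)?"
)

sample = "brunton,_steven_l._kutz,_j._nathan_"

user_groups = USER_PATTERN.search(sample).groups()
chatgpt_groups = CHATGPT_PATTERN.search(sample).groups()

print("user:   ", user_groups)
print("chatgpt:", chatgpt_groups)
```

On this sample the first pattern yields `('brunton', 's', 'kutz', 'j')`, while the second yields `('brunton', None, None, None)`: `\s*` does not match the underscore separators, so the initial and the second author are never captured.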
Test:
```json
{
  "2": {
    "field": "authorLastG",
    "operations": [
      {
        "function": "exec",
        "regex": "([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?(?:([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?)?",
        "group": 2
      },
      {
        "function": "toLowerCase"
      }
    ]
  }
}
```
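To check that the JSON escaping (`\\,` and `\\.`) decodes to the intended pattern and that the two operations chain as expected, the config can be emulated in a few lines of Python. This is a hypothetical emulation, not ZotFile's code: the helper `apply_operations` and its behavior on a failed match (returning an empty string) are assumptions, and ZotFile may handle that case differently.

```python
import json
import re

# The wildcard definition from the issue, kept as raw text so that the
# doubled backslashes reach the JSON parser unchanged.
CONFIG = r'''
{
  "2": {
    "field": "authorLastG",
    "operations": [
      {
        "function": "exec",
        "regex": "([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?(?:([a-zA-Z]+)\\,_([a-zA-Z])?(?:[a-zA-Z_\\.]+?(?=[a-zA-Z]+\\,|$))?)?",
        "group": 2
      },
      {"function": "toLowerCase"}
    ]
  }
}
'''


def apply_operations(value: str, operations: list) -> str:
    """Hypothetical emulation of applying the operations in sequence."""
    for op in operations:
        if op["function"] == "exec":
            match = re.search(op["regex"], value)
            # Assumption: an unmatched pattern or group yields "".
            value = (match.group(op["group"]) or "") if match else ""
        elif op["function"] == "toLowerCase":
            value = value.lower()
        else:
            raise ValueError(f"unsupported function: {op['function']}")
    return value


wildcard = json.loads(CONFIG)["2"]
result = apply_operations("Brunton,_Steven_L._Kutz,_J._Nathan_", wildcard["operations"])
print(repr(result))
```

Under these assumptions the chain returns `'s'`, the lowercased first initial of the first author, so the escaping in the JSON is consistent with the raw pattern.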
Some authorLastG test-cases for regex101:
```txt
brunton,_steven_l._kutz,_j._nathan_
bishop,_christopher_m._
sutton,_richard_s._barto,_andrew_g._
cai,_qingpeng_et_al_
konda,_vijay_tsitsiklis,_john_
konda,_vijay_tsitsiklis,_john
li,_yuxi_
lillicrap,_timothy_p._et_al_
mcauthentic,__
tsitsiklis,_john_n_van_roy,_benjamin_
sutton,_richard_s._et_al
sutton,_richard_s._barto,_andrew_g.
mitchell,_thomas
szepesvari,_c.
```
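Since the number of authors varies, one alternative to a single pattern with a fixed number of groups is to match one `surname,_initial` pair at a time and repeat. The sketch below does this in Python over some of the test cases above. It runs outside ZotFile, and the names `AUTHOR` and `author_prefix` are made up for illustration; whether ZotFile's wildcard operations can express such a repeated match is not something this sketch shows.

```python
import re

# One author: surname, then ",_", then an optional first initial.
AUTHOR = re.compile(r"([a-zA-Z]+),_([a-zA-Z])?")


def author_prefix(author_last_g: str) -> str:
    """Join every 'surname[_initial]' pair found in an authorLastG string."""
    parts = []
    # findall returns "" for an optional group that did not participate.
    for surname, initial in AUTHOR.findall(author_last_g):
        parts.append(surname)
        if initial:
            parts.append(initial)
    return "_".join(parts).lower()


for case in [
    "brunton,_steven_l._kutz,_j._nathan_",
    "cai,_qingpeng_et_al_",
    "mcauthentic,__",
    "tsitsiklis,_john_n_van_roy,_benjamin_",
]:
    print(case, "->", author_prefix(case))
```

This gives `brunton_s_kutz_j`, `cai_q`, `mcauthentic`, and `tsitsiklis_j_roy_b`. Note the limits: middle initials and `et_al` are dropped, and a multi-word surname such as `van_roy` is reduced to `roy`.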
Zotero-Version: 6.0.33 (homebrew)
ZotFile-Version: 5.1.2
OS: macOS