neuropoly / bibeasy

Set of tools to manage academic bibliography
Apache License 2.0

Subcommands #6

Open kousu opened 2 years ago

kousu commented 2 years ago

By my count, bibeasy has four operations it can do. But they're all sort of tangled up and not well separated in

https://github.com/jcohenadad/bibeasy/blob/637635b5e14ccc30bf71a9af491a1e8484393864/bibeasy/scripts/bibeasy_cli.py#L126-L153

The four things are:

  1. Search-and-replace all citations, mapping them from one space to another, where

    • space 1 is one list of CCV pseudo-IDs and space 2 is a newer list of CCV pseudo-IDs (-xd) (deprecated, but still necessary when reusing grant text that contains CCV pseudo-IDs and the source text for that grant, the one using the master gsheet IDs, is missing)
    • space 1 is the contents of Julien's master gsheet and space 2 is a list of CCV pseudo-IDs
    • space 1 is a list of CCV pseudo-IDs and space 2 is Julien's master gsheet (--to-gsheet) (deprecated for the same reason)
  2. Display the list of citations, including the mapping into CCV space, and also highlight inconsistencies between the two

    • this happens in the latter two search-and-replace cases, but not in the first one. The mapping it computes is done with a different subroutine(!), so it's not guaranteed to be the same. It can also happen by itself (if you pass -x but not -i)
  3. Display the list of citations, without the CCV names or showing inconsistencies

  4. Export the citation database as dokuwiki or .docx

Also, neuropoly/bibeasy#10 adds:

  1. Synchronize the contents of the gsheet and CCV database

I think these should become subcommands, e.g. you could make these five:

bibeasy replace -i input.txt -X CCV.xml 
bibeasy match -x CCV.xml
bibeasy list [-t {article,proceedings,...}]
bibeasy export -o something.{docx,txt} [-t {article,proceedings,...}]
bibeasy sync -X CCV.xml

I also think the middle three are all basically the same and could be combined: they write out the list from the gsheet somewhere. We could merge all three to:

bibeasy list [-x CCV.xml] [-t {article,proceedings,...}] [-o file.{docx,md,txt}]

which would:

(what I'm calling) "match" currently prints the matches and errors it finds in those matches:

GSHEET J5   CCV J148    Investigations on spinal cord fMRI of cats under ketamine
  Mismatched fields: Authors, Journal/Conference

bibeasy list -x can handle printing the matches, but also we could think about dropping that entirely, since an existing feature of (what I'm calling) replace is that you can pass a list of citations instead of a list of files and see what they mapped to:

$ bibeasy -x CCV-98720.xml  -i '[J34,J67,C124]' 
[...]
J34->J108: Effect of respiration on the B0 field in the human spinal cord at 3T
J67->J77: Modeling white matter microstructure
C124->C77: Robust and automatic spinal cord detection on multiple MRI contrasts using machine learning
[J108, J77, C77]

As for seeing the errors, neuropoly/bibeasy-old#13 supersedes that feature. If you do want to see the changes it makes, you can either diff the XML files, or I can make sure neuropoly/bibeasy-old#13's sync --verbose prints the changes it makes as it makes them.

And come to think of it, neuropoly/bibeasy-old#13 is mostly just another kind of bibeasy list, where its export format is XML instead of .docx. If ccv-cvc.ca can handle taking partial CVs, containing only the list of publications, and I think it can, we could split it up into:

bibeasy replace -i input.txt -X CCV.xml
bibeasy list [-t {article,proceedings,book,...}] [-o file.{txt,md,docx,xml}]

I would also drop the feature that you can pass citations in place of input like -i '[J34,J67,C124]', and just document somewhere that, to debug the mapping it makes, you can do

echo '[J34,J67,C124]' | bibeasy replace -X CCV.xml

which makes replace behave like a standard unix filter.
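To illustrate the filter idea, here is a minimal sketch in Python. It is not bibeasy's actual code: the mapping is hard-coded from the example output above, and `replace_citations`, `EXAMPLE_MAPPING`, and the assumed ID shape (one uppercase letter followed by digits) are hypothetical names and assumptions made for illustration.

```python
import re
import sys

# Hypothetical mapping from gsheet IDs to CCV pseudo-IDs, copied from the
# example output above; the real tool would derive it from the gsheet and CCV.xml.
EXAMPLE_MAPPING = {"J34": "J108", "J67": "J77", "C124": "C77"}

# Assumption: a citation ID is one uppercase letter followed by digits.
CITATION_ID = re.compile(r"\b[A-Z]\d+\b")


def replace_citations(text, mapping):
    """Replace every known citation ID in text; unknown IDs are left untouched."""
    return CITATION_ID.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Behave like a unix filter: read lines from stdin, write mapped lines to stdout."""
    for line in stdin:
        stdout.write(replace_citations(line, EXAMPLE_MAPPING))
```

Doing the substitution in a single regex pass matters: replacing IDs one after another could remap an ID that was just produced by an earlier replacement (here J67 becomes J77, which must not then be treated as a gsheet ID). A real entry point would call `main()`.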

kousu commented 2 years ago

I've used (and like) click for doing subcommands before, but https://mike.depalatis.net/blog/simplifying-argparse.html has a way to do something almost click-like but without an extra dependency!

jcohenadad commented 2 years ago

space 1 is one list of CCV pseudo-IDs and space 2 is a newer list of CCV pseudo-IDs (-xd) (deprecated; but still necessary when reusing grant text that has CCV pseudo-IDs; when you're missing the source text for that grant that uses the master gsheet IDs)

we can get rid of that scenario if it simplifies things

space 1 is a list of CCV pseudo-IDs and space 2 is Julien's master gsheet (--to-gsheet) (deprecated for the same reason)

indeed, we can keep it deprecated

Display the list of citations, including the mapping into CCV space, and also highlight inconsistencies between the two

As we discussed, this could be replaced by "Option 1: Search-and-replace all citations"? As you rightly said, doing a visual diff between the input and output XMLs will highlight inconsistencies.

I also think the middle three are all basically the same and could be combined: they write out the list from the gsheet somewhere. We could merge all three to:

I agree

bibeasy replace -i input.txt -X CCV.xml
bibeasy match -x CCV.xml

i find it confusing to have "x" and "X". I would keep it all small letter-- it is easier to type (and my shift key is a bit broken 😅 )

I would also drop the feature that you can pass citations in place of input like -i '[J34,J67,C124]', and just document somewhere that, to debug the mapping it makes, you can do

If it makes things easier on the codebase side, then I agree.

kousu commented 2 years ago

bibeasy replace -i input.txt -X CCV.xml
bibeasy match -x CCV.xml

i find it confusing to have "x" and "X". I would keep it all small letter-- it is easier to type (and my shift key is a bit broken 😅 )

That was a typo! I'll use -x.

Okay this all sounds good, I'll make it happen.
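For reference, a rough sketch of what the agreed two-subcommand layout could look like with plain argparse subparsers (no extra dependency). This is only an illustration, not the eventual implementation: the long option names (`--input`, `--xml`, `--type`, `--output`) and the `build_parser` helper are made up here, and only the short flags come from the discussion above.

```python
import argparse


def build_parser():
    """Build a parser with the two subcommands discussed: replace and list."""
    parser = argparse.ArgumentParser(prog="bibeasy")
    sub = parser.add_subparsers(dest="command", required=True)

    # bibeasy replace -i input.txt -x CCV.xml
    replace = sub.add_parser("replace", help="map citation IDs in a text into CCV space")
    replace.add_argument("-i", "--input", help="input text file (assumed default: stdin)")
    replace.add_argument("-x", "--xml", required=True, help="CCV XML file")

    # bibeasy list [-t TYPE] [-o FILE]
    lst = sub.add_parser("list", help="write out the list of citations")
    lst.add_argument("-t", "--type", help="restrict to one publication type, e.g. article")
    lst.add_argument("-o", "--output", help="output file; format inferred from its extension")
    return parser
```

With this, `build_parser().parse_args(["replace", "-i", "input.txt", "-x", "CCV.xml"])` yields a namespace whose `command` is `"replace"`, which the CLI can then dispatch on.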