Request: Import of Bibliographic Data in Folders

jangtze commented 7 months ago

Hello,

thank you for your work and this plugin! I would not have switched to Zotero without it.

My situation

I had a directory with all my literature (pdf, djvu, ...) and the bibliographic info (bibtex) alongside it in the respective subfolders. I wanted to keep the general structure and found your plugin, but couldn't figure out how to get the bib-info in as well.

/literature_folder
- /fiction
  - /poetry ...
  - /drama ...
- /science
  - /biology
    - /author_book1 ~~[A]~~
      - /book1_name.pdf
      - /book1_bib.bibtex
  - /chemistry
    - /author_some_book ~~[B - maybe problematic]~~
      - /book.djvu
      - /book_v2.pdf
      - /some_info.bibtex
    - /author_book2 ~~[C - certainly not easy]~~
      - /book1.pdf
      - /book1.bibtex
      - /book2.pdf
      - /book2_several_editions.bibtex
  - /physics ...

Workaround Solution

For now I used a workaround, following this advice (or rather this one): merge all the separate bib-files into one and import that. With zotero-folder-import I got the linked files and the folder structure. After that I manually sorted the data to match the created items.

Versions

macOS 12.6
Zotero 6.0.37
zotero-folder-import 0.0.7 (newest available)

Proposed Improvement(s)

Allow the import of bibliographic data as items, not as linked attachments
[like the default ris/bib/... import, looped through the folders, with the items placed in the right subcollections]
~~maybe help matching:~~
- ~~with an option to automatically assign pdf/djvu/... - files in a subfolder to the generated item~~
- ~~as this could be very complicated (see [C]), a first step could be: only match if there is just one item per subfolder (as in [A])~~

I hope the issue is comprehensible and you consider it. If anything is unclear feel free to ask.

I understand this might be a lot of work. Initially I wanted to implement it myself, but I am inexperienced with JavaScript and with the Zotero project and plugin writing. If I can, I would gladly help.

edit: removed the matching part and added clarification to main proposition

retorquere commented 7 months ago

This sounds like the kind of thing that would get out of hand fast. I'd be happy to take a zip file of that and write a script to combine the bib files in a way that they would recreate the folder structure and attach the files on import.
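Part of such a script would be working out which collection each bib file belongs in. A rough sketch of that step, assuming POSIX-style paths; `collectionPathFor` is a made-up helper name, not something from the plugin or from Zotero:

```typescript
// Hypothetical helper: turn the location of a bib file into the chain of
// collection names it should end up in, relative to the import root.
function collectionPathFor(root: string, file: string): string[] {
  const normalize = (p: string) => p.split('/').filter(part => part.length > 0)
  const rootParts = normalize(root)
  const fileParts = normalize(file)

  // The file must live under the root, otherwise there is nothing to derive.
  const underRoot = rootParts.every((part, i) => fileParts[i] === part)
  if (!underRoot || fileParts.length <= rootParts.length) {
    throw new Error(`${file} is not inside ${root}`)
  }

  // Drop the root prefix and the file name itself; what remains are folders.
  return fileParts.slice(rootParts.length, -1)
}
```

For `/literature_folder/science/biology/author_book1/book1_bib.bibtex` under `/literature_folder` this gives `science`, `biology`, `author_book1`. Whether the innermost folder should become a collection of its own is a design choice the sketch does not settle.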

jangtze commented 7 months ago

Thank you for the reply! Do you mean the matching would get out of hand? Yes, I feared so too, therefore I just wanted to add it as a remark.

However I thought the creation of items would basically happen as with the default import, just looped over the folder structure? Admittedly though I don't know how accessible that is through the plugin API.

Regarding the zip file: as I have done the workaround now, I migrated the whole library and got rid of the old one in the process (ZotFile - renaming and moving). There surely is a backup somewhere, but I guess the whole lib (~5 GB) is too big. Extracting all the bib-files in their folder structure (e.g. via some shell script) doesn't seem very helpful to me. Just some fake test case should do as well. Maybe I misunderstood.

retorquere commented 7 months ago

I was offering to prepare your files for import to get the layout you want -- if you have already imported them, you don't need it.

It's not difficult to import the bib files from a plugin, but recognizing which bib file produced which item(s) is hard. It's going to have a lot of edge cases, and I'd be chasing bugs forever, which is what is going to get out of hand.

retorquere commented 7 months ago

If you're set with your current library I'd like to close this issue. Parsing and attaching is not something I'm going to add.

jangtze commented 7 months ago

I can see the matching is too difficult. But I think it would help a lot (and be of great use to others in similar situations) if the import of bibliographic data created the items and put them in the folder structure, as happens with other imported files. Does that also need parsing? If this part is easy to implement and does not pose any issue, I wouldn't close. However, it is your call in the end.

I will strike out the other part, as that is pretty straightforward to do manually. If the item-folder structure is preserved and the files and items are already in the same place, it is a lot easier than what I did.

retorquere commented 7 months ago

But are you willing to test this if you've already completed your import? I don't build things that don't have an active user willing to test.

jangtze commented 7 months ago

Yes of course.

I'd be willing to implement parts of it. However, as I've said, since I would need help with most of the structure, it would be more work than implementing it directly. I guess the real "code" is just a few for loops and function calls.

retorquere commented 7 months ago

The UI is going to be the complicated bit. Zotero doesn't have a list of extensions it knows how to import; if you offer it a file, it will try to import it and detects the format based on its content. We'd have to either offer all files for detection regardless of content/extension (which is going to be exceedingly slow, and I don't know what Zotero does if you offer it, say, a djvu file), or allow the user to mark some extensions for import rather than as attachments.

I really don't like doing UI work in general. The UI framework for Zotero 6 is largely undocumented (old tech Firefox abandoned a long time ago) and should be regarded as incompatible with Zotero 7. If you look into the UI I'd be happy to wire the rest up.

jangtze commented 7 months ago

I would argue that importing only bibliographic information first is fine. So I would add just one more radio button on file select. This would be quick and dirty (making possible matching more difficult at a later point). But for now it could be fine.

in https://github.com/retorquere/zotero-folder-import/blob/e0789a9aa98307bae3f54d8d9821c36f65454f11/content/bulkimport.xul#L37-L40 add

<radiogroup id="folder-import-link-or-import">
[...]
<radio id="folder-import-bib" label="Import as Item" value="store"/>
</radiogroup>

in https://github.com/retorquere/zotero-folder-import/blob/e0789a9aa98307bae3f54d8d9821c36f65454f11/content/bulkimport.ts#L23-L45 maybe a filter for all "import-able" types is necessary (ris, bibtex, rdf, ...). I have tried to find a list of types that surely work, but no result yet.

retorquere commented 7 months ago

Zotero doesn't have such a list, because Zotero doesn't have a concept of "filetypes as derived from the name of the file after the period". If you offer it a file called somename.xlsx with content

@article{Brosseau2010,
  title = {Carrier Recombination Dynamics in {{In\textsubscript{x}Ga}}{{{\textsubscript{1-x}}}}{{N}}/{{GaN}} Multiple Quantum Wells},
  author = {Brosseau, Colin-N. and Perrin, Mathieu and Silva, Carlos and Leonelli, Richard},
  year = {2010},
  volume = {82},
  pages = {085305},
  doi = {10.1103/PhysRevB.82.085305},
  journal = {Phys. Rev. B},
  number = {8}
}

it will detect that it is bibtex and will import it as such. I don't think that hard-coding a list of file extensions would be the right approach either.
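To illustrate what content-based detection means, here is a deliberately naive sniffing function. It is only a sketch: Zotero's real detection runs its import translators, and the two patterns below are rough heuristics of my own, not anything taken from Zotero. `guessFormat` is a hypothetical name.

```typescript
type GuessedFormat = 'bibtex' | 'ris' | 'unknown'

// Naive illustration only: look at the text, never at the file name.
function guessFormat(content: string): GuessedFormat {
  // BibTeX entries start with @type{citekey, ...
  if (/^\s*@[A-Za-z]+\s*\{/m.test(content)) return 'bibtex'
  // RIS records start with a "TY  - " tag line.
  if (/^TY {2}- /m.test(content)) return 'ris'
  return 'unknown'
}
```

The file name never enters the function, which is why the somename.xlsx example above would still come out as bibtex.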

That aside, adding a radio button seems like the wrong approach -- regardless of wanting to import bib or ris or whatever as items, you then still have a choice whether you want to link or import the other files as attachments. Adding a checkbox instead would fix that, but doesn't address the hardcoding issue, or the problem that this UI stuff may not work under Zotero 7.

jangtze commented 7 months ago

I meant to separate the import of bibliographic data and allow only one or the other. The filter wouldn't be ideal, true. In general it wouldn't be a particularly 'nice' solution, but I thought an easy one to do. A checkbox "convert bib to item" would definitely be neater.

However, it just came to my mind, that import from clipboard is also possible. So there must be some function taking plain text. Is it possible to try to open the files manually and then forward plain text into that? ... on second thought that doesn't solve the problem with actual djvu files or any binary for that matter. In the end I can't think of a non-hardcoded way. Is there any chance asking in the forum helps?

Concerning the UI problem with Zotero 7, I have no clue what is going to happen there. Wouldn't that affect legacy plugins in any case?

retorquere commented 7 months ago

> I meant to separate the import of bibliographic data and allow only one or the other. The filter wouldn't be ideal, true. In general it wouldn't be a particularly 'nice' solution, but I thought an easy one to do. A checkbox "convert bib to item" would definitely be neater.

But that would exclude items available in RIS, MODS, etc.

> However, it just came to my mind, that import from clipboard is also possible. So there must be some function taking plain text. Is it possible to try to open the files manually and then forward plain text into that?

Not necessary. You can just point Zotero to a file. I use that extensively in the testsuite of another plugin. But a decision would have to be made which files you point Zotero to here.

> ... on second thought that doesn't solve the problem with actual djvu files or any binary for that matter.

Those would just be attached, if we can determine that they are binaries.
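One common way to guess whether a file is binary is to look for NUL bytes in the first few kilobytes. The sketch below shows that heuristic; it is an assumption about how this could be done, not something the plugin or Zotero does today, and it can misfire (UTF-16 text, for example, contains NUL bytes). `looksBinary` and its `sampleSize` parameter are made-up names.

```typescript
// Heuristic: plain-text bibliographic files should not contain NUL bytes,
// while binary formats typically do somewhere near the start.
function looksBinary(bytes: Uint8Array, sampleSize = 8192): boolean {
  const limit = Math.min(bytes.length, sampleSize)
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true
  }
  return false
}
```

Only the first `sampleSize` bytes are inspected, so the check stays cheap even for large files.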

> In the end I can't think of a non-hardcoded way.

Leave the choice to the user, for which a UI would be needed.

> Is there any chance asking in the forum helps?

You could try, but I've had limited success.

> Concerning the UI problem with Zotero 7, I have no clue what is going to happen there.

https://www.zotero.org/support/dev/zotero_7_for_developers

> Wouldn't that affect legacy plugins in any case?

Yes, but it seems unwise to invest time into a change that isn't going to be compatible with 7. I'll see whether I can get the basics compatible, I've done that before. It'd be a great help if you could then look into the UI stuff. I really don't like UI work.

jangtze commented 7 months ago

> But that would exclude items available in RIS, MODS, etc.

I meant any kind of bibliographic format, not just bibtex. Also, when trying to import a script file I got Zotero to complain and forward me to this list. That actually happens in File -> Import with e.g. a pdf too. So if you can call that and just output any errors to a log, I guess that is fine.

So if possible, I suppose the checkbox would say

[ ] "try to create items from bibliographic formats"

then try that import and otherwise attach. I have to dive deeper into how that happens still.

> Leave the choice to the user, for which a UI would be needed.

I don't get the UI problem; the current one already asks for file types. If we work with that and just add the checkbox, it would work, no? I mean of course some edge cases like xlsx might fail, but let's just disregard those. Otherwise I am genuinely puzzled where you see the need for another UI element. From the limited research I've done, however, I can understand the hesitation to work with it.

> You could try, but I've had limited success.

Haha, I guess I understand.

> https://www.zotero.org/support/dev/zotero_7_for_developers

Will look into it.

> Yes, but it seems unwise to invest time into a change that isn't going to be compatible with 7.

Absolutely agree! That is why I am trying to figure out a quick solution with, say, that one checkbox, in order to keep it minimal.

retorquere commented 7 months ago

> I meant any kind of bibliographic format, not just bibtex. Also, when trying to import a script file I got Zotero to complain and forward me to this list. That actually happens in File -> Import with e.g. a pdf too. So if you can call that and just output any errors to a log, I guess that is fine.

I already know how to do this as I do it in my testing framework for better bibtex. I just worry that offering mostly pdfs etc to that api call would be terribly slow while it tries every translator on every (relatively big) file.

> then try that import and otherwise attach. I have to dive deeper into how that happens still.

Which has this potential performance problem.

> I don't get the UI problem; the current one already asks for file types. If we work with that and just add the checkbox, it would work, no?

If you plan to try to parse all attachments, which I don't think is a good idea.

> I mean of course some edge cases like xlsx might fail, but let's just disregard those

Which means you propose hardcoding extensions, which I don't think is a good idea.

> Absolutely agree! That is why I am trying to figure out a quick solution with, say, that one checkbox, in order to keep it minimal.

But the existing UI likely doesn't work on 7.

jangtze commented 7 months ago

> I just worry that offering mostly pdfs etc to that api call would be terribly slow [...] Which has this potential performance problem.

I see. Ok then last try at 'quick and easy':

How much work would it be to simply duplicate the current file-select interface below the existing one, so that the user can select the attachments and, after ticking the checkbox, the bibliographic files? Then parsing only happens for the second set, whilst the first is attached.

> Which means you propose hardcoding extensions, which I don't think is a good idea.

Well but Zotero doesn't take in those other files for me anyways? Is that different for you?

> But the existing UI likely doesn't work on 7.

I will look into 7 and see if it could be made compatible.

retorquere commented 7 months ago

> How much work would it be to simply duplicate the current file-select interface below the existing one, so that the user can select the attachments and, after ticking the checkbox, the bibliographic files? Then parsing only happens for the second set, whilst the first is attached.

That's possible. Preferable would be some kind of tri-state select for each extension.
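As a sketch of what such a tri-state choice could drive once the UI exists: the user picks ignore, attach or import per extension found in the folder, and the file list is split accordingly. All names here (`Choice`, `Plan`, `extensionOf`, `planImport`) are hypothetical and not part of the plugin.

```typescript
// What the user can pick for each extension found in the folder.
type Choice = 'ignore' | 'attach' | 'import'

interface Plan { attachments: string[]; items: string[] }

// Lower-cased extension without the dot, or '' when there is none.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf('/') + 1)
  const dot = name.lastIndexOf('.')
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : ''
}

// Split the scanned files according to the user's per-extension choices.
// Extensions the user never decided on are treated as 'ignore'.
function planImport(paths: string[], choices: Record<string, Choice>): Plan {
  const plan: Plan = { attachments: [], items: [] }
  for (const path of paths) {
    const choice = choices[extensionOf(path)] || 'ignore'
    if (choice === 'attach') plan.attachments.push(path)
    else if (choice === 'import') plan.items.push(path)
  }
  return plan
}
```

Because the extensions come from the scanned folder and the decision from the user, nothing is hardcoded, and only the files marked for import would ever be handed to the translators.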

> Well but Zotero doesn't take in those other files for me anyways? Is that different for you?

No, because it determines it cannot find a translator able to import it as items, after trying all 24 of them, on each file. At scale, this would likely have bad performance.
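The scaling concern is simple multiplication. The file counts below are made up purely for illustration, and the real cost also depends on how quickly each translator rejects a file, which this sketch does not model.

```typescript
// Worst case: every translator gets a look at every file offered for import.
function detectionAttempts(filesOffered: number, translators: number): number {
  return filesOffered * translators
}

// Illustrative numbers only: 1000 files, of which 50 are bibliographic.
const everything = detectionAttempts(1000, 24)  // all files offered: 24000
const preselected = detectionAttempts(50, 24)   // only user-selected: 1200
```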

retorquere commented 7 months ago

I think I'll be able to convert it to 7-compatible code over the weekend. Maybe the UI code will work mostly as is.

retorquere commented 7 months ago

I apologize for being so stubborn. I hate UI work even though I recognize it is necessary to do it (mostly) right, and I worry about ending up with a confusing UI that I'm on the hook to fix later.

jangtze commented 7 months ago

Don't worry, I can understand the hesitation. The possibilities don't seem very inviting, and neither does the prospect of having to redo everything. Also, the application seems to be quite niche, as it appears the bulk of users don't care about file management.

That's why I strongly advocate for the simplest possible solution. If only I had more knowledge about it I could do that myself, however, I am only starting with this here and now.
