Update metadata for existing items

dstillman commented 6 years ago

A common request, as in https://forums.zotero.org/discussion/72336/feature-request-update-or-force-recheck-of-metadata

This would probably be a context-menu option, and we'd automatically try to find an appropriate canonical item using various APIs and possibly also a modified version of recognizer-server. (The person in the linked thread is coming from Mendeley, which has both a context-menu option to "Update Details" and an icon next to each identifier, requiring the user to somehow guess which identifier would produce the best results. The identifiers are also often completely wrong, resulting in the item turning into a totally different item.)

Not sure whether we want to show the merge pane or just update the metadata automatically. Certainly we can fill in empty fields, but the user might have made other corrections (including sentence-casing the title) and it would be annoying to overwrite those.

ethanwillis commented 6 years ago

Hey @dstillman, see this discussion on Twitter: https://twitter.com/chrisharms/status/1030493300632051713

I'm looking into creating a plugin that will update metadata from Crossref. It seems pretty straightforward in the naive case.

I'd like us to discuss and enumerate more of the edge cases you're aware of. I also believe a plugin could be easily integrated into the main Zotero codebase. Can you confirm this, or highlight any potential integration issues?

dstillman commented 6 years ago

I don't think there's much point to doing this as a plugin, since it's something we're planning to implement in the very near future, but if you'd like to work on getting this functionality in for real, I'd be happy to work with you on that.

As I said on Twitter, it's trivial to do this poorly — just adding a menu option that overwrote the existing item would take a few minutes. (You don't need to parse Crossref data or anything like that, to be clear.) But there's very little advantage to that over using Add Item by Identifier + Merge Items (or URL click and resave + Merge Items if no identifier), which is why we haven't bothered with it previously.

The hard part here is just figuring out what to do with the new data. We can fill in empty fields, but if there are fields in conflict, it seems like the options are basically 1) show the conflict resolution window or 2) update the metadata automatically but create an undo record to restore the previous data. A CR window might be too much friction for something that would usually be fine, so I guess I'm leaning towards the latter.
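
To make option 2 concrete, here is a minimal Python sketch of "update automatically, but keep an undo record". It is not Zotero code: items are modeled as plain dicts of field name to string, and `apply_update`, `undo_update`, and `UpdateResult` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Sentinel marking "this field did not exist before the update" (illustrative only)
_MISSING = object()


@dataclass
class UpdateResult:
    merged: dict                                      # item after the update
    undo_record: dict = field(default_factory=dict)   # field name -> previous value


def apply_update(existing: dict, fetched: dict) -> UpdateResult:
    """Fill empty fields and overwrite conflicting ones, remembering old values."""
    merged = dict(existing)
    undo = {}
    for name, new_value in fetched.items():
        if not new_value:
            continue  # the fetched record has nothing for this field; keep ours
        old_value = existing.get(name, _MISSING)
        if old_value == new_value:
            continue  # no change, nothing to record
        undo[name] = old_value  # covers both "was empty" and "was in conflict"
        merged[name] = new_value
    return UpdateResult(merged, undo)


def undo_update(result: UpdateResult) -> dict:
    """Restore the item to its state before apply_update()."""
    restored = dict(result.merged)
    for name, old_value in result.undo_record.items():
        if old_value is _MISSING:
            restored.pop(name, None)  # field was added by the update
        else:
            restored[name] = old_value
    return restored
```

With this shape, a user's sentence-cased title would still be overwritten by the update, but `undo_update` brings it back, which is the trade-off against showing a conflict resolution window up front.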

We don't currently have any undo support, but I think we could rig something up quickly for this particular use case that stored the JSON for a set of items and added a temporary Zotero.Notifier handler to clear the undo stack the next time there was an item/collection/search action and then unregister itself. (And if there was a basic framework for this, we could gradually expand that to other operations, which would be great.)
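
The one-shot undo idea described above might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `Notifier` class is a tiny stand-in event bus and does not reproduce the real `Zotero.Notifier` API, and `OneShotUndo` is a hypothetical name.

```python
import json


class Notifier:
    """Hypothetical stand-in event bus; NOT the real Zotero.Notifier API."""

    def __init__(self):
        self._handlers = {}
        self._next_id = 0

    def register(self, callback):
        self._next_id += 1
        self._handlers[self._next_id] = callback
        return self._next_id

    def unregister(self, handler_id):
        self._handlers.pop(handler_id, None)

    def trigger(self, event_type):
        # Iterate over a copy so a handler may unregister itself mid-dispatch
        for callback in list(self._handlers.values()):
            callback(event_type)


class OneShotUndo:
    """Keeps one JSON snapshot until the next item/collection/search event."""

    CLEARING_EVENTS = ("item", "collection", "search")

    def __init__(self, notifier):
        self._notifier = notifier
        self._snapshot = None
        self._handler_id = None

    def record(self, items):
        # Call this AFTER the update has been saved, so that the update's own
        # notifications are not seen by the temporary handler.
        self._clear()
        self._snapshot = json.dumps(items)
        self._handler_id = self._notifier.register(self._on_event)

    def _on_event(self, event_type):
        if event_type in self.CLEARING_EVENTS:
            self._clear()

    def _clear(self):
        self._snapshot = None
        if self._handler_id is not None:
            self._notifier.unregister(self._handler_id)
            self._handler_id = None

    def can_undo(self):
        return self._snapshot is not None

    def undo(self):
        if self._snapshot is None:
            raise LookupError("nothing to undo")
        items = json.loads(self._snapshot)
        self._clear()
        return items
```

Storing the snapshot as serialized JSON rather than as live objects means later edits to the items cannot silently change what an undo would restore.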

If you'd like to work on this, I'm happy to give pointers on the technical details as necessary.

dstillman commented 6 years ago

We're working on this internally. (Thinking about it more, there are some pretty complicated parts here, so it's a bit out of scope for a PR.)

01baftb commented 6 years ago

Firstly, I have to say I am really looking forward to this feature. Automatic metadata retrieval from PDFs may not always return correct information (at least in my experience with Mendeley). I frequently update metadata using the DOIs of my items, and not having an easy way to do that with the click of a button makes it difficult in Zotero.

I agree that a CR window would be too much friction and that an undo record would be a better approach.

As a former Mendeley user, I'd like to suggest one thing that could be improved on:

I think new metadata should fully replace all existing metadata, including clearing an existing field when that field is empty in the new metadata. Why? I've had many cases where automatic metadata retrieval from a PDF was incorrect. When I force a recheck by DOI, the item is updated with the correct metadata, BUT if a field is empty in the correct metadata and the same field holds a wrong value from the incorrect metadata, the wrong value remains on the item.
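
The difference between "fill in only" and the full replacement suggested here can be shown with a short Python sketch. It is illustrative only: items are plain dicts, and `replace_all` and the `editable_fields` parameter are hypothetical names, not part of Zotero or Mendeley.

```python
def replace_all(existing: dict, fetched: dict, editable_fields: frozenset) -> dict:
    """Overwrite every editable field; clear fields the fetched record lacks."""
    updated = dict(existing)
    for name in editable_fields:
        new_value = fetched.get(name, "")
        if new_value:
            updated[name] = new_value
        else:
            # The fetched record has no value here, so a stale or wrong
            # value from the earlier PDF extraction is removed too.
            updated.pop(name, None)
    return updated


# Example: a wrong volume from PDF extraction does not survive the update
existing = {"title": "Wrong title", "volume": "12", "DOI": "10.1000/example"}
fetched = {"title": "Right title", "volume": ""}
updated = replace_all(existing, fetched, frozenset({"title", "volume"}))
```

In this example `updated` keeps the DOI (it is not in `editable_fields`), takes the new title, and drops the stale volume. Pairing this policy with an undo record would matter more than in the fill-in case, since it deletes data.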

ilyapopov commented 5 years ago

Could creating a duplicate item with the new data be good enough? The user then has a choice of deleting the old item (if they are happy with the new data), merging the two using the existing duplicate resolution facility, or deleting the new one (if the new data is garbage). The question is what to do with collections.

ayala-io commented 4 years ago

@dstillman is this feature still on your radar?

Came here from the Zotero forums thread regarding this topic. I'd also like to chime in here to say that this is a highly anticipated feature especially for those who deal with literature dumps.

The suggested DOI import and merge is way too cumbersome. PDF imports and the corresponding DOI import often do not show up as duplicates (in my experience), so you'll have to hunt for the PDF entry and the DOI entry, select them and then merge. Not fun for literature dumps.

I can't speak for everyone else, but when I want to fetch info using the DOI, I pretty much don't care about whatever metadata already exists (of course, I do this only for items that I know have garbage metadata). I want the DOI fetch to override everything. Perhaps a warning (which can then be disabled) should explain this before using DOI fetch.

Paperpile and Mendeley (I think) have an additional column or indicator next to each item that shows whether the metadata for that item is complete. With that, I can filter/select the items that are flagged as incomplete and run a mass DOI fetch on them (assuming that I manually pasted the DOI for those entries).
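
A completeness flag like the one described could be computed along these lines. This is a minimal Python sketch under stated assumptions: the per-type required fields in `REQUIRED_FIELDS` are invented for illustration and are not how Paperpile, Mendeley, or Zotero define completeness, and both function names are hypothetical.

```python
# Assumed required fields per item type (illustrative, not an official list)
REQUIRED_FIELDS = {
    "journalArticle": ("title", "creators", "date", "publicationTitle"),
    "book": ("title", "creators", "date", "publisher"),
}


def missing_fields(item: dict) -> list:
    """Return the required fields that are absent or empty on this item."""
    required = REQUIRED_FIELDS.get(item.get("itemType"), ("title",))
    return [name for name in required if not item.get(name)]


def batch_fetch_candidates(items: list) -> list:
    """Incomplete items that have a DOI and so could go into a mass DOI fetch."""
    return [item for item in items if missing_fields(item) and item.get("DOI")]


library = [
    {"itemType": "journalArticle", "title": "A", "creators": ["X"],
     "date": "2020", "publicationTitle": "J", "DOI": "10.1000/a"},
    {"itemType": "journalArticle", "title": "B", "creators": [],
     "date": "", "DOI": "10.1000/b"},
    {"itemType": "book", "title": "C", "creators": ["Y"], "date": "2019"},
]
candidates = batch_fetch_candidates(library)
```

Here only item "B" ends up in `candidates`: item "A" is complete, and item "C" is incomplete but has no DOI to fetch from.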

Another possibility: in addition to a complete full-entry DOI fetch, you could have a small button (or right-click option) on each field that a user can click to fetch and override the data for that particular field. This would be particularly useful if you have an existing entry that you typed out manually (a common case for papers/proceedings that were recently accepted or are in press) and you want to properly update a few fields using the metadata fetched from the DOI.

mannychiu commented 4 years ago

This can be really helpful when you write your paper and find your database not up-to-date! I hope they are still working on it.

dstillman commented 4 years ago

Yes, we're working on it. https://github.com/zotero/zotero/pull/1582

aarontaycheehsien commented 4 years ago

Add one more vote for this. I've been trying many of the newer tools that produce citation maps, like Citation Gecko and Connected Papers, or that extract references from papers, like the Scholarcy API. For whatever reason, when I export the generated bibliographies, they tend to be missing DOIs or have poor metadata. This would be a great thing to add. Currently I resort to using Mendeley to clean this up.

alspitz commented 3 years ago

Very excited for this feature. A lot of items I have in my library have metadata that was found before the automatic metadata detection was as good as it is now, so it would be good to be able to easily update those items.

raphaelchinchilla commented 2 years ago

> Yes, we're working on it. #1582

It would be nice if there were a feature to automatically accept all the new metadata if the original Library Catalog was arXiv.

AbeJellinek commented 2 years ago

@raphaelchinchilla: You still have to click Apply as with any update, but that's all. Here's an example of a random item I pulled from arXiv that was later published in a journal:

Clicking Apply in that case will automatically replace all the metadata (besides Language) with that of the published article.

If that's not what you mean, can you elaborate a bit on a specific case it should handle? Is there a situation in which a field is disabled by default and shouldn't be?

raphaelchinchilla commented 2 years ago

@AbeJellinek This looks great. Is this available in the latest stable release? If so, I'm not sure how to use it yet (I found this issue while trying to figure out how to do it).

AbeJellinek commented 2 years ago

Not quite yet, but we're working on it! If you want to try it out, you can pull that PR and compile Zotero on your machine; there aren't any stable or beta builds with this feature available yet, and it isn't guaranteed not to blow things up / break items.

AlexanderZeilmann commented 2 years ago

@AbeJellinek will the old arXiv information of an updated article be preserved in the Archive & Loc. in Archive fields? It does not look like it in the screenshot above, but that would be really nice to have.

Otherwise this looks like really great work so far. Thank you!

Grenemal commented 2 years ago

@AbeJellinek The update feature looks great, and I want to try it because I have thousands of references waiting to be corrected. These references were exported from Scopus, and much of the information is incorrect. However, I don't know how to compile Zotero on my own. Are there instructions for it?

marco-coraggio commented 2 years ago

Looking forward to this feature

eggrandio commented 2 years ago

Any updates on when this feature will be available on a beta version?

anovitzkij commented 1 year ago

Looking forward to this feature

GiggleLiu commented 1 year ago

Looking forward to this feature too.

lubaroli commented 1 year ago

+1

alexispaz commented 1 year ago

+1

kuangjidi commented 1 year ago

+1

HumpyBlumpy commented 1 year ago

Is this in the Zotero 7 beta by any chance?

thoguib commented 9 months ago

It doesn't look like it, judging from the .zip beta version for Windows (https://www.zotero.org/support/beta_builds). I'm not sure if this was foreseen, but it would be great to be able to update metadata for multiple items at a time by selecting them first (just like the current "Find available PDFs" command).

zotero / zotero

Update metadata for existing items #1515