zotero / translators

Zotero Translators
http://www.zotero.org/support/dev/translators

American Meteorological Society: No longer uses Atypon #2997

Open dstillman opened 1 year ago

dstillman commented 1 year ago

https://forums.zotero.org/discussion/98261/translator-update-needed-for-american-meteorological-society-ams-journals

https://github.com/zotero/translators/issues/945#issuecomment-1184463243

Can probably just use EM with some tweaks (e.g., stripping "Abstract" from the abstract).

brendan-oconnell commented 1 year ago

AMS has switched publishing platforms from Atypon to KGL PubFactory: https://www.pubfactory.com/our-clients/. However, other non-AMS journals on this platform don't have this specific issue, because they don't put the word "Abstract" into their Abstract, e.g. https://thejns.org/view/journals/j-neurosurg/138/3/article-p587.xml. I'll create a new translator for all PubFactory journals, since one doesn't seem to exist yet, and fix the small Abstract issue that's specific to AMS journals at the same time. EM already works really well, as you mention, so should be pretty easy.

dstillman commented 1 year ago

Do we actually need a PubFactory translator, or should we just have an AMS translator that fixes Abstract? (Or should we just add some code to EM or the translation architecture that strips "Abstract" at the beginning of an abstract?) (Or should we just email them?) We seem to get perfectly good data from that JNS article, and that's really the ideal — having sites offer great embedded metadata, including PDF links, without our having to maintain translators. Ideally we'd be regularly deleting translators, not adding them.
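As a rough illustration of the "strip 'Abstract' at the beginning of an abstract" option, here is a minimal sketch. The helper name `stripAbstractPrefix` is hypothetical and not part of EM or the Zotero API, and the exact shape of the AMS metadata (whether a colon or space follows the label) is an assumption.

```javascript
// Hypothetical helper for illustration; not an existing Zotero/EM function.
// Removes a leading "Abstract" label (optionally followed by ":" or ".")
// only when more text follows it, so a bare "Abstract" is left untouched.
function stripAbstractPrefix(abstract) {
  if (typeof abstract !== "string") {
    return abstract;
  }
  return abstract.replace(/^\s*Abstract\s*[:.]?\s*(?=\S)/, "").trim();
}
```

A caveat with this approach: an abstract that genuinely begins with the word "Abstract" (say, "Abstract interpretation is ...") would lose its first word, which is one argument for having the publisher fix the metadata instead.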

brendan-oconnell commented 1 year ago

I think the reason they're including the word "Abstract" as a heading in their Abstract is so that the heading renders the same way in the Full Text for OA articles, e.g. https://journals.ametsoc.org/view/journals/amsm/59/1/amsmonographs-d-18-0005.1.xml?tab_body=fulltext-display, where the <section class="abstract"> in the full text seems to be identical to the <section class="abstract"> displayed in the Abstract/Excerpt tab. The other example I shared from JNS isn't OA, so there's no Full Text view, and thus no need to include the word "Abstract" in the Abstract. So... I think the word Abstract needs to be there? Which would argue against emailing them to ask them to remove it.

As to why create a PubFactory translator vs. a translator just for AMS that fixes Abstract, and leaving EM to handle the other PubFactory journals - the only reason I could see to create a PubFactory translator is that detectWeb in EM doesn't handle adding multiple items, e.g. https://journals.humankinetics.com/view/journals/pes/pes-overview.xml. What do you think?

dstillman commented 1 year ago

It definitely doesn't need to be there — these are plain-text <meta> fields, so even if the value is coming from the same HTML data source that's populating the visible tabs, they should strip it when populating the property="og:description" and name="description" <meta> fields.

> the only reason I could see to create a PubFactory translator is that detectWeb in EM doesn't handle adding multiple items

We could add a translator that supported just the multiple pages and leave the regular pages for EM.
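A translator limited to multiple-item pages might look roughly like the sketch below, which follows the usual `detectWeb` / `getSearchResults` pattern. The link selector is an assumption based on the URL shapes quoted in this thread, not verified PubFactory markup, and the sketch makes no attempt to tell listing pages apart from article pages.

```javascript
// Assumed selector for illustration: links that look like PubFactory article URLs.
const ARTICLE_LINK_SELECTOR = 'a[href*="/view/journals/"][href$=".xml"]';

// Collects { url: title } pairs for candidate article links.
// With checkOnly set, returns true as soon as one usable link is found.
function getSearchResults(doc, checkOnly) {
  const items = {};
  let found = false;
  for (const row of doc.querySelectorAll(ARTICLE_LINK_SELECTOR)) {
    const href = row.href;
    const title = (row.textContent || "").trim();
    if (!href || !title) continue;
    if (checkOnly) return true;
    found = true;
    items[href] = title;
  }
  return found ? items : false;
}

// Claims only pages with article links; returning false for everything else
// leaves those pages to other translators (such as EM).
function detectWeb(doc, url) {
  return getSearchResults(doc, true) ? "multiple" : false;
}
```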

brendan-oconnell commented 1 year ago

Great, I sent an email to PubFactory to ask if they could change this in their platform, and I'll work separately on a translator that handles only multiple-item pages on all PubFactory journals.

brendan-oconnell commented 1 year ago

It's been a week and I didn't hear back from PubFactory, so I went ahead and let this translator cover both single articles and multiple, and implemented the quick fix for removing the word "Abstract". EM does work well on single articles, so including them in detectWeb() for this translator wouldn't be strictly necessary if PubFactory fixed this on their end.

However, I decided to include single articles in this translator so that multiple doesn't get triggered for articles with "Related Content", e.g. https://avmajournals.avma.org/view/journals/javma/261/4/javma.22.11.0518.xml. I haven't found a single querySelector() that matches only the main article on single pages, nor a querySelectorAll() that avoids accidentally picking up Related Content and thus triggering multiple, since Related Content is handled differently by different journals.
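The ordering described above (claim the page as a single article first, and only then consider multiple) can be sketched as follows. This is not the translator's actual code: the `citation_title` meta check and the link selector are assumptions chosen for illustration, and real PubFactory pages may need different signals.

```javascript
// Assumed signal for a single-article page: a citation_title <meta> tag.
function isSingleArticle(doc) {
  return doc.querySelector('meta[name="citation_title"]') !== null;
}

// Assumed selector for links to articles, which also matches "Related Content" links.
function countArticleLinks(doc) {
  return doc.querySelectorAll('a[href*="/view/journals/"][href$=".xml"]').length;
}

// Checking for a single article before counting links means that
// Related Content links on an article page never cause "multiple".
function detectWeb(doc, url) {
  if (isSingleArticle(doc)) {
    return "journalArticle";
  }
  if (countArticleLinks(doc) > 0) {
    return "multiple";
  }
  return false;
}
```

The design choice here is that the decision rests on a page-level signal rather than on a selector that has to exclude Related Content on every journal's layout.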