Open adomasven opened 5 months ago
I'm still not getting EM on this page. The main translator is fixed, so it's less of an issue here, but this is likely preventing EM from working elsewhere.
(4)(+0000028): Translate: Binding sandbox to https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml
debug.js:87 (4)(+0000003): Translate: Parsing code for PubFactory Journals (8d1fb775-df6d-4069-8830-1dfe8e8387dd, 2024-06-04 18:20:00)
debug.js:87 (4)(+0000014): Translate: Parsing code for unAPI (e7e01cac-1e37-4da6-b078-a0e8343b0e98, 2019-06-10 23:11:21)
debug.js:87 (4)(+0000002): Translate: Parsing code for COinS (05d07af9-105a-4572-99f6-a8e231c0daef, 2021-06-01 17:38:46)
debug.js:87 (4)(+0000004): Translate: Parsing code for Embedded Metadata (951c027d-74ac-47d4-a107-9c3069ab7b48, 2024-03-27 20:15:00)
debug.js:87 (3)(+0000000): Translate: Prefix 'og' => 'http://ogp.me/ns#'
debug.js:87 (3)(+0000000): Translate: Prefix 'fb' => 'http://ogp.me/ns/fb#'
debug.js:87 (3)(+0000000): Translate: Prefix 'article' => 'http://ogp.me/ns/article#'
debug.js:87 (3)(+0000000): Translate: Embedded Metadata: found 0 meta tags.
debug.js:87 (4)(+0000013): Translate: Parsing code for DOI (c159dcfe-8a53-4301-a499-30f6549c340d, 2024-05-17 20:25:00)
debug.js:87 (3)(+0000000): Translate: All translator detect calls and RPC calls complete:
debug.js:87 (3)(+0000001): PubFactory Journals: 200
debug.js:87 (3)(+0000000): DOI: 400
I cannot reproduce this in a new profile with the current release build.
Discovered in: https://github.com/zotero/translators/issues/3311#issuecomment-2148755216
Problem page: https://journals.ametsoc.org/view/journals/phoc/53/1/JPO-D-22-0001.1.xml
The EM translator is not detected because no meta tags are found. I've discovered that
This is because the 21st tag is
Apparently, an img inside a noscript before the body is invalid, and it causes the parser to treat the head element as terminated at that point and to begin the body element. So it seems like this page is intentionally preventing crawlers and the like from accessing the meta tags in the head element, or something like that.
Anyway, as a proposed solution, I think we should strip all `<noscript>` tags from `<head>` in MV3 before parsing.
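A rough sketch of what that stripping could look like, assuming we have the page as an HTML string before it is handed to the parser. `stripHeadNoscript` is a hypothetical helper name, not something that exists in the connector, and the regex approach is only one possible way to do it:

```javascript
// Hypothetical helper (not existing connector code): remove <noscript>
// elements that appear before the end of <head>, so their contents cannot
// terminate the head early when the page is parsed with scripting disabled.
function stripHeadNoscript(html) {
  // The head ends at an explicit </head>, or failing that at the opening <body> tag.
  const headEnd = html.search(/<\/head\s*>|<body[\s>]/i);
  if (headEnd === -1) {
    // No recognizable head boundary: leave the markup untouched.
    return html;
  }
  const head = html.slice(0, headEnd);
  const rest = html.slice(headEnd);
  // Non-greedy match so each <noscript>...</noscript> pair is removed separately.
  return head.replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript\s*>/gi, '') + rest;
}

// Example: the <img> in the head's <noscript> is dropped; the body is not touched.
const sample =
  '<html><head><meta name="a" content="1">' +
  '<noscript><img src="x.gif"></noscript>' +
  '<meta name="b" content="2"></head>' +
  '<body><noscript>fallback</noscript></body></html>';
console.log(stripHeadNoscript(sample));
```

This deliberately leaves `<noscript>` in the body alone, since only the ones in the head affect where the head is considered to end. A regex will not handle every malformed page, so this is a starting point rather than a complete fix.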