l0o0 / translators_CN

Zotero translators for scraping Chinese web pages 🎉 These are Zotero translators for Chinese sites (beta), not the official Zotero repo
GNU Affero General Public License v3.0
4.11k stars 525 forks

Fix: affiliation numbers appearing in the author list when there are multiple affiliations #148

wohenbushuang closed this 1 year ago

wohenbushuang commented 1 year ago

e.g. https://d.wanfangdata.com.cn/periodical/ysxb98202209004

Before:

             "creators": [
               {
                 "firstName": "计国"
                 "lastName": "刘"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
               {
                 "firstName": "凤云"
                 "lastName": "郑"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
               {
                 "firstName": "凤军"
                 "lastName": "毛"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
               {
                 "firstName": "虹"
                 "lastName": "姜"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
               {
                 "firstName": "早红"
                 "lastName": "李"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
               {
                 "firstName": "明胜"
                 "lastName": "吕"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "2"
                 "creatorType": "author"
               }
               {
                 "firstName": "邦"
                 "lastName": "刘"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "2"
                 "creatorType": "author"
               }
               {
                 "firstName": "圣强"
                 "lastName": "袁"
                 "creatorType": "author"
               }
               {
                 "firstName": ""
                 "lastName": "1"
                 "creatorType": "author"
               }
             ]

After:

             "creators": [
               {
                 "firstName": "计国"
                 "lastName": "刘"
                 "creatorType": "author"
               }
               {
                 "firstName": "凤云"
                 "lastName": "郑"
                 "creatorType": "author"
               }
               {
                 "firstName": "凤军"
                 "lastName": "毛"
                 "creatorType": "author"
               }
               {
                 "firstName": "虹"
                 "lastName": "姜"
                 "creatorType": "author"
               }
               {
                 "firstName": "早红"
                 "lastName": "李"
                 "creatorType": "author"
               }
               {
                 "firstName": "明胜"
                 "lastName": "吕"
                 "creatorType": "author"
               }
               {
                 "firstName": "邦"
                 "lastName": "刘"
                 "creatorType": "author"
               }
               {
                 "firstName": "圣强"
                 "lastName": "袁"
                 "creatorType": "author"
               }
             ]
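
A minimal sketch of the kind of filtering that turns "Before" into "After". This is not the code merged in this PR; it assumes (hypothetically) that the scraper yields a flat list of name strings in which affiliation markers such as "1" or "2" appear as separate entries, which is what the "Before" output suggests. The helper name `cleanCreators` and the single-character-surname split are illustrative assumptions.

```javascript
// Hypothetical sketch, NOT the code merged in PR #148.
// Assumption: raw author strings look like ["刘计国", "1", "郑凤云", "1", ...],
// i.e. affiliation markers arrive as separate entries.
function cleanCreators(rawNames) {
  const creators = [];
  for (const raw of rawNames) {
    const name = raw.trim();
    // Skip entries made up only of affiliation markers:
    // digits, ASCII or full-width commas, and whitespace.
    if (name === "" || /^[\d,，\s]+$/.test(name)) continue;
    // Assumption: one-character surname. True for every author above,
    // but wrong for compound surnames such as 欧阳.
    const chars = Array.from(name);
    creators.push({
      firstName: chars.slice(1).join(""),
      lastName: chars[0],
      creatorType: "author",
    });
  }
  return creators;
}

const raw = ["刘计国", "1", "郑凤云", "1", "吕明胜", "2"];
console.log(cleanCreators(raw).map((c) => c.lastName + c.firstName));
// → [ '刘计国', '郑凤云', '吕明胜' ]
```

Filtering on "digits and commas only" rather than on the exact strings "1"/"2" keeps the check working for any affiliation count.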

zepinglee commented 1 year ago

Could you provide a test case? That would better show why this change is needed.

wohenbushuang commented 1 year ago

@zepinglee Edited.

zepinglee commented 1 year ago

I meant adding a test case here. It is fairly easy to add one in Zotero's translator editor, and it also makes testing easier when the code is modified later.

https://github.com/l0o0/translators_CN/blob/8ca7b74a5cdd811d7c2eecfd9042b97c39d9f86a/translators/Wanfang%20Data.js#L437-L501

Also, could one person have multiple affiliations, e.g. “张三1,2”?
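
If a name and its markers ever arrive as one string such as “张三1,2”, a trailing-marker strip could handle it. A hypothetical sketch, assuming the markers always follow the name and contain only digits, commas and whitespace; `stripAffiliationMarks` is an illustrative name, not part of the translator.

```javascript
// Hypothetical helper: remove trailing affiliation markers such as
// "1", "1,2" or "1，2" (full-width comma) from a scraped author name.
// Assumption: markers only ever appear at the end of the name.
function stripAffiliationMarks(name) {
  return name.trim().replace(/[\d,，\s]+$/, "");
}

console.log(stripAffiliationMarks("张三1,2")); // → 张三
```

A bare marker like "1" becomes an empty string, so the same helper can feed a filter that drops empty entries.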

wohenbushuang commented 1 year ago

On my side, the test run in the editor always produces blank output, and after running and updating the test, everything gets cleared…

    {
        "type": "web",
        "url": "https://d.wanfangdata.com.cn/periodical/ysxb98202209004",
        "items": [
            {
                "itemType": "journalArticle",
                "title": "万方数据知识服务平台",
                "creators": [],
                "language": "zh-CN",
                "libraryCatalog": "Wanfang Data",
                "url": "https://d.wanfangdata.com.cn/periodical/ysxb98202209004",
                "attachments": [],
                "tags": [],
                "notes": [],
                "seeAlso": []
            }
        ]
    }

For example, the tags printed after I scrape are:

20:22:03 Returned item:
...
             "tags": [
               {
                 "tag": "关键词:"
               }
               {
                 "tag": "古近系"
               }
               {
                 "tag": "Sokor1组"
               }
               {
                 "tag": "储层物性"
               }
               {
                 "tag": "影响因素"
               }
               {
                 "tag": "Termit盆地"
               }
             ]

(This "关键词:" ("Keywords:") tag looks like a new bug…)
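
One possible cleanup for the stray label tag, sketched under the assumption that it is the page's keyword heading and always reads exactly "关键词" followed by an ASCII or full-width colon. `cleanTags` is a hypothetical name; the actual fix in the translator may differ.

```javascript
// Hypothetical sketch: drop the "关键词:" heading that was scraped as a tag.
// Assumption: the label is exactly "关键词" plus an optional ASCII or
// full-width colon; genuine keywords are left untouched.
function cleanTags(tags) {
  return tags
    .map((t) => t.trim())
    .filter((t) => t !== "" && !/^关键词\s*[:：]?$/.test(t));
}

console.log(cleanTags(["关键词:", "古近系", "Sokor1组"]));
// → [ '古近系', 'Sokor1组' ]
```

Matching the exact label, rather than any tag ending in a colon, avoids discarding a real keyword that happens to end with punctuation.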

The test run result is:

20:26:24 Translation successful
20:26:24 TranslatorTester: Data mismatch detected:
20:26:24   {
             "itemType": "journalArticle"
             "creators": []
             "attachments": []
             "tags": [
         -     {
         -       "tag": "Sokor1组"
         -     }
         -     {
         -       "tag": "Termit盆地"
         -     }
         -     {
         -       "tag": "储层物性"
         -     }
         -     {
         -       "tag": "关键词:"
         -     }
         -     {
         -       "tag": "古近系"
         -     }
         -     {
         -       "tag": "影响因素"
         -     }
             ]
             "notes": []
             "seeAlso": []
             "title": "万方数据知识服务平台"
             "language": "zh-CN"
             "libraryCatalog": "Wanfang Data"
             "url": "https://d.wanfangdata.com.cn/periodical/ysxb98202209004"
           }
20:26:24 TranslatorTester: Wanfang Data Test 1: unknown (Item 0 does not match)

> Also, could one person have multiple affiliations, e.g. “张三1,2”?

I haven't seen such a case so far. If one turns up, let's ask whoever finds it to provide an example URL.

l0o0 commented 1 year ago

I'll merge the code first and add the test case manually later.

zepinglee commented 1 year ago

> On my side, the test run in the editor always produces blank output, and after running and updating the test, everything gets cleared…


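I tried it and got the same result. It seems Wanfang uses some technique that prevents Scaffold from scraping the information directly, which makes debugging quite troublesome. Scraping in the browser works normally, though.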

l0o0 commented 1 year ago

Wanfang uses encryption here. The web page displays normally, but running it directly in the test throws an error.