xinyadu / eeqa

Event Extraction by Answering (Almost) Natural Questions
MIT License

Can't run pre-processing code. #6

Closed: Akeepers closed this issue 3 years ago

Akeepers commented 4 years ago

Hi Xinya, I ran your code and got an error during data preprocessing. The failing step is running 'python scripts/data/ace-event/convert_examples.py':

  File "scripts/data/ace-event/convert_examples.py", line 11, in <module>
    line = json.loads(line)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
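The position in this error is a useful clue. The traceback shows convert_examples.py doing `line = json.loads(line)`, which suggests it decodes its input one line at a time. If that input file is pretty-printed, the first line handed to json.loads is just an opening brace plus a newline, which is not a complete JSON document. The minimal sketch below reproduces that situation, assuming line-by-line reading; the helper name `decode_line_by_line` and the sample record are hypothetical and not part of the eeqa code.

```python
import json

# Hypothetical record, pretty-printed with indent=4 the way an auto-formatted
# file would look (one key/value per line instead of one record per line).
pretty = json.dumps({"sentences": [["STORY"]], "ner": [[]]}, indent=4)


def decode_line_by_line(text: str) -> str:
    """Call json.loads on every line, as a line-by-line reader would,
    and return the first error message (or "ok" if every line parses)."""
    for line in text.splitlines(keepends=True):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            return str(err)
    return "ok"


print(decode_line_by_line(pretty))
# Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

# The same record serialized compactly on a single line parses fine.
print(decode_line_by_line(json.dumps({"sentences": [["STORY"]], "ner": [[]]}) + "\n"))
# ok
```

The reproduced message matches the traceback character for character, which is consistent with the input file being multi-line JSON rather than one JSON object per line.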

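I double-checked the process: I followed the README steps exactly and re-downloaded the raw data from the official site:

  • When parsing the ACE05 data, I used the default setting: python ./scripts/data/ace-event/parse_ace_event.py default-settings

I also checked the data produced by parse_ace_event.py and found some problems. For example, the 'events' field contains a large number of meaningless empty lists, as in the screenshot:

[Screenshot: parse_ace_event.py output with many empty 'events' lists]

What could be causing this?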

mawenjie8731 commented 4 years ago

When you ran python ./scripts/data/ace-event/parse_ace_event.py default-settings, did it drop you into an ipdb debugging session? How did you get past that? Thanks.

Akeepers commented 4 years ago

I didn't run into that. I just ran the script; it printed something like "parse train", then finished and generated the corresponding data.

JunnYu commented 3 years ago

@Akeepers Hi, I'm hitting the same problem. Did you manage to solve it?

Traceback (most recent call last):
  File "scripts/data/ace-event/convert_examples.py", line 10, in <module>
    line = json.loads(line)
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
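If the input file is indeed pretty-printed JSON rather than one JSON object per line (the explanation JunnYu arrives at further down), one workaround is to re-serialize it as JSON Lines before running convert_examples.py. This is a minimal sketch, not a fix from the eeqa repository, assuming the file holds one or more JSON documents separated only by whitespace; the function names `iter_json_documents`, `to_json_lines`, and `convert_file` are hypothetical. It uses the standard library's `json.JSONDecoder.raw_decode`, which parses one document and reports where it ended.

```python
import json
from typing import Any, Iterator

_JSON_WS = " \t\n\r"  # the four whitespace characters JSON allows between tokens


def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each JSON value in `text`, where values may be pretty-printed
    and are separated only by whitespace."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < end and text[idx] in _JSON_WS:
            idx += 1
        if idx == end:
            return
        doc, idx = decoder.raw_decode(text, idx)
        yield doc


def to_json_lines(text: str) -> str:
    """Re-serialize every document onto its own line (JSON Lines)."""
    return "".join(
        json.dumps(doc, ensure_ascii=False) + "\n" for doc in iter_json_documents(text)
    )


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a (possibly pretty-printed) JSON file and write it as JSON Lines."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_json_lines(text))
```

The paths passed to `convert_file` are up to you; check which file convert_examples.py actually reads before replacing anything.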

Akeepers commented 3 years ago

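I think the paper's method is fairly weak, and its results are on the low side, so I'm not planning to reproduce it. This problem remains unsolved, and I won't be following it any further.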

JunnYu commented 3 years ago

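@Akeepers I've more or less figured out the problem. The script probably expects each record on a single line, but the file I generated was automatically formatted across many lines: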

{
    "sentences": [
        [
            "CNN_CF_20030303.1900.02"
        ],
        [
            "STORY"
        ],
        [
            "2003",
            "-",
            "03",
            "-",
            "03T19:00:00",
            "-",
            "05:00"
        ],
        [
            "New",
            "Questions",
            "About",
            "Attacking",
            "Iraq",
            ";",
            "Is",
            "Torturing",
            "Terrorists",
            "Necessary",
            "?"
        ],
        [
            "NOVAK",
            "Welcome",
            "back",
            "."
        ],
        [
            "Orders",
            "went",
            "out",
            "today",
            "to",
            "deploy",
            "17,000",
            "U.S.",
            "Army",
            "soldiers",
            "in",
            "the",
            "Persian",
            "Gulf",
            "region",
            "."
        ],
        [
            "The",
            "army",
            "'s",
            "entire",
            "first",
            "Calvary",
            "division",
            "based",
            "at",
            "Fort",
            "Hood",
            ",",
            "Texas",
            ",",
            "would",
            "join",
            "the",
            "quarter",
            "million",
            "U.S.",
            "forces",
            "already",
            "in",
            "the",
            "region",
            "."
        ],
        [
            "We",
            "'re",
            "talking",
            "about",
            "possibilities",
            "of",
            "full",
            "scale",
            "war",
            "with",
            "former",
            "Congressman",
            "Tom",
            "Andrews",
            ",",
            "Democrat",
            "of",
            "Maine",
            "."
        ],
        [
            "He",
            "'s",
            "now",
            "national",
            "director",
            "of",
            "Win",
            "Without",
            "War",
            ",",
            "and",
            "former",
            "Congressman",
            "Bob",
            "Dornan",
            ",",
            "Republican",
            "of",
            "California",
            "."
        ],
        [
            "BEGALA",
            "Bob",
            ",",
            "one",
            "of",
            "the",
            "reasons",
            "I",
            "think",
            "so",
            "many",
            "Americans",
            "are",
            "worried",
            "about",
            "this",
            "war",
            "and",
            "so",
            "many",
            "people",
            "around",
            "the",
            "world",
            "do",
            "n't",
            "want",
            "to",
            "go",
            "is",
            "there",
            "have",
            "been",
            "a",
            "lot",
            "of",
            "problems",
            "with",
            "credibility",
            "from",
            "this",
            "administration",
            "."
        ],
        [
            "Our",
            "president",
            "has",
            "repeatedly",
            ",",
            "for",
            "example",
            ",",
            "relied",
            "on",
            "a",
            "man",
            "whom",
            "you",
            "'re",
            "aware",
            ",",
            "Hussein",
            "Kamel",
            ",",
            "Saddam",
            "Hussein",
            "'s",
            "son",
            "-",
            "in",
            "-",
            "law",
            ",",
            "leader",
            "of",
            "the",
            "Iraq",
            "arms",
            "program",
            "who",
            "defected",
            "for",
            "a",
            "time",
            "."
        ],
        [
            "And",
            "gave",
            "us",
            "a",
            "whole",
            "lot",
            "of",
            "information",
            "and",
            "then",
            "went",
            "home",
            "and",
            "his",
            "father",
            "-",
            "in",
            "-",
            "law",
            "killed",
            "him",
            "."
        ],
        [
            "Bad",
            "move",
            "."
        ],
        [
            "But",
            "while",
            "he",
            "was",
            "here",
            ",",
            "he",
            "gave",
            "us",
            "a",
            "whole",
            "lot",
            "of",
            "information",
            "."
        ],
        [
            "Gave",
            "us",
            "a",
            "whole",
            "lot",
            "of",
            "information",
            "."
        ],
        [
            "Well",
            ",",
            "our",
            "president",
            "told",
            "us",
            "that",
            "information",
            "proves",
            "that",
            "the",
            "dictator",
            "had",
            "chemical",
            "weapons",
            ",",
            "which",
            "is",
            "true",
            "."
        ],
        [
            "But",
            "what",
            "we",
            "just",
            "learned",
            "this",
            "week",
            "from",
            "\"",
            "Newsweek",
            "\"",
            "magazine",
            "which",
            "got",
            "a",
            "hold",
            "of",
            "the",
            "debriefings",
            ",",
            "is",
            "that",
            "he",
            "also",
            "told",
            "us",
            "it",
            "was",
            "destroyed",
            "back",
            "in",
            "1995",
            "."
        ],
        [
            "Why",
            "has",
            "n't",
            "our",
            "president",
            "told",
            "us",
            "that",
            "?"
        ],
        [
            "Why",
            "do",
            "we",
            "have",
            "to",
            "learn",
            "it",
            "from",
            "\"",
            "Newsweek",
            "\"",
            "?"
        ],
        [
            "DORNAN",
            "I",
            "do",
            "n't",
            "believe",
            "that",
            "he",
            "believed",
            "it",
            "was",
            "all",
            "destroyed",
            "."
        ],
        [
            "The",
            "fact",
            "that",
            "this",
            "guy",
            "was",
            "such",
            "an",
            "idiot",
            "to",
            "go",
            "back",
            "and",
            "let",
            "his",
            "father",
            "-",
            "in",
            "-",
            "law",
            "kill",
            "him",
            "shows",
            "he",
            "was",
            "n't",
            "the",
            "most",
            "stable",
            "of",
            "people",
            "."
        ],
        [
            "But",
            "the",
            "things",
            "that",
            "..."
        ],
        [
            "BEGALA",
            "Good",
            "point",
            "."
        ],
        [
            "But",
            "should",
            "n't",
            "our",
            "president",
            "have",
            "told",
            "us",
            "what",
            "the",
            "CIA",
            "told",
            "him",
            "."
        ],
        [
            "Why",
            "do",
            "we",
            "learn",
            "from",
            "\"",
            "Newsweek?\"Should",
            "he",
            "level",
            "with",
            "us",
            "?"
        ],
        [
            "DORNAN",
            "Paul",
            ",",
            "look",
            ",",
            "the",
            "problem",
            "is",
            "I",
            "would",
            "stipulate",
            "all",
            "four",
            "of",
            "us",
            "hates",
            "war",
            ".",
            "Any",
            "rational",
            "person",
            "hates",
            "war",
            "."
        ],
        [
            "Bush",
            "is",
            "n't",
            "sitting",
            "in",
            "that",
            "White",
            "House",
            "not",
            "thinking",
            "about",
            "the",
            "body",
            "bags",
            "coming",
            "home",
            "with",
            "great",
            "young",
            "men",
            "."
        ],
        [
            "Clinton",
            "suffered",
            "greatly",
            "over",
            "the",
            "19",
            "Rangers",
            "that",
            "died",
            ",",
            "18",
            "on",
            "the",
            "3rd",
            "of",
            "October",
            "and",
            "Matt",
            "Reersen",
            "(",
            "ph",
            ")",
            "three",
            "days",
            "later",
            "."
        ],
        [
            "I",
            "visited",
            "all",
            "their",
            "families",
            "."
        ],
        [
            "I",
            "was",
            "at",
            "the",
            "medal",
            "of",
            "honor",
            "ceremony",
            "for",
            "the",
            "kids",
            "."
        ],
        [
            "Let",
            "me",
            "tell",
            "you",
            ",",
            "what",
            "trips",
            "to",
            "Walter",
            "Reed",
            "taught",
            "me",
            "was",
            ",",
            "that",
            "whoever",
            "thought",
            "up",
            "the",
            "term",
            ",",
            "the",
            "law",
            "of",
            "unintended",
            "consequences",
            "it",
            "pertains",
            "to",
            "war",
            "."
        ],
        [
            "I",
            "am",
            "shook",
            "over",
            "the",
            "aftermath",
            "."
        ],
        [
            "But",
            ",",
            "this",
            "guy",
            "is",
            "a",
            "monster",
            ",",
            "a",
            "mini",
            "-",
            "me",
            "Hitler",
            "."
        ],
        [
            "He",
            "will",
            "blow",
            "a",
            "city",
            "off",
            "the",
            "earth",
            "in",
            "a",
            "minute",
            "if",
            "he",
            "can",
            "get",
            "the",
            "hold",
            "of",
            "the",
            "means",
            "to",
            "do",
            "it",
            "."
        ],
        [
            "NOVAK",
            "Tom",
            "Andrews",
            ",",
            "I",
            "think",
            "we",
            "all",
            "realize",
            "that",
            "a",
            "government",
            "does",
            "n't",
            "go",
            "to",
            "war",
            "a",
            "nation",
            "goes",
            "to",
            "war",
            "."
        ],
        [
            "And",
            "so",
            "I",
            "would",
            "like",
            "you",
            "to",
            "take",
            "a",
            "look",
            "at",
            "the",
            "CNN/\"USA",
            "TODAY\"",
            "/",
            "Gallup",
            "poll",
            ",",
            "taken",
            "last",
            "week",
            ",",
            "should",
            "U.S.",
            "troops",
            "to",
            "go",
            "to",
            "Iraq",
            "to",
            "remove",
            "Saddam",
            "Hussein",
            "from",
            "power",
            "."
        ],
        [
            "Take",
            "a",
            "look",
            "at",
            "it",
            "."
        ],
        [
            "Favor",
            "59",
            "%",
            ",",
            "opposed",
            "37",
            "%",
            ",",
            "that",
            "'s",
            "a",
            "vastly",
            "larger",
            "support",
            "than",
            "President",
            "Bush",
            "Senior",
            "had",
            "in",
            "getting",
            "the",
            "U.S.",
            "troops",
            "out",
            "of",
            "Kuwait",
            "before",
            "that",
            "war",
            "started",
            ".",
            "That",
            "'s",
            "pretty",
            "good",
            "support",
            "is",
            "n't",
            "it",
            "?"
        ],
        [
            "ANDREWS",
            "Now",
            ",",
            "Bob",
            ",",
            "come",
            "on",
            "you",
            "do",
            "n't",
            "really",
            "buy",
            "this",
            "."
        ],
        [
            "I",
            "mean",
            ",",
            "listen",
            ",",
            "this",
            "is",
            "the",
            "oldest",
            "trick",
            "in",
            "the",
            "book",
            "."
        ],
        [
            "You",
            "can",
            "have",
            "a",
            "general",
            "question",
            "like",
            "this",
            "that",
            "could",
            "mean",
            "anything",
            "and",
            "ask",
            "people",
            "and",
            "they",
            "give",
            "you",
            "what",
            "comes",
            "off",
            "the",
            "top",
            "of",
            "their",
            "head",
            "."
        ],
        [
            "But",
            ",",
            "ask",
            "them",
            "another",
            "question",
            ",",
            "ask",
            "them",
            "what",
            "they",
            "think",
            "about",
            "spending",
            "$",
            "1.3",
            "trillion",
            "in",
            "destroying",
            "this",
            "economy",
            "."
        ],
        [
            "Ask",
            "them",
            "about",
            "going",
            "and",
            "not",
            "just",
            "a",
            "war",
            ",",
            "Bob",
            ",",
            "but",
            "an",
            "invasion",
            "and",
            "occupying",
            "for",
            "up",
            "to",
            "10",
            "years",
            "a",
            "sovereign",
            "Arab",
            "nation",
            "in",
            "the",
            "midst",
            "of",
            "one",
            "of",
            "the",
            "most",
            "distable",
            "and",
            "volatile",
            "regions",
            "in",
            "the",
            "world",
            "."
        ],
        [
            "Ask",
            "them",
            "how",
            "they",
            "feel",
            "about",
            "getting",
            "bogged",
            "down",
            "."
        ],
        [
            "Well",
            ",",
            "I",
            "'ve",
            "seen",
            "some",
            "of",
            "the",
            "figures",
            "."
        ],
        [
            "Once",
            "you",
            "start",
            "telling",
            "Americans",
            "the",
            "story",
            ",",
            "--",
            "the",
            "administration",
            "refuse",
            "today",
            "tell",
            "us",
            "story",
            "."
        ],
        [
            "They",
            "'re",
            "not",
            "coming",
            "forward",
            "and",
            "telling",
            "us",
            "what",
            "the",
            "risks",
            "are",
            ",",
            "what",
            "the",
            "costs",
            "are",
            ",",
            "how",
            "many",
            "years",
            "we",
            "might",
            "be",
            "in",
            ",",
            "the",
            "possibility",
            "of",
            "us",
            "getting",
            "bogged",
            "down",
            ",",
            "because",
            "what",
            "Americans",
            "know",
            "that",
            ",",
            "they",
            "'re",
            "opposed",
            "to",
            "this",
            "war",
            "."
        ],
        [
            "The",
            "more",
            "they",
            "learn",
            "about",
            "this",
            "invasion",
            ",",
            "the",
            "more",
            "they",
            "learn",
            "about",
            "this",
            "occupation",
            ",",
            "the",
            "less",
            "they",
            "support",
            "it",
            "."
        ],
        [
            "DORNAN",
            "Tom",
            ",",
            "you",
            "know",
            "what",
            "liberals",
            "want",
            "."
        ],
        [
            "They",
            "do",
            "n't",
            "want",
            "a",
            "smoking",
            "gun",
            ",",
            "they",
            "want",
            "a",
            "smoking",
            "city",
            "."
        ],
        [
            "The",
            "Clinton",
            "people",
            "all",
            "say",
            "..."
        ],
        [
            "BEGALA",
            "That",
            "'s",
            "going",
            "to",
            "have",
            "to",
            "be",
            "last",
            "war",
            ",",
            "unfair",
            "and",
            "unfortunate",
            "as",
            "that",
            "is",
            ",",
            "I",
            "am",
            "sorry",
            ",",
            "they",
            "'re",
            "telling",
            "us",
            "we",
            "'re",
            "out",
            "of",
            "time",
            "."
        ],
        [
            "Former",
            "Congressman",
            ",",
            "Bob",
            "Dornan",
            "from",
            "California",
            "..."
        ],
        [
            "DORNAN",
            "You",
            "'re",
            "not",
            "going",
            "to",
            "get",
            "a",
            "smoking",
            "city",
            "."
        ]
    ],
    "ner": [
        [],
        [],
        [],
        [],
        [
            [
                20,
                20,
                "PER"
            ]
        ],
        [
            [
                31,
                31,
                "GPE"
            ],
            [
                32,
                32,
                "ORG"
            ],
            [
                33,
                33,
                "PER"
            ],
            [
                36,
                37,
                "LOC"
            ],
            [
                38,
                38,
                "LOC"
            ]
        ],
        [
            [
                41,
                41,
                "ORG"
            ],
            [
                44,
                46,
                "ORG"
            ],
            [
                49,
                50,
                "GPE"
            ],
            [
                52,
                52,
                "GPE"
            ],
            [
                59,
                59,
                "GPE"
            ],
            [
                60,
                60,
                "PER"
            ],
            [
                64,
                64,
                "LOC"
            ]
        ],
        [
            [
                77,
                77,
                "PER"
            ],
            [
                78,
                79,
                "PER"
            ],
            [
                81,
                81,
                "PER"
            ],
            [
                83,
                83,
                "GPE"
            ]
        ],
        [
            [
                89,
                89,
                "PER"
            ],
            [
                91,
                93,
                "ORG"
            ],
            [
                97,
                97,
                "PER"
            ],
            [
                98,
                99,
                "PER"
            ],
            [
                101,
                101,
                "PER"
            ],
            [
                103,
                103,
                "GPE"
            ]
        ],
        [
            [
                105,
                105,
                "PER"
            ],
            [
                106,
                106,
                "PER"
            ],
            [
                116,
                116,
                "PER"
            ],
            [
                125,
                125,
                "PER"
            ],
            [
                128,
                128,
                "LOC"
            ],
            [
                146,
                146,
                "ORG"
            ]
        ],
        [
            [
                149,
                149,
                "PER"
            ],
            [
                165,
                166,
                "PER"
            ],
            [
                168,
                169,
                "PER"
            ],
            [
                171,
                175,
                "PER"
            ],
            [
                177,
                177,
                "PER"
            ],
            [
                180,
                180,
                "GPE"
            ],
            [
                182,
                182,
                "ORG"
            ]
        ],
        [
            [
                200,
                200,
                "GPE"
            ],
            [
                203,
                207,
                "PER"
            ]
        ],
        [],
        [],
        [],
        [
            [
                240,
                240,
                "PER"
            ],
            [
                248,
                248,
                "PER"
            ],
            [
                251,
                251,
                "WEA"
            ]
        ],
        [
            [
                266,
                268,
                "ORG"
            ]
        ],
        [
            [
                294,
                294,
                "PER"
            ]
        ],
        [
            [
                308,
                308,
                "ORG"
            ]
        ],
        [
            [
                311,
                311,
                "PER"
            ]
        ],
        [
            [
                328,
                328,
                "PER"
            ],
            [
                332,
                332,
                "PER"
            ],
            [
                339,
                343,
                "PER"
            ],
            [
                354,
                354,
                "PER"
            ]
        ],
        [],
        [
            [
                361,
                361,
                "PER"
            ]
        ],
        [
            [
                369,
                369,
                "PER"
            ],
            [
                375,
                375,
                "ORG"
            ]
        ],
        [
            [
                385,
                385,
                "ORG"
            ]
        ],
        [
            [
                391,
                391,
                "PER"
            ],
            [
                392,
                392,
                "PER"
            ],
            [
                411,
                411,
                "PER"
            ]
        ],
        [
            [
                415,
                415,
                "PER"
            ],
            [
                421,
                422,
                "FAC"
            ],
            [
                434,
                434,
                "PER"
            ]
        ],
        [
            [
                436,
                436,
                "PER"
            ],
            [
                442,
                442,
                "PER"
            ],
            [
                453,
                454,
                "PER"
            ]
        ],
        [
            [
                466,
                466,
                "PER"
            ]
        ],
        [
            [
                478,
                478,
                "PER"
            ]
        ],
        [
            [
                488,
                489,
                "FAC"
            ]
        ],
        [],
        [
            [
                521,
                521,
                "PER"
            ],
            [
                524,
                524,
                "PER"
            ],
            [
                527,
                529,
                "PER"
            ],
            [
                530,
                530,
                "PER"
            ]
        ],
        [
            [
                536,
                536,
                "GPE"
            ],
            [
                539,
                539,
                "LOC"
            ]
        ],
        [
            [
                556,
                556,
                "PER"
            ],
            [
                557,
                558,
                "PER"
            ],
            [
                567,
                567,
                "GPE"
            ],
            [
                574,
                574,
                "GPE"
            ]
        ],
        [
            [
                591,
                591,
                "ORG"
            ],
            [
                591,
                592,
                "ORG"
            ],
            [
                594,
                594,
                "ORG"
            ],
            [
                602,
                602,
                "GPE"
            ],
            [
                603,
                603,
                "PER"
            ],
            [
                607,
                607,
                "GPE"
            ],
            [
                610,
                611,
                "PER"
            ]
        ],
        [],
        [
            [
                636,
                636,
                "PER"
            ],
            [
                637,
                637,
                "PER"
            ],
            [
                638,
                638,
                "PER"
            ],
            [
                643,
                643,
                "GPE"
            ],
            [
                644,
                644,
                "PER"
            ],
            [
                647,
                647,
                "GPE"
            ]
        ],
        [
            [
                662,
                662,
                "PER"
            ],
            [
                665,
                665,
                "PER"
            ]
        ],
        [],
        [
            [
                704,
                704,
                "PER"
            ]
        ],
        [],
        [
            [
                750,
                750,
                "PER"
            ],
            [
                764,
                764,
                "PER"
            ],
            [
                765,
                765,
                "GPE"
            ],
            [
                777,
                777,
                "LOC"
            ],
            [
                780,
                780,
                "LOC"
            ]
        ],
        [],
        [],
        [
            [
                806,
                806,
                "GPE"
            ],
            [
                812,
                812,
                "ORG"
            ]
        ],
        [
            [
                855,
                855,
                "PER"
            ]
        ],
        [],
        [
            [
                888,
                888,
                "PER"
            ],
            [
                889,
                889,
                "PER"
            ],
            [
                894,
                894,
                "PER"
            ]
        ],
        [
            [
                909,
                909,
                "GPE"
            ]
        ],
        [
            [
                912,
                912,
                "PER"
            ],
            [
                913,
                913,
                "PER"
            ]
        ],
        [
            [
                917,
                917,
                "PER"
            ]
        ],
        [
            [
                950,
                950,
                "PER"
            ],
            [
                952,
                953,
                "PER"
            ],
            [
                955,
                955,
                "GPE"
            ]
        ],
        [
            [
                957,
                957,
                "PER"
            ],
            [
                966,
                966,
                "GPE"
            ]
        ]
    ],
    "relations": [
        [],
        [],
        [],
        [],
        [],
        [
            [
                32,
                32,
                31,
                31,
                "PART-WHOLE.Subsidiary"
            ],
            [
                33,
                33,
                32,
                32,
                "ORG-AFF.Employment"
            ],
            [
                33,
                33,
                38,
                38,
                "PHYS.Located"
            ],
            [
                36,
                37,
                38,
                38,
                "PART-WHOLE.Geographical"
            ]
        ],
        [
            [
                44,
                46,
                41,
                41,
                "PART-WHOLE.Subsidiary"
            ],
            [
                44,
                46,
                49,
                50,
                "GEN-AFF.Org-Location"
            ],
            [
                49,
                50,
                52,
                52,
                "PART-WHOLE.Geographical"
            ],
            [
                60,
                60,
                59,
                59,
                "ORG-AFF.Employment"
            ],
            [
                60,
                60,
                64,
                64,
                "PHYS.Located"
            ]
        ],
        [
            [
                81,
                81,
                83,
                83,
                "GEN-AFF.Citizen-Resident-Religion-Ethnicity"
            ]
        ],
        [
            [
                89,
                89,
                91,
                93,
                "ORG-AFF.Membership"
            ],
            [
                101,
                101,
                103,
                103,
                "GEN-AFF.Citizen-Resident-Religion-Ethnicity"
            ]
        ],
        [
            [
                125,
                125,
                128,
                128,
                "GEN-AFF.Citizen-Resident-Religion-Ethnicity"
            ]
        ],
        [
            [
                168,
                169,
                171,
                175,
                "PER-SOC.Family"
            ],
            [
                177,
                177,
                182,
                182,
                "ORG-AFF.Membership"
            ],
            [
                182,
                182,
                180,
                180,
                "PART-WHOLE.Subsidiary"
            ]
        ],
        [],
        [],
        [],
        [],
        [
            [
                248,
                248,
                251,
                251,
                "ART.User-Owner-Inventor-Manufacturer"
            ]
        ],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [
            [
                415,
                415,
                421,
                422,
                "PHYS.Located"
            ]
        ],
        [],
        [],
        [],
        [],
        [],
        [],
        [
            [
                536,
                536,
                539,
                539,
                "PART-WHOLE.Geographical"
            ]
        ],
        [],
        [
            [
                603,
                603,
                602,
                602,
                "ORG-AFF.Employment"
            ],
            [
                603,
                603,
                607,
                607,
                "PHYS.Located"
            ],
            [
                610,
                611,
                607,
                607,
                "PHYS.Located"
            ]
        ],
        [],
        [
            [
                644,
                644,
                643,
                643,
                "ORG-AFF.Employment"
            ],
            [
                644,
                644,
                647,
                647,
                "PHYS.Located"
            ]
        ],
        [],
        [],
        [],
        [],
        [
            [
                777,
                777,
                780,
                780,
                "PART-WHOLE.Geographical"
            ]
        ],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [
            [
                952,
                953,
                955,
                955,
                "GEN-AFF.Citizen-Resident-Religion-Ethnicity"
            ]
        ],
        []
    ],
    "events": [
        [],
        [],
        [],
        [],
        [],
        [
            [
                [
                    29,
                    "Movement.Transport"
                ],
                [
                    33,
                    33,
                    "Artifact"
                ],
                [
                    38,
                    38,
                    "Destination"
                ]
            ]
        ],
        [],
        [
            [
                [
                    74,
                    "Conflict.Attack"
                ]
            ],
            [
                [
                    76,
                    "Personnel.End-Position"
                ],
                [
                    78,
                    79,
                    "Person"
                ],
                [
                    83,
                    83,
                    "Entity"
                ]
            ]
        ],
        [
            [
                [
                    96,
                    "Personnel.End-Position"
                ],
                [
                    98,
                    99,
                    "Person"
                ],
                [
                    103,
                    103,
                    "Entity"
                ]
            ]
        ],
        [
            [
                [
                    121,
                    "Conflict.Attack"
                ]
            ]
        ],
        [
            [
                [
                    184,
                    "Personnel.End-Position"
                ],
                [
                    177,
                    177,
                    "Person"
                ],
                [
                    180,
                    180,
                    "Entity"
                ]
            ]
        ],
        [
            [
                [
                    199,
                    "Movement.Transport"
                ],
                [
                    200,
                    200,
                    "Destination"
                ]
            ],
            [
                [
                    208,
                    "Life.Die"
                ],
                [
                    200,
                    200,
                    "Place"
                ],
                [
                    203,
                    207,
                    "Agent"
                ]
            ]
        ],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [],
        [
            [
                [
                    334,
                    "Movement.Transport"
                ],
                [
                    328,
                    328,
                    "Artifact"
                ]
            ],
            [
                [
                    344,
                    "Life.Die"
                ],
                [
                    339,
                    343,
                    "Agent"
                ]
            ]
        ],
        [],
        [],
        [],
        [],
        [
            [
                [
                    407,
                    "Conflict.Attack"
                ]
            ],
            [
                [
                    413,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        [
            [
                [
                    444,
                    "Life.Die"
                ],
                [
                    442,
                    442,
                    "Victim"
                ],
                [
                    453,
                    454,
                    "Victim"
                ]
            ]
        ],
        [
            [
                [
                    463,
                    "Contact.Meet"
                ],
                [
                    466,
                    466,
                    "Entity"
                ]
            ]
        ],
        [],
        [
            [
                [
                    486,
                    "Movement.Transport"
                ]
            ],
            [
                [
                    509,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        [],
        [
            [
                [
                    534,
                    "Conflict.Attack"
                ],
                [
                    539,
                    539,
                    "Place"
                ]
            ]
        ],
        [
            [
                [
                    572,
                    "Conflict.Attack"
                ]
            ],
            [
                [
                    577,
                    "Conflict.Attack"
                ],
                [
                    574,
                    574,
                    "Attacker"
                ]
            ]
        ],
        [
            [
                [
                    605,
                    "Movement.Transport"
                ],
                [
                    603,
                    603,
                    "Artifact"
                ],
                [
                    607,
                    607,
                    "Destination"
                ]
            ]
        ],
        [],
        [
            [
                [
                    641,
                    "Movement.Transport"
                ],
                [
                    637,
                    637,
                    "Agent"
                ],
                [
                    644,
                    644,
                    "Artifact"
                ],
                [
                    647,
                    647,
                    "Origin"
                ]
            ],
            [
                [
                    650,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        [],
        [],
        [],
        [
            [
                [
                    748,
                    "Conflict.Attack"
                ]
            ],
            [
                [
                    754,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        [],
        [],
        [
            [
                [
                    864,
                    "Conflict.Attack"
                ]
            ]
        ],
        [
            [
                [
                    872,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        [],
        [],
        [
            [
                [
                    926,
                    "Conflict.Attack"
                ]
            ]
        ],
        [],
        []
    ],
    "sentence_start": [
        0,
        1,
        2,
        9,
        20,
        24,
        40,
        66,
        85,
        105,
        148,
        189,
        211,
        214,
        229,
        237,
        257,
        290,
        299,
        311,
        324,
        356,
        361,
        365,
        379,
        391,
        415,
        436,
        462,
        468,
        480,
        511,
        518,
        532,
        556,
        579,
        615,
        621,
        662,
        676,
        690,
        718,
        740,
        782,
        792,
        802,
        819,
        866,
        888,
        897,
        911,
        917,
        949,
        957
    ],
    "doc_key": "CNN_CF_20030303.1900.02"
}
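In the dump above, each top-level field appears to hold one entry per sentence. The modified script further down asserts the same alignment. If that reading is right, an empty list under `events` is just a sentence with no annotated event, not a parsing error.

The sketch below assumes that layout, and also assumes span indices are document-level token offsets, as the increasing `sentence_start` values suggest. It shows how one event entry decodes. `describe_events` is a hypothetical helper, not part of the eeqa repo.

```python
from typing import Any, Dict, List


def describe_events(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the per-sentence "events" field into readable records.

    Assumed layout (inferred from the dump, not from documentation):
    doc["events"][i] lists the events of sentence i; each event starts with
    [trigger_index, event_type] followed by [arg_start, arg_end, role] items.
    Indices are treated as document-level, so subtracting sentence_start[i]
    gives the position inside sentence i. An empty list means no events.
    """
    records = []
    for i, sentence_events in enumerate(doc["events"]):
        offset = doc["sentence_start"][i]
        for event in sentence_events:
            trigger_idx, event_type = event[0]
            args = [
                {"start": start - offset, "end": end - offset, "role": role}
                for start, end, role in event[1:]
            ]
            records.append({
                "sentence": i,
                "type": event_type,
                "trigger": trigger_idx - offset,
                "args": args,
            })
    return records
```

Consider the sixth sentence of the dump, whose `sentence_start` is 24. Under these assumptions it yields a `Movement.Transport` event with its trigger at local index 5, `Artifact` at 9 and `Destination` at 14.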
Akeepers commented 3 years ago

@JunnYu Yep~

HuangZhenyang commented 3 years ago

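The train.json produced by parse_ace_event.py has the form {}{}{}: each JSON object spans many lines instead of sitting on one line, and there is no separator between consecutive JSON objects.

The author's code reads and parses the file one line at a time, so it throws an error... Maybe this code just doesn't support the default-settings case? (I haven't looked at parse_ace_event.py in detail.)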
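The file is a run of JSON objects with no delimiter between them. Besides accumulating lines, another option is to let the standard library find the object boundaries.

Below is a minimal sketch built on `json.JSONDecoder.raw_decode`. That method returns the parsed value together with the index where parsing stopped. CPython's implementation also accepts a start index, as the `raw_decode(s, idx=...)` call in the traceback of the original report shows. `iter_concatenated_json` is a hypothetical helper name.

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from text shaped like '{...}{...}{...}'.

    json.loads() rejects such input as "Extra data", and a line-by-line read
    fails because one object spans many lines. raw_decode() parses a single
    value starting at `pos` and returns the index just past it, so we walk
    the string value by value.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Usage, assuming the default-settings output path from the script below:

    with open("./data/ace-event/processed-data/default-settings/json/train.json") as f:
        docs = list(iter_concatenated_json(f.read()))

This approach does not depend on how the objects are indented or on a closing brace sitting alone on a line. The trade-off is that it reads the whole file into memory first.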


Below is my modified convert_examples.py:

from os import path
import json
import collections

output_dir = "./data/ace-event/processed-data/json"
tmp_json_dir = "./data/ace-event/processed-data/default-settings/json"

for fold in ["train", "dev", "test"]:
    # Close both files even if parsing fails partway through.
    with open(path.join(tmp_json_dir, fold + ".json"), "r") as f, \
            open(path.join(output_dir, fold + "_convert.json"), "w") as f_convert:
        json_str = ""
        # parse_ace_event.py (default-settings) writes indented objects back to back,
        # so collect stripped lines until a line that is just "}" (the top-level close).
        # This assumes no nested object ends on a line of its own, which holds here
        # because the documents only nest lists.
        ed_char = "}"

        for line in f:
            line = line.strip()
            json_str += line
            if line == ed_char:
                json_obj = json.loads(json_str)
                json_str = ""

                sentences = json_obj["sentences"]
                ner = json_obj["ner"]
                relations = json_obj["relations"]
                events = json_obj["events"]
                sentence_start = json_obj["sentence_start"]
                doc_key = json_obj["doc_key"]

                # Every field holds exactly one entry per sentence.
                assert len(sentences) == len(ner) == len(relations) == len(events) == len(sentence_start)

                # Use distinct loop names so the document-level lists are not shadowed.
                for sentence, sent_ner, relation, event, s_start in zip(sentences, ner, relations, events, sentence_start):
                    sentence_annotated = collections.OrderedDict()
                    sentence_annotated["sentence"] = sentence
                    sentence_annotated["s_start"] = s_start
                    sentence_annotated["ner"] = sent_ner
                    sentence_annotated["relation"] = relation
                    sentence_annotated["event"] = event

                    # Write one JSON object per line, the format the line-by-line readers expect.
                    f_convert.write(json.dumps(sentence_annotated, default=int) + "\n")
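The original failure came from reading a file line by line. A quick sanity check on each `*_convert.json` is therefore to confirm that every non-empty line parses as a standalone JSON object.

This is a minimal sketch. `check_jsonl` is a hypothetical helper, not part of the repo.

```python
import json
from typing import Iterable, List


def check_jsonl(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of non-empty lines that are not standalone JSON objects."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if not isinstance(obj, dict):
            bad.append(lineno)  # valid JSON, but not one object per line
    return bad
```

For example, `with open("./data/ace-event/processed-data/json/train_convert.json") as f: print(check_jsonl(f))` should print `[]` if the conversion produced one object per line.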