Closed: Akeepers closed this issue 3 years ago
hi, xinya, I ran your code and hit an error during data preprocessing. The failing step is running 'python scripts/data/ace-event/convert_examples.py':
File "scripts/data/ace-event/convert_examples.py", line 11, in <module> line = json.loads(line) File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads return _default_decoder.decode(s) File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode obj, end = self.scan_once(s, idx) json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
I double-checked my process: I followed the README steps strictly & re-downloaded the raw data from the official site:
- When parsing the ACE05 data, I used the default settings: python ./scripts/data/ace-event/parse_ace_event.py default-settings
I also inspected the data produced by parse_ace_event.py, and it looks problematic: for example, the 'events' field contains a large number of meaningless empty lists, as shown in the screenshot below:
What could be the reason for this?
Also, when running python ./scripts/data/ace-event/parse_ace_event.py default-settings, did you ever get dropped into an ipdb debugging session? If so, how did you work around it? Thanks.
I haven't run into this problem. I just ran the script, it printed progress messages like "parse train", then it finished and produced the corresponding data.
@Akeepers Hi, I'm hitting the same problem. Did you manage to solve it?
Traceback (most recent call last):
  File "scripts/data/ace-event/convert_examples.py", line 10, in <module>
    line = json.loads(line)
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\yujun\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
I think the method in the paper is fairly weak, and judging by the results the numbers also look low, so I'm not planning to reproduce it anymore ~ I haven't solved this problem & won't be following it further.
@Akeepers I think I've more or less figured out what the problem is: the script probably expects each example as a single line of input, but the file I generated is auto-formatted (pretty-printed) across many lines, like the document below (a quick check follows the example).
{
"sentences": [
[
"CNN_CF_20030303.1900.02"
],
[
"STORY"
],
[
"2003",
"-",
"03",
"-",
"03T19:00:00",
"-",
"05:00"
],
[
"New",
"Questions",
"About",
"Attacking",
"Iraq",
";",
"Is",
"Torturing",
"Terrorists",
"Necessary",
"?"
],
[
"NOVAK",
"Welcome",
"back",
"."
],
[
"Orders",
"went",
"out",
"today",
"to",
"deploy",
"17,000",
"U.S.",
"Army",
"soldiers",
"in",
"the",
"Persian",
"Gulf",
"region",
"."
],
[
"The",
"army",
"'s",
"entire",
"first",
"Calvary",
"division",
"based",
"at",
"Fort",
"Hood",
",",
"Texas",
",",
"would",
"join",
"the",
"quarter",
"million",
"U.S.",
"forces",
"already",
"in",
"the",
"region",
"."
],
[
"We",
"'re",
"talking",
"about",
"possibilities",
"of",
"full",
"scale",
"war",
"with",
"former",
"Congressman",
"Tom",
"Andrews",
",",
"Democrat",
"of",
"Maine",
"."
],
[
"He",
"'s",
"now",
"national",
"director",
"of",
"Win",
"Without",
"War",
",",
"and",
"former",
"Congressman",
"Bob",
"Dornan",
",",
"Republican",
"of",
"California",
"."
],
[
"BEGALA",
"Bob",
",",
"one",
"of",
"the",
"reasons",
"I",
"think",
"so",
"many",
"Americans",
"are",
"worried",
"about",
"this",
"war",
"and",
"so",
"many",
"people",
"around",
"the",
"world",
"do",
"n't",
"want",
"to",
"go",
"is",
"there",
"have",
"been",
"a",
"lot",
"of",
"problems",
"with",
"credibility",
"from",
"this",
"administration",
"."
],
[
"Our",
"president",
"has",
"repeatedly",
",",
"for",
"example",
",",
"relied",
"on",
"a",
"man",
"whom",
"you",
"'re",
"aware",
",",
"Hussein",
"Kamel",
",",
"Saddam",
"Hussein",
"'s",
"son",
"-",
"in",
"-",
"law",
",",
"leader",
"of",
"the",
"Iraq",
"arms",
"program",
"who",
"defected",
"for",
"a",
"time",
"."
],
[
"And",
"gave",
"us",
"a",
"whole",
"lot",
"of",
"information",
"and",
"then",
"went",
"home",
"and",
"his",
"father",
"-",
"in",
"-",
"law",
"killed",
"him",
"."
],
[
"Bad",
"move",
"."
],
[
"But",
"while",
"he",
"was",
"here",
",",
"he",
"gave",
"us",
"a",
"whole",
"lot",
"of",
"information",
"."
],
[
"Gave",
"us",
"a",
"whole",
"lot",
"of",
"information",
"."
],
[
"Well",
",",
"our",
"president",
"told",
"us",
"that",
"information",
"proves",
"that",
"the",
"dictator",
"had",
"chemical",
"weapons",
",",
"which",
"is",
"true",
"."
],
[
"But",
"what",
"we",
"just",
"learned",
"this",
"week",
"from",
"\"",
"Newsweek",
"\"",
"magazine",
"which",
"got",
"a",
"hold",
"of",
"the",
"debriefings",
",",
"is",
"that",
"he",
"also",
"told",
"us",
"it",
"was",
"destroyed",
"back",
"in",
"1995",
"."
],
[
"Why",
"has",
"n't",
"our",
"president",
"told",
"us",
"that",
"?"
],
[
"Why",
"do",
"we",
"have",
"to",
"learn",
"it",
"from",
"\"",
"Newsweek",
"\"",
"?"
],
[
"DORNAN",
"I",
"do",
"n't",
"believe",
"that",
"he",
"believed",
"it",
"was",
"all",
"destroyed",
"."
],
[
"The",
"fact",
"that",
"this",
"guy",
"was",
"such",
"an",
"idiot",
"to",
"go",
"back",
"and",
"let",
"his",
"father",
"-",
"in",
"-",
"law",
"kill",
"him",
"shows",
"he",
"was",
"n't",
"the",
"most",
"stable",
"of",
"people",
"."
],
[
"But",
"the",
"things",
"that",
"..."
],
[
"BEGALA",
"Good",
"point",
"."
],
[
"But",
"should",
"n't",
"our",
"president",
"have",
"told",
"us",
"what",
"the",
"CIA",
"told",
"him",
"."
],
[
"Why",
"do",
"we",
"learn",
"from",
"\"",
"Newsweek?\"Should",
"he",
"level",
"with",
"us",
"?"
],
[
"DORNAN",
"Paul",
",",
"look",
",",
"the",
"problem",
"is",
"I",
"would",
"stipulate",
"all",
"four",
"of",
"us",
"hates",
"war",
".",
"Any",
"rational",
"person",
"hates",
"war",
"."
],
[
"Bush",
"is",
"n't",
"sitting",
"in",
"that",
"White",
"House",
"not",
"thinking",
"about",
"the",
"body",
"bags",
"coming",
"home",
"with",
"great",
"young",
"men",
"."
],
[
"Clinton",
"suffered",
"greatly",
"over",
"the",
"19",
"Rangers",
"that",
"died",
",",
"18",
"on",
"the",
"3rd",
"of",
"October",
"and",
"Matt",
"Reersen",
"(",
"ph",
")",
"three",
"days",
"later",
"."
],
[
"I",
"visited",
"all",
"their",
"families",
"."
],
[
"I",
"was",
"at",
"the",
"medal",
"of",
"honor",
"ceremony",
"for",
"the",
"kids",
"."
],
[
"Let",
"me",
"tell",
"you",
",",
"what",
"trips",
"to",
"Walter",
"Reed",
"taught",
"me",
"was",
",",
"that",
"whoever",
"thought",
"up",
"the",
"term",
",",
"the",
"law",
"of",
"unintended",
"consequences",
"it",
"pertains",
"to",
"war",
"."
],
[
"I",
"am",
"shook",
"over",
"the",
"aftermath",
"."
],
[
"But",
",",
"this",
"guy",
"is",
"a",
"monster",
",",
"a",
"mini",
"-",
"me",
"Hitler",
"."
],
[
"He",
"will",
"blow",
"a",
"city",
"off",
"the",
"earth",
"in",
"a",
"minute",
"if",
"he",
"can",
"get",
"the",
"hold",
"of",
"the",
"means",
"to",
"do",
"it",
"."
],
[
"NOVAK",
"Tom",
"Andrews",
",",
"I",
"think",
"we",
"all",
"realize",
"that",
"a",
"government",
"does",
"n't",
"go",
"to",
"war",
"a",
"nation",
"goes",
"to",
"war",
"."
],
[
"And",
"so",
"I",
"would",
"like",
"you",
"to",
"take",
"a",
"look",
"at",
"the",
"CNN/\"USA",
"TODAY\"",
"/",
"Gallup",
"poll",
",",
"taken",
"last",
"week",
",",
"should",
"U.S.",
"troops",
"to",
"go",
"to",
"Iraq",
"to",
"remove",
"Saddam",
"Hussein",
"from",
"power",
"."
],
[
"Take",
"a",
"look",
"at",
"it",
"."
],
[
"Favor",
"59",
"%",
",",
"opposed",
"37",
"%",
",",
"that",
"'s",
"a",
"vastly",
"larger",
"support",
"than",
"President",
"Bush",
"Senior",
"had",
"in",
"getting",
"the",
"U.S.",
"troops",
"out",
"of",
"Kuwait",
"before",
"that",
"war",
"started",
".",
"That",
"'s",
"pretty",
"good",
"support",
"is",
"n't",
"it",
"?"
],
[
"ANDREWS",
"Now",
",",
"Bob",
",",
"come",
"on",
"you",
"do",
"n't",
"really",
"buy",
"this",
"."
],
[
"I",
"mean",
",",
"listen",
",",
"this",
"is",
"the",
"oldest",
"trick",
"in",
"the",
"book",
"."
],
[
"You",
"can",
"have",
"a",
"general",
"question",
"like",
"this",
"that",
"could",
"mean",
"anything",
"and",
"ask",
"people",
"and",
"they",
"give",
"you",
"what",
"comes",
"off",
"the",
"top",
"of",
"their",
"head",
"."
],
[
"But",
",",
"ask",
"them",
"another",
"question",
",",
"ask",
"them",
"what",
"they",
"think",
"about",
"spending",
"$",
"1.3",
"trillion",
"in",
"destroying",
"this",
"economy",
"."
],
[
"Ask",
"them",
"about",
"going",
"and",
"not",
"just",
"a",
"war",
",",
"Bob",
",",
"but",
"an",
"invasion",
"and",
"occupying",
"for",
"up",
"to",
"10",
"years",
"a",
"sovereign",
"Arab",
"nation",
"in",
"the",
"midst",
"of",
"one",
"of",
"the",
"most",
"distable",
"and",
"volatile",
"regions",
"in",
"the",
"world",
"."
],
[
"Ask",
"them",
"how",
"they",
"feel",
"about",
"getting",
"bogged",
"down",
"."
],
[
"Well",
",",
"I",
"'ve",
"seen",
"some",
"of",
"the",
"figures",
"."
],
[
"Once",
"you",
"start",
"telling",
"Americans",
"the",
"story",
",",
"--",
"the",
"administration",
"refuse",
"today",
"tell",
"us",
"story",
"."
],
[
"They",
"'re",
"not",
"coming",
"forward",
"and",
"telling",
"us",
"what",
"the",
"risks",
"are",
",",
"what",
"the",
"costs",
"are",
",",
"how",
"many",
"years",
"we",
"might",
"be",
"in",
",",
"the",
"possibility",
"of",
"us",
"getting",
"bogged",
"down",
",",
"because",
"what",
"Americans",
"know",
"that",
",",
"they",
"'re",
"opposed",
"to",
"this",
"war",
"."
],
[
"The",
"more",
"they",
"learn",
"about",
"this",
"invasion",
",",
"the",
"more",
"they",
"learn",
"about",
"this",
"occupation",
",",
"the",
"less",
"they",
"support",
"it",
"."
],
[
"DORNAN",
"Tom",
",",
"you",
"know",
"what",
"liberals",
"want",
"."
],
[
"They",
"do",
"n't",
"want",
"a",
"smoking",
"gun",
",",
"they",
"want",
"a",
"smoking",
"city",
"."
],
[
"The",
"Clinton",
"people",
"all",
"say",
"..."
],
[
"BEGALA",
"That",
"'s",
"going",
"to",
"have",
"to",
"be",
"last",
"war",
",",
"unfair",
"and",
"unfortunate",
"as",
"that",
"is",
",",
"I",
"am",
"sorry",
",",
"they",
"'re",
"telling",
"us",
"we",
"'re",
"out",
"of",
"time",
"."
],
[
"Former",
"Congressman",
",",
"Bob",
"Dornan",
"from",
"California",
"..."
],
[
"DORNAN",
"You",
"'re",
"not",
"going",
"to",
"get",
"a",
"smoking",
"city",
"."
]
],
"ner": [
[],
[],
[],
[],
[
[
20,
20,
"PER"
]
],
[
[
31,
31,
"GPE"
],
[
32,
32,
"ORG"
],
[
33,
33,
"PER"
],
[
36,
37,
"LOC"
],
[
38,
38,
"LOC"
]
],
[
[
41,
41,
"ORG"
],
[
44,
46,
"ORG"
],
[
49,
50,
"GPE"
],
[
52,
52,
"GPE"
],
[
59,
59,
"GPE"
],
[
60,
60,
"PER"
],
[
64,
64,
"LOC"
]
],
[
[
77,
77,
"PER"
],
[
78,
79,
"PER"
],
[
81,
81,
"PER"
],
[
83,
83,
"GPE"
]
],
[
[
89,
89,
"PER"
],
[
91,
93,
"ORG"
],
[
97,
97,
"PER"
],
[
98,
99,
"PER"
],
[
101,
101,
"PER"
],
[
103,
103,
"GPE"
]
],
[
[
105,
105,
"PER"
],
[
106,
106,
"PER"
],
[
116,
116,
"PER"
],
[
125,
125,
"PER"
],
[
128,
128,
"LOC"
],
[
146,
146,
"ORG"
]
],
[
[
149,
149,
"PER"
],
[
165,
166,
"PER"
],
[
168,
169,
"PER"
],
[
171,
175,
"PER"
],
[
177,
177,
"PER"
],
[
180,
180,
"GPE"
],
[
182,
182,
"ORG"
]
],
[
[
200,
200,
"GPE"
],
[
203,
207,
"PER"
]
],
[],
[],
[],
[
[
240,
240,
"PER"
],
[
248,
248,
"PER"
],
[
251,
251,
"WEA"
]
],
[
[
266,
268,
"ORG"
]
],
[
[
294,
294,
"PER"
]
],
[
[
308,
308,
"ORG"
]
],
[
[
311,
311,
"PER"
]
],
[
[
328,
328,
"PER"
],
[
332,
332,
"PER"
],
[
339,
343,
"PER"
],
[
354,
354,
"PER"
]
],
[],
[
[
361,
361,
"PER"
]
],
[
[
369,
369,
"PER"
],
[
375,
375,
"ORG"
]
],
[
[
385,
385,
"ORG"
]
],
[
[
391,
391,
"PER"
],
[
392,
392,
"PER"
],
[
411,
411,
"PER"
]
],
[
[
415,
415,
"PER"
],
[
421,
422,
"FAC"
],
[
434,
434,
"PER"
]
],
[
[
436,
436,
"PER"
],
[
442,
442,
"PER"
],
[
453,
454,
"PER"
]
],
[
[
466,
466,
"PER"
]
],
[
[
478,
478,
"PER"
]
],
[
[
488,
489,
"FAC"
]
],
[],
[
[
521,
521,
"PER"
],
[
524,
524,
"PER"
],
[
527,
529,
"PER"
],
[
530,
530,
"PER"
]
],
[
[
536,
536,
"GPE"
],
[
539,
539,
"LOC"
]
],
[
[
556,
556,
"PER"
],
[
557,
558,
"PER"
],
[
567,
567,
"GPE"
],
[
574,
574,
"GPE"
]
],
[
[
591,
591,
"ORG"
],
[
591,
592,
"ORG"
],
[
594,
594,
"ORG"
],
[
602,
602,
"GPE"
],
[
603,
603,
"PER"
],
[
607,
607,
"GPE"
],
[
610,
611,
"PER"
]
],
[],
[
[
636,
636,
"PER"
],
[
637,
637,
"PER"
],
[
638,
638,
"PER"
],
[
643,
643,
"GPE"
],
[
644,
644,
"PER"
],
[
647,
647,
"GPE"
]
],
[
[
662,
662,
"PER"
],
[
665,
665,
"PER"
]
],
[],
[
[
704,
704,
"PER"
]
],
[],
[
[
750,
750,
"PER"
],
[
764,
764,
"PER"
],
[
765,
765,
"GPE"
],
[
777,
777,
"LOC"
],
[
780,
780,
"LOC"
]
],
[],
[],
[
[
806,
806,
"GPE"
],
[
812,
812,
"ORG"
]
],
[
[
855,
855,
"PER"
]
],
[],
[
[
888,
888,
"PER"
],
[
889,
889,
"PER"
],
[
894,
894,
"PER"
]
],
[
[
909,
909,
"GPE"
]
],
[
[
912,
912,
"PER"
],
[
913,
913,
"PER"
]
],
[
[
917,
917,
"PER"
]
],
[
[
950,
950,
"PER"
],
[
952,
953,
"PER"
],
[
955,
955,
"GPE"
]
],
[
[
957,
957,
"PER"
],
[
966,
966,
"GPE"
]
]
],
"relations": [
[],
[],
[],
[],
[],
[
[
32,
32,
31,
31,
"PART-WHOLE.Subsidiary"
],
[
33,
33,
32,
32,
"ORG-AFF.Employment"
],
[
33,
33,
38,
38,
"PHYS.Located"
],
[
36,
37,
38,
38,
"PART-WHOLE.Geographical"
]
],
[
[
44,
46,
41,
41,
"PART-WHOLE.Subsidiary"
],
[
44,
46,
49,
50,
"GEN-AFF.Org-Location"
],
[
49,
50,
52,
52,
"PART-WHOLE.Geographical"
],
[
60,
60,
59,
59,
"ORG-AFF.Employment"
],
[
60,
60,
64,
64,
"PHYS.Located"
]
],
[
[
81,
81,
83,
83,
"GEN-AFF.Citizen-Resident-Religion-Ethnicity"
]
],
[
[
89,
89,
91,
93,
"ORG-AFF.Membership"
],
[
101,
101,
103,
103,
"GEN-AFF.Citizen-Resident-Religion-Ethnicity"
]
],
[
[
125,
125,
128,
128,
"GEN-AFF.Citizen-Resident-Religion-Ethnicity"
]
],
[
[
168,
169,
171,
175,
"PER-SOC.Family"
],
[
177,
177,
182,
182,
"ORG-AFF.Membership"
],
[
182,
182,
180,
180,
"PART-WHOLE.Subsidiary"
]
],
[],
[],
[],
[],
[
[
248,
248,
251,
251,
"ART.User-Owner-Inventor-Manufacturer"
]
],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[
[
415,
415,
421,
422,
"PHYS.Located"
]
],
[],
[],
[],
[],
[],
[],
[
[
536,
536,
539,
539,
"PART-WHOLE.Geographical"
]
],
[],
[
[
603,
603,
602,
602,
"ORG-AFF.Employment"
],
[
603,
603,
607,
607,
"PHYS.Located"
],
[
610,
611,
607,
607,
"PHYS.Located"
]
],
[],
[
[
644,
644,
643,
643,
"ORG-AFF.Employment"
],
[
644,
644,
647,
647,
"PHYS.Located"
]
],
[],
[],
[],
[],
[
[
777,
777,
780,
780,
"PART-WHOLE.Geographical"
]
],
[],
[],
[],
[],
[],
[],
[],
[],
[],
[
[
952,
953,
955,
955,
"GEN-AFF.Citizen-Resident-Religion-Ethnicity"
]
],
[]
],
"events": [
[],
[],
[],
[],
[],
[
[
[
29,
"Movement.Transport"
],
[
33,
33,
"Artifact"
],
[
38,
38,
"Destination"
]
]
],
[],
[
[
[
74,
"Conflict.Attack"
]
],
[
[
76,
"Personnel.End-Position"
],
[
78,
79,
"Person"
],
[
83,
83,
"Entity"
]
]
],
[
[
[
96,
"Personnel.End-Position"
],
[
98,
99,
"Person"
],
[
103,
103,
"Entity"
]
]
],
[
[
[
121,
"Conflict.Attack"
]
]
],
[
[
[
184,
"Personnel.End-Position"
],
[
177,
177,
"Person"
],
[
180,
180,
"Entity"
]
]
],
[
[
[
199,
"Movement.Transport"
],
[
200,
200,
"Destination"
]
],
[
[
208,
"Life.Die"
],
[
200,
200,
"Place"
],
[
203,
207,
"Agent"
]
]
],
[],
[],
[],
[],
[],
[],
[],
[],
[
[
[
334,
"Movement.Transport"
],
[
328,
328,
"Artifact"
]
],
[
[
344,
"Life.Die"
],
[
339,
343,
"Agent"
]
]
],
[],
[],
[],
[],
[
[
[
407,
"Conflict.Attack"
]
],
[
[
413,
"Conflict.Attack"
]
]
],
[],
[
[
[
444,
"Life.Die"
],
[
442,
442,
"Victim"
],
[
453,
454,
"Victim"
]
]
],
[
[
[
463,
"Contact.Meet"
],
[
466,
466,
"Entity"
]
]
],
[],
[
[
[
486,
"Movement.Transport"
]
],
[
[
509,
"Conflict.Attack"
]
]
],
[],
[],
[
[
[
534,
"Conflict.Attack"
],
[
539,
539,
"Place"
]
]
],
[
[
[
572,
"Conflict.Attack"
]
],
[
[
577,
"Conflict.Attack"
],
[
574,
574,
"Attacker"
]
]
],
[
[
[
605,
"Movement.Transport"
],
[
603,
603,
"Artifact"
],
[
607,
607,
"Destination"
]
]
],
[],
[
[
[
641,
"Movement.Transport"
],
[
637,
637,
"Agent"
],
[
644,
644,
"Artifact"
],
[
647,
647,
"Origin"
]
],
[
[
650,
"Conflict.Attack"
]
]
],
[],
[],
[],
[],
[
[
[
748,
"Conflict.Attack"
]
],
[
[
754,
"Conflict.Attack"
]
]
],
[],
[],
[],
[
[
[
864,
"Conflict.Attack"
]
]
],
[
[
[
872,
"Conflict.Attack"
]
]
],
[],
[],
[],
[
[
[
926,
"Conflict.Attack"
]
]
],
[],
[]
],
"sentence_start": [
0,
1,
2,
9,
20,
24,
40,
66,
85,
105,
148,
189,
211,
214,
229,
237,
257,
290,
299,
311,
324,
356,
361,
365,
379,
391,
415,
436,
462,
468,
480,
511,
518,
532,
556,
579,
615,
621,
662,
676,
690,
718,
740,
782,
792,
802,
819,
866,
888,
897,
911,
917,
949,
957
],
"doc_key": "CNN_CF_20030303.1900.02"
}
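A quick check along these lines confirms it (the path below is just where my parse_ace_event.py default-settings output landed; adjust it to yours): the first line of the generated train.json is a bare "{", and calling json.loads on that single line raises exactly the error from the traceback above.

import json

# Path is an assumption: wherever parse_ace_event.py wrote the default-settings output.
train_path = "./data/ace-event/processed-data/default-settings/json/train.json"

with open(train_path, "r") as f:
    first_line = f.readline()

print(repr(first_line))  # '{\n' for a pretty-printed file, i.e. not a complete JSON object

try:
    json.loads(first_line)
except json.JSONDecodeError as err:
    # prints: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
    print(err)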
@JunnYu Yes, exactly ~
The train.json produced by parse_ace_event.py has the form {}{}{}: a single JSON object is not kept on one line, and there is no separator between consecutive JSON objects either.
The author's code reads and parses the file line by line, which is why it errors out... Maybe this code just wasn't meant for the default-settings output? (I haven't looked closely at parse_ace_event.py.)
Below is my modified convert_examples.py:
from os import path
import json
import collections

output_dir = "./data/ace-event/processed-data/json"
tmp_json_dir = "./data/ace-event/processed-data/default-settings/json"

for fold in ["train", "dev", "test"]:
    f_convert = open(path.join(output_dir, fold + "_convert.json"), "w")
    with open(path.join(tmp_json_dir, fold + ".json"), "r") as f:
        json_str = ""
        ed_char = "}"  # a pretty-printed document ends with "}" alone on a line
        for line in f.readlines():
            line = line.strip()
            json_str += line
            if line == ed_char:
                # one complete document has been accumulated; parse it
                json_obj = json.loads(json_str)
                json_str = ""
                sentences = json_obj["sentences"]
                ner = json_obj["ner"]
                relations = json_obj["relations"]
                events = json_obj["events"]
                sentence_start = json_obj["sentence_start"]
                doc_key = json_obj["doc_key"]
                assert len(sentences) == len(ner) == len(relations) == len(events) == len(sentence_start)
                for sentence, sent_ner, sent_relation, sent_event, s_start in zip(
                        sentences, ner, relations, events, sentence_start):
                    # use an OrderedDict to keep the field order stable in the output
                    sentence_annotated = collections.OrderedDict()
                    sentence_annotated["sentence"] = sentence
                    sentence_annotated["s_start"] = s_start
                    sentence_annotated["ner"] = sent_ner
                    sentence_annotated["relation"] = sent_relation
                    sentence_annotated["event"] = sent_event
                    f_convert.write(json.dumps(sentence_annotated, default=int) + "\n")
    f_convert.close()
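As an alternative that does not rely on the closing "}" sitting alone on its own line, the file can also be parsed as a stream of concatenated JSON objects with json.JSONDecoder.raw_decode. This is only a sketch under the same path assumptions as the script above; the per-sentence conversion loop would then run over the returned documents unchanged.

from os import path
import json

tmp_json_dir = "./data/ace-event/processed-data/default-settings/json"  # same assumption as above
decoder = json.JSONDecoder()

def iter_json_docs(text):
    """Yield every top-level JSON object in `text`, however it is formatted."""
    idx = 0
    while idx < len(text):
        # raw_decode does not skip leading whitespace, so do it manually
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

with open(path.join(tmp_json_dir, "train.json"), "r") as f:
    docs = list(iter_json_docs(f.read()))
print(len(docs), docs[0]["doc_key"])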