srlearn / datasets

srlearn-compatible relational datasets
https://srlearn.github.io/relational-datasets/
MIT License

Inconsistencies for what is allowed in a dataset #5

Closed: hayesall closed this issue 3 years ago

hayesall commented 3 years ago

I usually assume examples and facts should look like this:

example(one).
example(one,two).

Multiple places in the data violate this:

Blank lines:

https://github.com/srlearn/datasets/blob/084197b2d50f2d8f5674d29867a634ff9fccbe71/srlearn/uwcse/uwcse/fold2/train/train_facts.txt#L1-L4

https://github.com/srlearn/datasets/blob/084197b2d50f2d8f5674d29867a634ff9fccbe71/srlearn/uwcse/uwcse/fold3/train/train_facts.txt#L731-L734

https://github.com/srlearn/datasets/blob/084197b2d50f2d8f5674d29867a634ff9fccbe71/srlearn/uwcse/uwcse/fold3/train/train_facts.txt#L1181-L1183

Furthermore, these should probably be normalized to remove spaces after commas and other formatting inconsistencies.
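A rough sketch of what that normalization could look like, in Python. The function name `normalize_facts` is hypothetical and not part of this repository, and the sketch assumes facts never contain quoted strings with commas or spaces inside them:

```python
import re


def normalize_facts(lines):
    """Drop blank lines and strip whitespace around commas and parentheses.

    Assumption: no quoted constants, so every space next to a comma
    or parenthesis is safe to remove.
    """
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines entirely
        # Collapse whitespace on either side of "(", ")" and ","
        stripped = re.sub(r"\s*([(),])\s*", r"\1", stripped)
        cleaned.append(stripped)
    return cleaned


raw = ["", "example(one, two).", "  example( one ).  ", "\n"]
normalized = normalize_facts(raw)
```

Something like this could be run over each `train_facts.txt` before committing, so the files only ever contain one fact per line in the compact form.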


SRLBoost and BoostSRL derivatives allow quite a few additional symbols in the grammar (including % comments and //- comments).
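One way to audit the datasets against the stricter form is a small checker like the sketch below. The regular expression is an assumption about what "strict" should mean here (a lowercase predicate name, then alphanumeric or underscore arguments separated by bare commas); it is not the grammar SRLBoost or BoostSRL actually implements, and `find_violations` is a hypothetical helper:

```python
import re

# Assumed strict form: predicate(arg) or predicate(arg1,arg2,...) followed by "."
STRICT_FACT = re.compile(
    r"^[a-z][A-Za-z0-9_]*\([A-Za-z0-9_]+(,[A-Za-z0-9_]+)*\)\.$"
)


def find_violations(lines):
    """Return (line_number, text) pairs for lines that are not strict facts.

    Blank lines, comment lines, and facts with extra spaces are all reported.
    """
    violations = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not STRICT_FACT.match(text):
            violations.append((number, text))
    return violations


sample = [
    "example(one).",
    "",
    "example(one, two).",
    "% comment",
    "// comment",
    "example(one,two).",
]
report = find_violations(sample)
```

Running this per dataset would give a concrete list of lines to fix, rather than relying on spot checks.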


uwcse:

citeseer:

cora:

hayesall commented 3 years ago

Fixed in #10