usc-isi-i2 / table-linker

Table Linker
MIT License

Issues regarding "ground-truth-labeler" #4

Open mtang724 opened 4 years ago

mtang724 commented 4 years ago

There are three issues regarding "ground-truth-labeler":

  1. A large file cannot be loaded as the ground truth file. It raises the error "pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file."
  2. The doc and the actual command are inconsistent: the actual command does not accept "-c" as an argument.
  3. The actual command requires the ground truth file to have a column named "kg_label", but the sample file in the doc does not have this column.

mtang724 commented 4 years ago

I found a solution for issue 1: just add lineterminator='\n' to the corresponding pd.read_csv call.
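For anyone hitting the same error, here is a minimal sketch of what that change could look like. The helper name read_ground_truth, the dtype=str choice, and the sample columns are my own assumptions for illustration and are not taken from the table-linker code; only the lineterminator='\n' argument comes from the fix described above.

```python
import io

import pandas as pd


def read_ground_truth(source):
    """Read a ground truth CSV, treating only '\\n' as the row terminator.

    Hypothetical helper, not part of table-linker. lineterminator is a
    pandas option of the default C parser and must be a single character.
    """
    # dtype=str is an extra assumption so identifiers are kept as text.
    return pd.read_csv(source, lineterminator="\n", dtype=str)


# Tiny in-memory stand-in for a ground truth file (column names are made up,
# apart from "kg_label", which the command reportedly requires).
sample = io.StringIO("column,row,kg_id,kg_label\n0,1,Q1,first\n0,2,Q2,second\n")
gt = read_ground_truth(sample)
print(gt.columns.tolist())
print(len(gt))
```

The same call accepts a file path in place of the in-memory buffer. Whether this resolves the error for every large file is not something I have verified; it worked for the file linked in this issue.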

mtang724 commented 4 years ago

ground_truth_file -> t2dv2_gt_sample.csv
input_file -> output_sample.csv
Files link: https://drive.google.com/file/d/1FLbh7CCYqGak4pxjHpFcFueZSa_T6pqr/view?usp=sharing