usc-isi-i2 / rltk

Record Linkage ToolKit (Find and link entities)
MIT License
107 stars 23 forks source link

Qgram indexer #1

Closed Chinmay26 closed 7 years ago

Chinmay26 commented 7 years ago

Added the following

GreatYYX commented 7 years ago
  1. do we still need to include the license of crf-tokenizer and keep the setup.py file (for pip install)? because it's also a ISI project. @szeke
  2. remove main function and add test cases please.
  3. please handle file path problem in Core, that's why def q_gram_blocking(self, **kwargs): should be def q_gram_blocking(self, output_file_path, **kwargs): and then output_file_path = self._get_abs_path(output_file_path).
  4. add annotations to to q_gram_blocking in Core.
  5. please use 4 spaces as indent.
  6. remove all sys.path.append in indexer.py.
Chinmay26 commented 7 years ago

@GreatYYX Have fixed all issues except 1. I will add some test cases shortly. Please go ahead and review

GreatYYX commented 7 years ago

the format of utils.py is still not correct, I've fixed it.

something need to fix after merging:

  1. more detailed annotation in Core (not just kwargs).
  2. remove print in module to log.
  3. unified name of arguments, ex. file_iter1 or iter1.