snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

candidate_subclass fails with reserved words #914

Closed jnu closed 5 years ago

jnu commented 6 years ago

Two minor issues. They're related, and I only noticed the second due to the first.

First: candidate_subclass fails when id is given as a field. I suppose there may be other reserved words that are problematic here.

Minimal repro:

C = candidate_subclass('C', ['id'])

Error:

/anaconda3/lib/python3.6/site-packages/sqlalchemy/sql/selectable.py in _join_condition(cls, a, b, ignore_nonexistent_tables, a_subset, consider_as_foreign_keys)
    977                 "Can't find any foreign key relationships "
    978                 "between '%s' and '%s'.%s" %
--> 979                 (a.description, b.description, hint))
    980 
    981         crit = [(x == y) for x, y in list(constraints.values())[0]]

NoForeignKeysError: Can't find any foreign key relationships between 'candidate' and 'c'.

Second: rerunning the command above after it fails once gives a different error, about the table already existing (despite the fact that the subclass command ultimately failed). I'd expect the subclass operation to be transactional, and to roll back if there's an issue.

Minimal repro:

C = candidate_subclass('C', ['id'])
C = candidate_subclass('C', ['id'])

Error:

/Users/jnu/anaconda3/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:120: SAWarning: This declarative base already contains a class with the same class name and module name as snorkel.models.candidate.C, and will be replaced in the string-lookup table.
  item.__name__

nitya-yekkirala commented 6 years ago

How can we update or modify the candidate_subclass once it is created? When I change the arguments and try to rerun it, it throws the error below.

ValueError: Candidate subclass Age already exists in memory with incompatible specification

pidugusundeep commented 6 years ago

When you create a candidate subclass, it creates a table named after the candidate in the database. If you want to overwrite it, it will not allow you to do so. I think you need to either use a new candidate name, or remove the existing DB file and rerun the code to create the candidate again.

@ajratner am I correct? Or is there another way to do this?

ajratner commented 5 years ago

Hi @jnu, thanks for highlighting this, and sorry for the delayed response! Yes, that's a good point: id is a reserved column name in the DB (as in most).
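Until something like this is handled inside Snorkel, one option is to check field names before calling candidate_subclass. The following is a minimal sketch, not part of Snorkel: the helper name `check_candidate_fields` and the `RESERVED_FIELDS` set are hypothetical, and only `id` is confirmed as problematic in this thread, so any other entries would be assumptions to verify against your own schema.

```python
# Hypothetical helper, not part of Snorkel's API.
# Only "id" is confirmed as reserved in this thread; extend the set
# yourself if you find other column names that clash.
RESERVED_FIELDS = {"id"}


def check_candidate_fields(fields, reserved=RESERVED_FIELDS):
    """Return the field list unchanged, or raise if any name is reserved."""
    clashes = [f for f in fields if f.lower() in reserved]
    if clashes:
        raise ValueError("Reserved field name(s): " + ", ".join(clashes))
    return list(fields)


# Passes through untouched when no name clashes.
ok_fields = check_candidate_fields(["person", "age"])

# Fails fast, before any table would be created.
try:
    check_candidate_fields(["id"])
    rejected = False
except ValueError:
    rejected = True
```

The validated list could then be passed on, e.g. `candidate_subclass('C', check_candidate_fields(['person']))`, so that a bad name is caught before anything is registered.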

And @nitya-yekkirala @pidugusundeep, yes: we don't have any helper methods in Snorkel right now for changing the DB table that corresponds to a candidate subclass you've created. The benefit of using a DB backend is that you can just go in and change or remove the table yourself using SQL! Or just dump the DB and start over :). Closing for now, but feel free to re-open!
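As a rough illustration of the "remove the table yourself using SQL" route, here is a sketch that assumes a SQLite backend and uses only Python's standard `sqlite3` module. The helper `drop_candidate_table` is hypothetical, and the in-memory database with tables `candidate` and `c` is a stand-in mirroring the table names in the error message above. Against a real database you would connect to your own DB file instead, and you may also need to clean up related rows in other tables and restart the Python session so the old class definition is no longer held in memory.

```python
import sqlite3


def drop_candidate_table(conn, table_name):
    """Drop one table if it exists; return True when a table was removed."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        return False
    # Identifiers cannot be bound as SQL parameters, so quote the name by hand.
    quoted = '"' + table_name.replace('"', '""') + '"'
    conn.execute("DROP TABLE " + quoted)
    conn.commit()
    return True


# Stand-in database: a base table plus the leftover subclass table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, type TEXT)")
conn.execute("CREATE TABLE c (id INTEGER PRIMARY KEY)")

dropped = drop_candidate_table(conn, "c")
remaining = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Checking `sqlite_master` first makes the helper safe to call repeatedly, which is handy when rerunning a notebook cell.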