transientskp / tkp

A transients-discovery pipeline for astronomical image-based surveys
http://docs.transientskp.org/
BSD 2-Clause "Simplified" License
19 stars 14 forks source link

Warning when running trap-manage.py #597

Closed AntoniaR closed 1 year ago

AntoniaR commented 1 year ago

When running trap-manage.py using the tkp branch converted_to_python3, I get this warning repeated 4 times. Not sure if it's a problem.

/home/antoniar/TraP_py3/tkp/tkp/db/quality.py:10: SAWarning: relationship 'Extractedsource.runningcatalogs' will copy column extractedsource.id to column assocxtrsource.xtrsrc, which conflicts with relationship(s): 'Assocxtrsource.xtrsrc' (copies extractedsource.id to assocxtrsource.xtrsrc). If this is not the intention, consider if these relationships should be linked with back_populates, or if viewonly=True should be applied to one or more if they are read-only. For the less common case that foreign key constraints are partially overlapping, the orm.foreign() annotation can be used to isolate the columns that should be written towards.   To silence this warning, add the parameter 'overlaps="xtrsrc"' to the 'Extractedsource.runningcatalogs' relationship. (Background on this error at: https://sqlalche.me/e/14/qzyx)
  'rms': Rejectreason(id=0, description='RMS invalid'),
HannoSpreeuw commented 1 year ago

Seems to be the same one as this one, from September 7.

AntoniaR commented 1 year ago

Ah! I had forgotten... I'll see if it impacts the results otherwise we can turn off the warning.

HannoSpreeuw commented 1 year ago

A number of backref statements are used in db/model.py. backref statements are considered legacy code:

The relationship.backref parameter is generally considered to be legacy; 
for modern applications, using explicit relationship() constructs linked together using 
the relationship.back_populates parameter should be preferred.

Perhaps I should try replacing those backref statements.

HannoSpreeuw commented 1 year ago

I have been trying to wrap my head around this. It is difficult to understand for somebody who is not proficient in setting up databases. However, the four warnings are all triggered by line 10 from quality.py:

'rms': Rejectreason(id=0, description='RMS invalid'),

and will disappear by adding viewonly=True as an argument to the Runningcatalog.extractedsources relationship in model.py:

    extractedsources = relationship('Extractedsource',
                                    secondary='assocxtrsource',
                                    backref='runningcatalogs',
                                    viewonly=True)

This seems not to break anything, all unit tests pass, but I am still not sure this is safe. @jdswinbank could you perhaps have a quick glance at this?

jdswinbank commented 1 year ago

Hi Hanno — I spent a few minutes looking at this earlier, but didn't really manage to understand what's happening. I don't have more time to dig in at the moment, but I'll try to come back and have another look in a week or two (you might need to remind me!!).

Based on the SQLAlchemy docs, though, I wonder if you want to do something with back_propagates rather than viewonly? Not sure...

HannoSpreeuw commented 1 year ago

Reading the documentation about the legacy backref parameter, the obvious way to fix this would be to change, at line 337 from model.py:

    extractedsources = relationship('Extractedsource',
                                    secondary='assocxtrsource',
                                    backref='runningcatalogs')

to

    extractedsources = relationship('Extractedsource',
                                    secondary='assocxtrsource',
                                    back_populates='runningcatalogs')

while adding

runningcatalogs = relationship("Runningcatalog", back_populates = "extractedsources")

to the Extractedsource class.

That, however, results in a sqlalchemy.exc.AmbiguousForeignKeysError:

Can't determine join between 'extractedsource' and 'runningcatalog'; 
tables have more than one foreign key constraint relationship between them. 
Please specify the 'onclause' of this join explicitly.
HannoSpreeuw commented 1 year ago

We know that this warning first appeared after Python 2 to 3 conversion. Most likely the workings of the relationships have not been changed by that conversion nor by a simultaneous SQLAlchemy upgrade.

The warning is about the column extractedsource.id being copied to assocxtrsource.xtrsrc by two different relationships: Extractedsource.runningcatalogs and Assocxtrsource.xtrsrc. Most likely this is a desired mechanism in TraP.

That means that viewonly=True is probably not the desired solution since it will change the inner workings (viewonly=False is the default setting) and silencing the warning using overlaps="xtrsrc" should be a better solution. This is also the suggested solution from ChatGPT, for what it's worth.

HannoSpreeuw commented 1 year ago

I just checked: Python 2 to 3 conversion had zero effect on model.py.

HannoSpreeuw commented 1 year ago

TraP_db_model

Perhaps this Entity Relationship diagram can shed some light on this issue.

Extracted using schemadisplay but possibly also available in the documentation.

HannoSpreeuw commented 1 year ago

The four warnings are about two columns (runningcatalog.id and extractedsource.id) addressed in assocxtrsource by ForeignKey functions:

class Assocxtrsource(Base):
    __tablename__ = 'assocxtrsource'
    __table_args__ = (
        Index('assocxtrsource_runcat_xtrsrc_key', 'runcat', 'xtrsrc',
              unique=True),
    )

    id = Column(Integer, primary_key=True)

    runcat_id = Column('runcat', ForeignKey('runningcatalog.id'), nullable=False)
    runcat = relationship('Runningcatalog')

    xtrsrc_id = Column('xtrsrc', ForeignKey('extractedsource.id'), index=True)
    xtrsrc = relationship('Extractedsource')
HannoSpreeuw commented 1 year ago

The schema above is indeed also part of the documentation.

Not sure if they are exactly equal, I was hoping to find a difference between intended and actual implementation, but I haven't found anything obvious yet.

HannoSpreeuw commented 1 year ago

Commenting out this line of code that creates the two relationships propagating into all four warnings, i.e.

    extractedsources = relationship('Extractedsource',
                                    secondary='assocxtrsource',
                                    backref='runningcatalogs')

results in the four warnings disappearing while the unit tests still pass.

HannoSpreeuw commented 1 year ago

This also makes the ER diagram meet the intended ER diagram, when checking the relationships involving the extractedsource and runningcatalog tables. TraP_db_model_line_337_removed_annotated .

jdswinbank commented 1 year ago

I've not had chance to actually understand the changes you've made above, but it looks like this was very nice detective work. Also, I wonder if it's possible to use schemadisplay to generate the diagram as part of the documentation build? That would be much less error-prone than relying on somebody remembering to commit a new schema.png when the schema is changed.

HannoSpreeuw commented 1 year ago

Yes, I could turn that into an issue of the feature request type.

AntoniaR commented 1 year ago

The fix is now merged.