griffithlab / civicpy

A python interface for the CIViC db application
MIT License
8 stars 5 forks source link

therapy names aren't unique per Evidence, and Disease property field sometimes is not there #153

Open abidalikashi opened 4 days ago

abidalikashi commented 4 days ago

I was going to open a PR with this fix but seems like its private access only so my fix was adding those two properties to the evidence class this maybe wrong approach I'd like to know if it is, I am not an expert on the data itself

@property
def get_unique_therapies(self):
    unique_therapy = set()
    for therapy in self.therapies:
        if therapy not in unique_therapy:
            unique_therapy.add(therapy.name)
    return list(unique_therapy)

@property
def get_disease_name(self):
    return getattr(self.disease, 'name', 'N/A')
susannasiebert commented 1 day ago

That is certainly unexpected. I contacted the curation team to see if there is a reason why these particular evidence items would be linked to the same therapy twice. Hopefully we can get this data cleaned up.

Regarding the disease name: that behavior is expected because functional evidence is not linked to a specific disease. I'm not sure whether or not to add the proposed property. On the one hand it's certainly a helpful shortcut but on the other hand it might cause confusion since the disease name is not literally N/A for functional evidence. I would be supportive if that property returned None instead of `N/A' if no disease was associated with the evidence item but at that point I'm not sure how much more helpful this method would be.

abidalikashi commented 1 day ago

thank you for the clarification, I am not well versed in the data yet however I see your point about None, its a better approach, the reason I needed it to be that way is that it simplifies iterative operations where I have to account for disease name property not to exist at all, it would be much cleaner code if I just get None instead of the entire field missing and causing iteration to break.