phenopolis / phenopolis_genomics_browser

Python API and React frontend for the Phenopolis Genomics Browser
https://dev-live.phenopolis.org
MIT License

15 variant classification -> refactoring tables from public to phenopolis schema #309

Closed · alanwilter closed this 3 years ago

alanwilter commented 3 years ago

OPENED FOR REVIEWS (and hopefully to be merged)

~unless we clean the following 3 issues:~

1) - [x] On page http://localhost:8888/manage_patient:

[Screenshot 2021-01-18 at 11:03:11]

Table public.individuals is now split into phenopolis.individual, phenopolis.individual_feature and phenopolis.individual_gene. In addition, the attribute public.individuals.internal_id is replaced by phenopolis.individual.phenopolis_id, public.individuals.pi is dropped, the feature attributes require a join to hpo.term (and other tables in the hpo schema), and the genes require a join to ensembl.gene.

After my modifications, the api/individual backend is returning a shorter JSON, like:

{"consanguinity": "unknown",  "external_id": "WebsterURMD_Sample_SK13768",  "id": 8268, "phenopolis_id": "PH00008268", "sex": "F"}

I'm currently trying to solve that by refactoring (a possible direction is sketched at the end of this comment):

def _query_all_individuals(db_session, additional_filter=None):
...

2) - [x] Still on http://localhost:8888/manage_patient: likely due to (1), the update and delete buttons (functions) are not working. Note: in their respective tooltips I think the word User should be changed to Patient (frontend).

Update:

[Screenshot 2021-01-18 at 12:56:11]

Delete:

[Screenshot 2021-01-18 at 13:02:08]

3) - [x] On page http://localhost:8888/create_patient. Note: there's a typo there: Unknow should be Unknown. For the reasons shown in (1), this will also fail to create a new Patient:

{"success":false,"error":{"type":"TypeError","message":["'observed_features' is an invalid keyword argument for Individual"]},"remote_addr":"127.0.0.1","full_path":"\/individual?","method":"POST","scheme":"http","timestamp":"[2021-Jan-18 13:15]"}

From the log:

2021-01-18 13:15:53,559-INFO-sqlalchemy.engine.base.Engine::log|110:: {}
2021-01-18 13:15:53,586-DEBUG-views::helpers|13:: [{'external_id': 'sample test', 'sex': 'M', 'observed_features': 'HP:0012393', 'unobserved_features': '', 'pi': 'Admin', 'consanguinity': 'yes', 'simplified_observed_features_names': 'Allergy', 'simplified_observed_features': 'HP:0012393', 'genes': '', 'ancestor_observed_features': 'HP:0000001', 'ancestor_observed_features_names': 'All'}]
2021-01-18 13:15:53,628-ERROR-views::general|43:: 127.0.0.1 POST http /individual? 5xx INTERNAL SERVER ERROR
2021-01-18 13:15:53,629-ERROR-views::general|48:: 'observed_features' is an invalid keyword argument for Individual

This case is particularly tricky for me to fix because the validation of this JSON payload is all done via SQLAlchemy machinery in:

def _parse_payload(payload, model_class):
    if isinstance(payload, dict):
        objects = [model_class(**payload)]
    elif isinstance(payload, list):
        objects = [model_class(**p) for p in payload]
    else:
        raise PhenopolisException("Payload of unexpected type: {}".format(type(payload)), 400)
    return objects

Where model_class is Individual, which therefore lacks the features attributes. I don't know of any quick or simple solution for this case, but one possible workaround is sketched at the end of this comment.

Nevertheless, I can still create a patient using only the supported attributes, e.g.:

curl -c cookies.txt --header "Content-Type: application/json" \
  --request POST \
  --data '{"user": "Admin","password": "admin123"}' \
  http://localhost:5000/login

curl -b cookies.txt --header "Content-Type: application/json" \
  --request POST \
  --data '{"consanguinity": "no","external_id": "another sample test","sex": "F"}' \
  http://localhost:5000/individual

NOTE: see this example from our collection in Postman.
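
For item (1), the direction I'm exploring looks roughly like this (a minimal sketch only, assuming ORM classes Individual, IndividualFeature, IndividualGene, HpoTerm and EnsemblGene mapped onto phenopolis.individual, phenopolis.individual_feature, phenopolis.individual_gene, hpo.term and ensembl.gene; the class and column names are illustrative, not the actual models):

from sqlalchemy.orm import Session

def _query_all_individuals(db_session: Session, additional_filter=None):
    # Sketch: rebuild the attributes that used to live in public.individuals
    # by joining the split phenopolis tables back to hpo.term and ensembl.gene.
    query = (
        db_session.query(Individual, HpoTerm.hpo_id, EnsemblGene.hgnc_symbol)
        .outerjoin(IndividualFeature, IndividualFeature.individual_id == Individual.id)
        .outerjoin(HpoTerm, HpoTerm.id == IndividualFeature.feature_id)
        .outerjoin(IndividualGene, IndividualGene.individual_id == Individual.id)
        .outerjoin(EnsemblGene, EnsemblGene.identifier == IndividualGene.gene_id)
    )
    if additional_filter is not None:
        query = query.filter(additional_filter)
    return query.all()

The rows would still need to be grouped per individual to rebuild the old JSON shape.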
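
For item (3), one possible workaround would be to split the payload into real column attributes and everything else before instantiating the model, and handle the extra keys through the new link tables separately (sketch only; actually writing the extra keys into phenopolis.individual_feature / individual_gene is left out):

def _parse_payload(payload, model_class):
    if isinstance(payload, dict):
        payload = [payload]
    if not isinstance(payload, list):
        raise PhenopolisException("Payload of unexpected type: {}".format(type(payload)), 400)
    # only keys that are real columns of the model can go to its constructor
    columns = {c.name for c in model_class.__table__.columns}
    objects = []
    for p in payload:
        # anything else (observed_features, genes, ...) must be linked via
        # phenopolis.individual_feature / individual_gene afterwards
        core = {k: v for k, v in p.items() if k in columns}
        extra = {k: v for k, v in p.items() if k not in columns}
        objects.append((model_class(**core), extra))
    return objects

The caller would then be responsible for turning each extra dict into the corresponding individual_feature / individual_gene rows.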

netlify[bot] commented 3 years ago

Deploy preview for phenopolis-dev canceled.

Built with commit 9f30010eb62da21f85a08074ba1ad75c80b11219

https://app.netlify.com/sites/phenopolis-dev/deploys/60157bad83abdb0007c5a849

alanwilter commented 3 years ago

> I'm currently trying to solve that by refactoring def _query_all_individuals(db_session, additional_filter=None): ...

The particular hell I have here is that either I convert the new schema to SQLAlchemy or I go with plain SQL queries.
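
For the SQLAlchemy option, mapping the new tables is mostly a matter of pointing the declarative models at the right schema; a minimal sketch (the column names here are assumptions, not the real DDL):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class IndividualFeature(Base):
    # link table between phenopolis.individual and hpo.term
    __tablename__ = "individual_feature"
    __table_args__ = {"schema": "phenopolis"}
    individual_id = Column(Integer, ForeignKey("phenopolis.individual.id"), primary_key=True)
    feature_id = Column(Integer, ForeignKey("hpo.term.id"), primary_key=True)
    type = Column(String)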

pontikos commented 3 years ago

I think it's fine to implement it using plain SQL?

alanwilter commented 3 years ago

I've kept SQLAlchemy so far and I was able to wire in the new gene table (ensembl.gene) via individual_gene, so topic (1) is partially addressed now. I will sort out the features the same way, hopefully by tomorrow.

alanwilter commented 3 years ago

I got item (1) ticked, so all attributes are there now, except for pi, which is removed (the frontend should be updated accordingly). I used two approaches: for genes, I used SQLAlchemy; for features, I used plain SQL queries and Python functions (roughly as sketched below). Both have pros and cons.
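
For reference, the plain SQL path for features can be a small helper around a parameterised query, roughly like this (a sketch only; _fetch_features and the exact column names on phenopolis.individual_feature and hpo.term are assumptions, not the actual code):

from sqlalchemy import text

def _fetch_features(db_session, individual_id):
    # join the link table to hpo.term to recover the HPO ids and names
    rows = db_session.execute(
        text(
            "SELECT t.hpo_id, t.name "
            "FROM phenopolis.individual_feature AS f "
            "JOIN hpo.term AS t ON t.id = f.feature_id "
            "WHERE f.individual_id = :individual_id"
        ),
        {"individual_id": individual_id},
    )
    return [{"hpo_id": r[0], "name": r[1]} for r in rows]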

alanwilter commented 3 years ago

Item (2) is now fixed as well, but it raises a question: the way it is now, we are mixing the old and the new schema for genes and HPO. I.e. autocomplete still uses the old tables, while when dealing with individuals I'm using ensembl.gene and hpo.term via the individual_gene and individual_feature tables. I will check whether refactoring would be complicated.

alanwilter commented 3 years ago

Refactoring Gene would be very disruptive right now. I would need several tables like gene_synonym, the cadd schema, etc. I'll try to address topic (3) and, if that succeeds, we can start reviewing with the aim of merging.

alanwilter commented 3 years ago

AFAIK it should be OK now. It's quite a lot of changes and may drive you crazy during review, but any suggestions are welcome. I tried to use SQLAlchemy, but there are things there that I don't understand why they do not work (see the comments in model.py, for instance).

It would be good if others could test the branch on their own computers, and, @YuanTian1991, if you could check the interface around individuals.

As I mentioned before, this is still mixing old and new schema.

pontikos commented 3 years ago

Hey @alanwilter, what is the situation with this PR? Are you able to merge? If you have issues with SQLAlchemy then please stick to regular SQL. I don't really mind as long as it works. Would it be helpful for me to go over this with you at some point with @dvarrazzo?

alanwilter commented 3 years ago

@pontikos, I'm reviewing by checking and improving coverage and pytest. I'll ask @dvarrazzo to review at least the DB part.

pontikos commented 3 years ago

Thanks @alanwilter and @dvarrazzo. @alanwilter, can you go ahead with the merge?

alanwilter commented 3 years ago

I didn't intend it to be huge; it just snowballed once I realised all the minimum changes we'd need to keep everything working while transitioning to the new schema. I just hope I got it basically right. I'm going to merge it.