siriuz / hdome


Protein searches are returning incorrect results #4

Open siriuz opened 8 years ago

siriuz commented 8 years ago

Here are the steps to reproduce the search errors:

Home>Query Database>Search By Protein>

Type 'calreticulin' into the search box and search. Select the "Calreticulin OS=Mus musculus GN=Calr PE=1 SV=1" entry near the top of the results (second entry down). I chose this one at random.

This opens a page that should show all the peptide sequences identified in our experiments that derive from the protein calreticulin. Except it doesn't. It appears (I haven't checked them all yet) that only the first peptide, VIFNYKGKNV, is from calreticulin; the rest are all non-calreticulin sequences. I searched externally for the second sequence listed, HCPLCAKSFT, and it apparently comes from a zinc finger protein. If I enter HCPLCAKSFT into "search by peptide sequence" on haplodome, it erroneously returns only calreticulin.

I tried another protein, calnexin. Take the last hit (Calnexin OS=Homo sapiens GN=CANX PE=1 SV=2): none of the resulting peptides look correct, at least among the first four or five. To check for yourself, click the "Uniprot ID" to the left of the peptide sequence entry, which links to the protein sequence database. E.g. for calnexin that's P27824 (here: http://www.uniprot.org/uniprot/P27824), and halfway down that page is the full amino acid sequence of the protein.
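The manual check described above (is each listed peptide actually part of the protein's sequence?) could be scripted. The following is only a minimal sketch: `parse_fasta` and `check_peptides` are hypothetical helper names, and the FASTA record is invented toy data, not the real calreticulin or calnexin sequence. In practice you would paste in the sequence from the UniProt page.

```python
def parse_fasta(text):
    """Return the sequence of a single-record FASTA string (header line dropped)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">"))


def check_peptides(protein_sequence, peptides):
    """Map each peptide to True if it occurs verbatim in the protein sequence."""
    seq = protein_sequence.upper()
    return {p: p.upper() in seq for p in peptides}


# Toy record, invented for illustration only; NOT a real UniProt entry.
toy_fasta = """>toy|X00000|TOY_PROTEIN made-up example
MKTAYIAKQRQISFVKSHFSRQ
LEERLGLIEVQAPILSRVGDGT
"""

toy_sequence = parse_fasta(toy_fasta)
results = check_peptides(toy_sequence, ["ISFVKSHFSR", "HCPLCAKSFT"])
for peptide, found in results.items():
    print(peptide, "found" if found else "NOT in protein")
```

Any peptide reported as not found for the protein it is listed under would be a candidate for a misassigned protein name.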

siriuz commented 8 years ago

Localised the cause of the faulty results to the uploading algorithm: protein names seem to be misassigned during upload. The preview is correct, however.

Edit: Confirmed with Nathan that the uploading algorithm is mangling the protein names. It is possibly a better idea to rewrite the whole uploading algorithm than to patch it, because it currently uses raw SQL with individual commits rather than Django's bulk_create to roll all the inserts into one transaction.
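To illustrate the "one transaction instead of individual commits" idea, here is a rough sketch using Python's built-in sqlite3 module as a stand-in, not the project's actual Django models or uploader. The `peptide` table, the `upload_peptides` function, and the `TOY…` IDs are all made up for the example. In Django the analogous pattern would be `Model.objects.bulk_create(...)`, optionally wrapped in `transaction.atomic()`.

```python
import sqlite3


def upload_peptides(conn, rows):
    """Insert (sequence, uniprot_id) pairs as one all-or-nothing batch."""
    # `with conn:` commits once on success and rolls back the whole batch on error.
    with conn:
        conn.executemany(
            "INSERT INTO peptide (sequence, uniprot_id) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real hdome tables are not shown in this issue.
conn.execute(
    "CREATE TABLE peptide (sequence TEXT NOT NULL, uniprot_id TEXT NOT NULL)"
)

# Each peptide travels with its own protein ID in the same tuple, so the
# pairing cannot drift during upload. The IDs are placeholders, not real accessions.
rows = [("VIFNYKGKNV", "TOY0001"), ("HCPLCAKSFT", "TOY0002")]
upload_peptides(conn, rows)

stored = conn.execute(
    "SELECT sequence, uniprot_id FROM peptide ORDER BY rowid"
).fetchall()
print(stored)
```

Because the batch either commits as a whole or not at all, a failure partway through an upload would not leave a half-written set of peptide records behind.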

siriuz commented 8 years ago

http://django-postgres-copy.californiacivicdata.org/en/latest/