sacdallago closed 3 years ago
Note to self: data to reproduce
POST /annotations via Swagger (https://api.bioembeddings.com/api/) with:
{"model":"seqvec","sequence":"MDKFWWHTAWGLCLLQLSLAHQQIDLNVTCRYAGVFHVEKNGRYSISRTEAADLCQAFNSTLPTMDQMKLALSKGFETCRYGFIEGNVVIPRIHPNAICAANHTGVYILVTSNTSHYDTYCFNASAPPEEDCTSVTDLPNSFDGPVTITIVNRDGTRYSKKGEYRTHQEDIDASNIIDDDVSSGSTIEKSTPEGYILHTYLPTEQPTGDQDDSFFIRSTLATIASTVHSKSHAAAQKQNNWIWSWFGNSQSTTQTQEPTTSATTALMTTPETPPKRQEAQNWFSWLFQPSESKSHLHTTTKMPGTESNTNPTGWEPNEENEDETDTYPSFSGSGIDDDEDFISSTIATTPRVSARTEDNQDWTQWKPNHSNPEVLLQTTTRMADIDRISTSAHGENWTPEPQPPFNNHEYQDEEETPHATSTTPNSTAEAAATQQETWFQNGWQGKNPPTPSEDSHVTEGTTASAHNNHPSQRITTQSQEDVSWTDFFDPISHPMGQGHQTESKDTDSSHSTTLQPTAAPNTHLVEDLNRTGPLSVTTPQSHSQNFSTLHGEPEEDENYPTTSILPSSTKSSAKDARRGGSLPTDTTTSVEGYTFQYPDTMENGTLFPVTPAKTEVFGETEVTLATDSNVNVDGSLPGDRDSSKDSRGSSRTVTHGSELAGHSSANQDSGVTTTSGPMRRPQIPEWLIILASLLALALILAVCIAVNSRRRCGQKKKLVINGGNGTVEDRKPSELNGEASKSQEMVHLVNKEPSETPDQCMTADETRNLQSVDMKIGV","format":"full"}
On the webserver, follow the logs with: docker logs --follow bio_embeddings_webserver
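To replay the reproduction request from a script instead of the Swagger UI, here is a minimal standard-library sketch. It assumes the Swagger path /annotations is served directly under the base URL above (API_URL is an assumption, not confirmed here) and that the endpoint answers with JSON. The helper names build_annotations_request and send are hypothetical.

```python
import json
import urllib.request

# Assumption: the Swagger path "/annotations" is served under this base URL.
API_URL = "https://api.bioembeddings.com/api/annotations"


def build_annotations_request(sequence: str, model: str = "seqvec",
                              fmt: str = "full") -> urllib.request.Request:
    """Build (but do not send) the POST request from the reproduction note."""
    payload = {"model": model, "sequence": sequence, "format": fmt}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the (assumed JSON) response body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, which is
    where a failing sequence would surface on the client side.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_annotations_request(sequence)) with the sequence above should trigger the same server-side path, while docker logs shows the corresponding error.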
Related:
Currently, some sequences are not processed by the webserver even though they are below the sequence length limit (2k), because MongoDB refuses to store their embeddings. This happens especially with SeqVec, whose per-residue embeddings have shape L×1024×3.
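A quick back-of-the-envelope check shows why SeqVec hits the limit first. MongoDB caps a single BSON document at 16 MiB. The sketch below assumes the embedding is stored as raw float32 bytes (4 bytes per value) and ignores the other fields and BSON overhead in the document, so the real cutoff is somewhat lower. Storing values as a BSON array of doubles would be considerably worse, since each element costs 8 bytes plus a type byte and a key string. The helper names are hypothetical.

```python
# MongoDB's documented maximum size for a single BSON document: 16 MiB.
BSON_MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def seqvec_embedding_bytes(length: int, bytes_per_value: int = 4) -> int:
    """Raw size of an L x 1024 x 3 SeqVec embedding, excluding BSON overhead."""
    return length * 1024 * 3 * bytes_per_value


def max_length_in_one_document(bytes_per_value: int = 4) -> int:
    """Longest sequence whose raw embedding alone still fits in one document."""
    return BSON_MAX_DOCUMENT_BYTES // seqvec_embedding_bytes(1, bytes_per_value)


print(seqvec_embedding_bytes(2000) / 2**20)  # 23.4375 (MiB at the 2k limit)
print(max_length_in_one_document())          # 1365 residues with float32
print(max_length_in_one_document(8))         # 682 residues with float64
```

So under these assumptions, any sequence longer than roughly 1365 residues cannot fit in one document, well below the 2k limit.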
Researching this issue, I found this answer: https://stackoverflow.com/a/4667728
Especially worrying to me is the idea that when the cache is queried, it is copied entirely into RAM (did I get that right? Is that really so?!). If so, we should definitely move away from storing embeddings in BSON documents and move to GridFS storage instead; this should be straightforward, we just need to stream the data (StreamIO).
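A rough sketch of the GridFS route, under stated assumptions: embeddings are passed as flat lists of float32 values with their shape kept in the GridFS file metadata, and pymongo's GridFSBucket (upload_from_stream, open_download_stream_by_name) does the chunked storage, so no single document has to hold the whole array. All helper names are hypothetical, and array("f") uses the machine's native byte order, which is fine only if writer and reader share it. Note that load_embedding still reads one embedding fully into memory, which is unavoidable if the caller needs the whole array.

```python
import io
from array import array


def pack_embedding(flat_values) -> io.BytesIO:
    """Serialize flat float32 values into an in-memory stream (4 bytes each)."""
    return io.BytesIO(array("f", flat_values).tobytes())


def unpack_embedding(stream) -> array:
    """Read float32 values back from any binary stream with a .read() method."""
    values = array("f")
    values.frombytes(stream.read())
    return values


def store_embedding(db, key: str, flat_values, shape):
    """Upload one embedding to GridFS, which splits it into chunks.

    Requires pymongo; imported lazily so the helpers above work without it.
    """
    from gridfs import GridFSBucket

    bucket = GridFSBucket(db)
    return bucket.upload_from_stream(
        key, pack_embedding(flat_values), metadata={"shape": list(shape)}
    )


def load_embedding(db, key: str):
    """Return (flat float32 values, shape) for the latest file named `key`."""
    from gridfs import GridFSBucket

    bucket = GridFSBucket(db)
    grid_out = bucket.open_download_stream_by_name(key)
    return unpack_embedding(grid_out), grid_out.metadata["shape"]
```

Keeping serialization separate from storage means the GridFS calls stay thin, and the byte format could later be swapped (for example for a NumPy .npy buffer) without touching the database code.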