monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

html escaping problems #26

Closed nlwashington closed 10 years ago

nlwashington commented 10 years ago

Stick is escaping html non-uniformly, which causes all sorts of problems when genotypes and alleles are displayed. This needs to be tracked down and fixed.

things that end up escaped are omim variation ids, like "ABC < OMIM:12345.0001 > ", but not ABC < rs1234566 > .

similarly, almost any mouse model shows up wrong...whatever is in between the < > doesn't want to show up.

nlwashington commented 10 years ago

Specific examples include:

http://localhost:8080/disease/OMIM_300048 here, the genotypes of the models under "Similar Models" do not show up correctly. For example, Ikbkap/Ikbkap; /0 [involves: 129S1/Sv * C57BL/6 * C57BL/6J * CBA]

should be Ikbkap< tm1Id >/Ikbkap< tm1.1Id >; < Tg(Hsp70-1-cre)6Arge >/0 [involves: 129S1/Sv * C57BL/6 * C57BL/6J * CBA]

even here in github, i had to add spaces next to the angle brackets to make it show up right.
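A minimal sketch of the kind of fix this suggests, assuming the genotype label reaches the page as raw text and is concatenated into markup without escaping. `escapeHtml` is a hypothetical helper written for illustration; it is not a Stick or RingoJS function, and the table-cell markup is likewise an assumption.

```javascript
// Hypothetical helper (not part of Stick or RingoJS): escape the characters
// that have special meaning in HTML so a label is displayed literally.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The genotype label from the example above, without the extra spaces.
var label = "Ikbkap<tm1Id>/Ikbkap<tm1.1Id>; <Tg(Hsp70-1-cre)6Arge>/0";

// Assumed markup: the label is placed inside a table cell.
var cell = "<td>" + escapeHtml(label) + "</td>";
console.log(cell);
```

If every label went through one such function exactly once before being written into the page, the allele names inside the angle brackets should be displayed as text instead of being read as tags.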

nlwashington commented 10 years ago

I did some investigation into this and traced it part of the way.

I tracked it down to here: ringojs/modules/ringo/jsgi/connector.js. What is passed into writeBody(response, body, charset) is fine, but what is written out using part.toByteString(charset) is not properly escaped. The few times it did work, it would throw in some escaping. i believe this calls stick.

that's as far as i've tracked it. not sure what to do next.

nlwashington commented 10 years ago

note that, the strange thing is that the "allele" that also sits on this page (http://localhost:8080/disease/OMIM_300048), looks okay! the stuff inside of the angle brackets shows up!!!

like FLNA< 2-BP DEL, 65AC >.

see, i don't understand it. so odd.

nlwashington commented 10 years ago

Here's yet another example: http://localhost:8080/disease/OMIM_312170

under the "alleles" section, some of them show up in the angle brackets, and some don't!!! compare what you see in the "mutation" column to the "allele" column... what is in the mutation column (after the first comma) is what you should see in the angle brackets. but the only ones that show up have spaces and/or non-word chars.
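One possible explanation for this pattern, not confirmed anywhere in this thread: an HTML parser treats `<` followed by an ASCII letter as the start of a tag, while `<` followed by a space or a digit is rendered as plain text. The sketch below lists the spans of a label that a browser would probably swallow as unknown tags if the label were written out unescaped. `findTagLikeSpans` is a hypothetical diagnostic helper, and its regular expression only approximates the parser's behaviour.

```javascript
// Hypothetical diagnostic helper: return the "<...>" spans in a label that
// begin with an ASCII letter and so would likely be parsed as HTML tags
// (and therefore not displayed) when the label is emitted unescaped.
function findTagLikeSpans(label) {
  var tagLike = /<[A-Za-z][^>]*>/g;
  return String(label).match(tagLike) || [];
}

// Mouse alleles start with a letter, so they look like tags.
console.log(findTagLikeSpans("Ikbkap<tm1Id>/Ikbkap<tm1.1Id>"));

// This OMIM-style allele starts with a digit, so nothing is reported.
console.log(findTagLikeSpans("FLNA<2-BP DEL, 65AC>"));
```

Running a check like this over the "allele" column would show whether the missing values are exactly the ones that start with a letter.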

pnrobinson commented 10 years ago

I cannot find that allele on the corresponding OMIM page, http://omim.org/entry/300048

note that there have been many different nomenclatures for mutations in human genetics over the years, and this one is no longer valid. It should probably be c.65delAC

See http://www.hgvs.org/mutnomen/

There have been groups that have tried to parse all of this stuff automagically. That would actually be quite valuable, but the mutations do need to be translated to HGVS coordinates to avoid (lots of) confusion.

-Peter



From: Nicole Washington [notifications@github.com] Sent: Thursday, 19 December 2013 06:04 To: monarch-initiative/monarch-app Subject: Re: [monarch-app] html escaping problems (#26)


nlwashington commented 10 years ago

peter - that's because the allele is actually on the gene page at omim, not on the phenotype page (which is what you specified). you can see it here: http://omim.org/entry/300017; allele number 300017.0025. of course, this is just a first pass... i'm using the OMIM notation for the variation here. we definitely want to use HGVS notation, but that is not going to be accomplished by parsing OMIM. once i marry this to the ENSEMBL variation db, i'll get the HGVS for free. it's coming, just not today.

nlwashington commented 10 years ago

Also, peter, I'm going to make the HGVS notation into a couple of separate tickets. (Do you want access to that?)

as a data issue: https://support.crbs.ucsd.edu/browse/LAMHDI-269

as a display issue: https://github.com/monarch-initiative/monarch-app/issues/27