monarch-initiative / phenopacket-store

Collections of GA4GH phenopackets that represent individuals with Mendelian diseases.
https://monarch-initiative.github.io/phenopacket-store/
BSD 3-Clause "New" or "Revised" License

Error in ppkt #146

Closed by leokim-l 3 weeks ago

leokim-l commented 3 weeks ago

@pnrobinson In PMID_30095615_proband.json there are some \t characters that are interpreted as tabs

https://github.com/monarch-initiative/phenopacket-store/blob/2c58388a9312092df2c7f3b71b74aaac60ac7b1a/notebooks/CRX/phenopackets/PMID_30095615_proband.json#L155

leading to extra-tab errors in MALCO when we have the disease label as one entry of a dataframe:

https://github.com/monarch-initiative/malco/blob/5c3ca68ef0bb12da2bf3cccbdaf9a3500a293b39/src/malco/post_process/ranking_utils.py#L66
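To illustrate the failure mode, here is a minimal sketch of how an embedded tab in a label adds an extra field once the row is written as tab-separated text. The row values are hypothetical placeholders, and this is not the MALCO code linked above.

```python
# Hypothetical row; the values are placeholders, not taken from the phenopacket.
label = "Example disease\tname"  # label containing an embedded tab
row = ["example_phenopacket_id", "EXAMPLE:0000000", label]

# Naive tab-separated serialization without quoting, then naive parsing.
line = "\t".join(row)
fields = line.split("\t")

# The row was built with 3 entries, but the embedded tab splits the label in two.
print(len(row), len(fields))  # prints "3 4"
```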

pnrobinson commented 3 weeks ago

Fixed this error; it will be repaired with the next release. @ielis Would it be easy to add a Q/C check with the CI that throws an error if there is a tab in any of the components of the phenopackets? This one came from the way I am constructing the Excel template, but we should rule out tabs, especially in the disease name and in the PubMed titles.
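A check of this kind could look roughly like the sketch below. It assumes the phenopacket is available as JSON text and walks every string value recursively. The names `find_tabs` and `check_phenopacket` are hypothetical, and this is not the check that was actually added to the Phenopacket Store CI.

```python
import json
from typing import Any, Iterator, Tuple


def find_tabs(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string value in a parsed JSON tree that contains a tab."""
    if isinstance(node, str):
        if "\t" in node:
            yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_tabs(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_tabs(value, f"{path}[{i}]")


def check_phenopacket(text: str) -> None:
    """Raise ValueError if any string value in the JSON document contains a tab."""
    hits = list(find_tabs(json.loads(text)))
    if hits:
        details = "; ".join(f"{p}: {v!r}" for p, v in hits)
        raise ValueError(f"Tab character found in {details}")
```

In a CI job, `check_phenopacket` could be called on the contents of each phenopacket JSON file, so that a `\t` escape in a disease label or PubMed title fails the build. Note that `json.loads` decodes the `\t` escape into a real tab character, which is what the check looks for.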

ielis commented 3 weeks ago

Yeah, will implement here.

ielis commented 3 weeks ago

Done in here, will be included in Phenopacket Store CI from the next release.

leokim-l commented 1 week ago

I think I just caught another one. Will this be fixed in the next release?

https://github.com/monarch-initiative/phenopacket-store/blob/8e69b52edd72a700e9b07a6de91cda6b0e8821e3/notebooks/CAD/phenopackets/PMID_25678555_UDP4003.json#L223

pnrobinson commented 1 week ago

Fixed, also added unit test to prevent this in the future.