monarch-initiative / vertebrate-breed-ontology

https://monarch-initiative.github.io/vertebrate-breed-ontology/

Update names with special character #9

Closed sabrinatoro closed 2 years ago

sabrinatoro commented 2 years ago

Note: protege will accept the spelling containing accents,...

sabrinatoro commented 2 years ago

The query is in the src/sparql/reports folder

sabrinatoro commented 2 years ago

The list of term labels and synonyms with special characters is here. There are 2 sheets:

@franknic @ImkeTammen : Please review these 2 lists (the label one is more important than the synonym one), and add the correct spelling in the spreadsheet (note that if it looks ok in the spreadsheet, it will look ok in protege). If no change is needed, please add a note. Please let me know when you are done, and I will update the labels in VBO.

Note that this is not super urgent, therefore you can take your time to review these lists. Thanks!

franknic commented 2 years ago

Hi Sabrina,

Today I started working through the weird character file, following on from Imke's much-appreciated efforts. It's a very slow business, because just about every entry has to be checked against the online version of DADIS. At my current rate, it could take me 40 hours to finish the list! I am, therefore, wondering if we can automate this process a bit. To this end, I am assembling a table equating the weird characters to the correct characters, and there does seem to be a one-to-one correspondence. My table so far is given below. Is it feasible to consider using a table like this to create the correct words? There would be a bit of a risk in going down this route, but the alternative (doing it manually) is quite daunting. I am happy to do the extra work required to create a complete table.

’ | ‘
Å¡ | š
é | é
Å‚ | ł
Ã¥ | å
ß | ß
Ã§ | ç
ö | ö
ć | æ
ü | ü
Å» | Ż
ź | ź
Å„ | ń
ñ | ñ
à | à
ı | ı
ä | ä
è | è
Ž | Ž
Ñ | Ñ
Ãº | ú
á | á
Åš | Ś
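The table-driven replacement proposed above could be sketched roughly as follows. This is only an illustration, not the procedure actually used in this issue (the work was done with find and replace in a spreadsheet). The dictionary holds just those pairs from the table whose garbled form is visible here, and `CONVERSION_TABLE`, `fix_label`, and the sample strings are hypothetical names and made-up inputs.

```python
# Hypothetical subset of the conversion table: garbled sequence -> intended character.
# Only pairs whose garbled form is visible in the table above are included.
CONVERSION_TABLE = {
    "Å¡": "š",
    "Å‚": "ł",
    "Ã¥": "å",
    "Ã§": "ç",
    "Å»": "Ż",
    "Å„": "ń",
    "Ãº": "ú",
    "Åš": "Ś",
}


def fix_label(label: str, table: dict = CONVERSION_TABLE) -> str:
    """Replace every garbled sequence found in `label` with its intended character."""
    # Longer keys go first so a short key can never clobber part of a longer one.
    for garbled in sorted(table, key=len, reverse=True):
        label = label.replace(garbled, table[garbled])
    return label


if __name__ == "__main__":
    # Made-up sample strings, not actual VBO labels.
    for sample in ["FranÃ§ais", "Åšwinia", "Plain ASCII"]:
        print(sample, "->", fix_label(sample))
```

A table like this only fixes what it lists, so any garbled sequence missing from it passes through unchanged and still needs a manual check.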

ImkeTammen commented 2 years ago

Dear Frank,

Yes, once I had started, I thought that copying the column and doing a find and replace would have been a more straightforward option...

Imke

franknic commented 2 years ago

Thanks, Imke. Yes, that does seem to be the best strategy. I’ll continue to work through the page I’m on, until I think I’ve got all the characters, and then I will have a go at some global finding and replacing in text copied into a local Excel file.

sabrinatoro commented 2 years ago

@franknic and @ImkeTammen : thank you for doing this work. I agree, I think the best strategy at this point is to "find and replace" within the document. Also please remember that there is no urgency on this (the records in VBO are still there and available), and it will take whatever time is needed.

franknic commented 2 years ago

Greetings, all. I've had another session on the first page this afternoon, and the find and replace strategy worked very well in a trial run up to row 100. I'm now prepared to do the same for all the remaining rows the next chance I get. And then I should be able to do the same for the other sheet.
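One possible cross-check on find-and-replace results: pairs such as `Ãº` for `ú` are what appears when UTF-8 bytes are decoded as Windows-1252 (cp1252). Whether that is how these labels were garbled is an assumption; the thread does not say. Under that assumption, a round trip through the two encodings reverses the damage without a hand-built table. `repair_mojibake` is a hypothetical helper name and the sample strings are made up.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 read as cp1252' garbling.

    Returns the text unchanged when it does not round-trip, which is the
    case for most text that was never garbled in the first place.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    for sample in ["Åšwinia", "FranÃ§ais", "café"]:
        print(sample, "->", repair_mojibake(sample))
```

Python's cp1252 codec leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so characters whose UTF-8 form contains one of those bytes cannot be recovered this way and would still need the table or a manual fix.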

franknic commented 2 years ago

I've just finished the first sheet, and should be able to do the second sheet in far less time, using the conversion table I have created. It is likely that there will be some mistakes, but we can correct those as we spot them. Thanks again to Imke for all your contributions. One issue that has arisen in communication with DADIS: the transboundary name "Green-Legged partridge" should be deleted; for any breed-country that has this transboundary name as its parent, the new parent should be species = chicken. In other words, the breed-countries with this transboundary name are actually local breeds, occurring in only one country.
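On the expectation above that some mistakes will remain after conversion, a simple scan could help spot leftovers. The sketch below is a heuristic only: it assumes the garbling came from UTF-8 read as Windows-1252 (not confirmed in this thread) and flags any label that still contains what looks like a two-byte UTF-8 sequence decoded that way. `flag_suspicious` and the sample labels are hypothetical.

```python
import re


def _cp1252_chars(first: int, last: int) -> str:
    """Characters cp1252 assigns to bytes first..last (undefined bytes are skipped)."""
    chars = []
    for b in range(first, last + 1):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            pass
    return "".join(chars)


# A two-byte UTF-8 sequence is a lead byte 0xC2-0xDF followed by a
# continuation byte 0x80-0xBF; these are their cp1252 renderings.
LEAD = _cp1252_chars(0xC2, 0xDF)
CONT = _cp1252_chars(0x80, 0xBF)
SUSPICIOUS = re.compile(f"[{re.escape(LEAD)}][{re.escape(CONT)}]")


def flag_suspicious(labels):
    """Return the labels that still contain a likely garbled sequence."""
    return [label for label in labels if SUSPICIOUS.search(label)]


if __name__ == "__main__":
    # Made-up labels, not actual VBO entries.
    print(flag_suspicious(["Åšwinia", "Świnia", "FranÃ§ais", "Français"]))
```

Anything flagged would still need a human decision, since a rare legitimate name could match the same pattern.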

franknic commented 2 years ago

I should have added that, in the interest of time, I have not included language for most entries, as Imke has done. Given that I have prepared most of the entries via a find and replace strategy, entering language would have been an additional lengthy task. If there is any doubt about the language for a particular entry, I should be able to resolve it.

franknic commented 2 years ago

Back again, having finished both sheets! For the second sheet, I have populated the column headed "updated label (with correct spelling)". I've also created a new heading "updated ?exactsyn" which now contains the correct spelling for all entries in the original "?exactsyn" column. As with the other sheet, I have not given every language. In this case the only languages mentioned are those using different scripts, namely Bulgarian, Korean and Chinese. So, these two sheets are now ready to go! Thanks again to Imke for doing all the German entries.

sabrinatoro commented 2 years ago

Note to self: I updated the breed and transboundary labels with the correct (without weird characters) spelling. Note that some terms were missed in the report, probably because the report was done before we solved the issue of "missing ID" due to some format issues in some of the comments.

franknic commented 2 years ago

Thanks, Sabrina. Are there any parts of that third sheet that need my attention? In relation to dogs and cats, Imke and I have now received legal advice on the copyright issues, and it looks as if we may well be able to start compiling those entries next week.

sabrinatoro commented 2 years ago

Thank you @franknic ! At this point, nothing needs your attention: I need to figure out a few things first, including how to update the ontology with the current list of correctly spelled synonyms. I will tag you here and let you know when I have more work for you. Thank you for being so attentive and responsive!

sabrinatoro commented 2 years ago

The first round of revision is complete. I will close this issue and create a new one to review the labels that we missed.

franknic commented 2 years ago

Thank you, Sabrina. That's great progress!