Closed DurHHHI closed 6 years ago
Example above shows issues with added space in annotated text; also issues with character copy in header.
@peterwebster , @nomoregrapes @KPalmerHeathman
Peter to adjust later process to cater for i umlaut (some coming up in 17)
[Now covered at #136]
This might be a separate ticket, but to note: special characters cannot be found when searched for in normal plain text.
E.g. when searching for just 'Caroe', Caröe is not found.
[Now dealt with at #132 ]
Another example screenshot below where accents are not appearing correctly - this is when searching for Isabella Caroline Dennistoun Henson [pers112].
648 Caröe William Douglas
Added space: in the Person TSV there is a space in the narrative, "William Douglas Carö e (1857-1938; ODNB), arc...". I can fix it on the repository copy of the TSV, but @DurHHHI should also fix it.
Display in the "tab" (title) of the page, and display in the search results: these now render the accents and special characters.
Searching without accents: this needs some transliteration; I'll leave that for the other issue/ticket.
If you are happy with the display now, can we close this ticket?
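On searching without accents: how the site's search is actually implemented is not stated in this thread, so the following is only a minimal sketch of one common approach, assuming both the query and the indexed text can be passed through the same folding step. The function names `fold_accents` and `matches` are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a lowercase, accent-free form of text for search matching."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, candidate: str) -> bool:
    """True if the folded query occurs inside the folded candidate."""
    return fold_accents(query) in fold_accents(candidate)


print(fold_accents("Caröe"))                       # caroe
print(matches("Caroe", "Caröe William Douglas"))   # True
```

Note that this folds ö to o, not to oe, which fits the 'Caroe' example here but is a design choice rather than a full transliteration scheme.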
Could @DurHHHI have a root around and report back...
@peterwebster - just a heads up that Katie's and my version of persons.tsv has not been modified to correct accents since the ingest, and I'm not seeing an added space in 'Caröe' when opening it in Excel or TextEdit. How can we catch these possible extra characters/spaces?
@peterwebster Rooted around the website though and see title accents reflecting correctly now - fab!
Closing this ticket now, but the answer to @DurHHHI is that, if any of these have been inserted in the TSV since you last fixed them, the rogue character is designed to be invisible, so it's a question of putting the cursor next to them and seeing whether there is a space there. Also, I don't know whether you're transferring new annotations at the minute, but if so it would be better to hang on until I've done #117: let me know if I should prioritise that. (cc @KPalmerHeathman)
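As an alternative to checking by cursor, invisible characters can be listed by their Unicode category. This is a rough sketch only: the thread does not identify which character is actually present in the TSV, so the function below (hypothetical name `find_invisible`) simply reports anything in the usual "hard to see" categories.

```python
import unicodedata


def find_invisible(text: str):
    """Yield (index, code point, name) for characters that are easy to miss on screen."""
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        is_format = cat == "Cf"                         # zero-width space/joiner, soft hyphen, BOM
        is_control = cat == "Cc" and ch not in "\t\n\r"  # control chars other than tab/newline
        is_odd_space = cat == "Zs" and ch != " "        # e.g. non-breaking space
        if is_format or is_control or is_odd_space:
            yield i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# Illustrative input: a zero-width space hidden between the ö and the e.
sample = "Car\u00f6\u200be"
print(list(find_invisible(sample)))  # [(4, 'U+200B', 'ZERO WIDTH SPACE')]
```

Running this over each line of the TSV, read as UTF-8, would give the row and position of every such character without relying on how a particular editor displays it.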
@peterwebster Reopening (temporarily) as I do not know where else to place this comment. Still finding some instances where accented characters are not appearing correctly. See tagged persons alongside Ella's annotation as an example.
Passing this on to @nomoregrapes
Fixed.
Just spotted some odd characters in the organisation annotation for 'The Seven': Henson's clerical club in the St Alban's diocese, forerunner of the Brotherhood before Henson's translation to Westminster.ÊAs well as Henson, its members includedÊThomas Marsden, Richard Swallow, John Russell, John Pelham, William Dawson, and Henry Lake.
@nomoregrapes is this just an instance of nonsense characters that have ended up in the TSV - and so just need to be deleted manually in the GUI? Or is there something else going on?
Yep, it seems to be in the TSV as "its members includedÊThomas Marsden", so it just needs to be deleted manually. I'll do that on the current website now.
Thanks @nomoregrapes : once you've done that, assign to @DurHHHI @KPalmerHeathman as it will need to be fixed in the master TSV file to avoid it being reingested later.
Gah how did this stuff get in the TSV? Just found quite a few more in the annotations online when looking around for other checks.
I've not been able to work it out, and annoyingly Peter and I are unable to perform a search across files to find them. (I just had another go at a regex pattern.)
It could be that the characters are some formatting marks from Word/Windows that are invisible there, but when the text is decoded as UTF-8 they are treated as these visible characters. If all our text editors/viewers are set to UTF-8, they should be more likely to get spotted before ingest.
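One possible route to a stray Ê, offered as a hypothesis rather than something confirmed in this thread: in the Mac OS Roman encoding the non-breaking space is byte 0xCA, and that same byte read as Latin-1 is Ê. The sketch below only reproduces that effect; the helper name `misread_as_latin1` is hypothetical, and the real files may have taken a different path.

```python
def misread_as_latin1(text: str, source_encoding: str = "mac_roman") -> str:
    """Encode text in source_encoding, then decode the same bytes as Latin-1."""
    return text.encode(source_encoding).decode("latin-1")


# \u00a0 is a non-breaking space, invisible in most editors.
original = "Westminster.\u00a0As well as Henson"
print(misread_as_latin1(original))  # Westminster.ÊAs well as Henson
```

If that is what happened, the Ê stands where an invisible non-breaking space used to be, which would fit its position between sentences and before names. A legitimate Ê in a name would look the same, so blind search-and-replace is risky.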
It would be easier if I could remove everything that isn't a-Z, but then you wouldn't get all of Henson's Greek letter usage, characters with accents, or his use of quotes etc.
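Rather than stripping everything outside a-Z, a gentler option is to list every non-ASCII character that occurs, with its count, so a person can review the unexpected ones and leave the Greek and the accents alone. This is a minimal sketch under that assumption; `non_ascii_census` is a hypothetical name, not an existing tool in the project.

```python
from collections import Counter
import unicodedata


def non_ascii_census(text: str):
    """Return (char, code point, Unicode name, count) rows, most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]


# Illustrative text only; in practice this would be the contents of a TSV file.
for row in non_ascii_census("née Caröe, née Dennistoun; includedÊThomas"):
    print(row)
```

Expected characters such as é and ö would show up with their counts, while a capital Ê in the middle of a sentence stands out as something to check by hand.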
If that is the case, do you think we could raise this at next week's meeting? Maybe omitting the accents and Greek characters is the better way to go? This would also, I imagine, apply to the hundreds of accented 'nee's that have been used with married women.
On my list for next week is to finish ticket #113, which should help. These are usually the things that show up as little question marks in Excel, and have come across either from Word or from some online source and don't show in GDrive. To solve it, we'd need to be editing in something other than GDrive, and that upheaval is probably too much to contemplate, at least any time soon. I don't think losing the accents and the Greek is likely to be acceptable. @DurHHHI @KPalmerHeathman @nomoregrapes
Adding this ticket now to the K&H return label, as another thing to fix once we have access to the editing. @DurHHHI @KPalmerHeathman
@KPalmerHeathman @DurHHHI I think this is now covered by #147 so closing this one down.