peterwebster / henson

Master data store for the Hensley Henson Journals project, and issue tracker. The application code is kept elsewhere.

Special characters displaying incorrectly in annotation texts and names #130

Closed DurHHHI closed 6 years ago

DurHHHI commented 6 years ago
[Screenshot: 2018-07-02, 3:00:36 pm]

DurHHHI commented 6 years ago

Example above shows issues with added space in annotated text; also issues with character copy in header.

@peterwebster , @nomoregrapes @KPalmerHeathman

peterwebster commented 6 years ago

Peter to adjust later process to cater for i umlaut (some coming up in 17)

[Now covered at #136]

DurHHHI commented 6 years ago

This might be a separate ticket, but to note: special characters cannot be found if searched for in normal plain text.

Ex. if searching just 'Caroe', Caröe not found

[Now dealt with at #132 ]

DurHHHI commented 6 years ago

Another example screenshot below where accents are not correctly appearing - this is when searching for Isabella Caroline Dennistoun Henson [pers112].

[Screenshot: 2018-07-02, 3:18:38 pm]

nomoregrapes commented 6 years ago

648 Caröe William Douglas

Added space: In the Person TSV there is a space in the narrative, William Douglas Carö e (1857-1938; ODNB), arc.... I can fix it on the repository copy of the TSV but @DurHHHI should also fix it.

Display in the "tab"(title) of the page, and display on the search results: These now render the accents & special characters.

Searching without accents: this needs some transliteration, I'll leave the other issue/ticket for that.
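A minimal sketch of the kind of transliteration meant here, assuming the search can compare folded copies of the query and the stored name. The function names `fold_accents` and `matches` are hypothetical and are not taken from the application code.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Return a lowercase, accent-free copy of text for search matching."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def matches(query: str, candidate: str) -> bool:
    """True if the folded query occurs inside the folded candidate."""
    return fold_accents(query) in fold_accents(candidate)

print(matches("Caroe", "Caröe William Douglas"))
```

Only the comparison uses the folded text, so the stored names keep their accents for display. Letters that have no decomposition (for example ø) would need an explicit mapping on top of this.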

If you are happy with the display now, can we close this ticket?

peterwebster commented 6 years ago

Could @DurHHHI have a root around and report back...

DurHHHI commented 6 years ago

@peterwebster - just a heads up that Katie's and my version of persons.tsv has not been modified to correct accents since the ingest, and I'm not seeing an added space in 'Caröe' when opening it in Excel or TextEdit. How can we catch these possible extra characters/spaces?

[Screenshot: 2018-07-11, 1:43:06 pm]

DurHHHI commented 6 years ago

@peterwebster Rooted around the website though and see title accents reflecting correctly now - fab!

peterwebster commented 6 years ago

Closing this ticket now, but the answer to @DurHHHI is that, if any of these have been inserted in the TSV since you last fixed them, the rogue character is designed to be invisible, so it's a question of putting the cursor next to them and seeing whether there is a space there. Also, I don't know whether you're transferring new annotations at the minute, but if so it would be better to hang on until I've done #117: let me know if I should prioritise that. (cc @KPalmerHeathman)
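As an alternative to checking by cursor, a script could report the position of every character that renders as blank or as nothing. This is a sketch only: the thread does not say which character was involved, so the no-break space in `sample` is an assumption, and `find_invisible` is a hypothetical helper.

```python
import unicodedata

def find_invisible(line: str):
    """Yield (column, codepoint, name) for characters that render as blank or nothing."""
    for col, ch in enumerate(line, start=1):
        if ch in (" ", "\t", "\n", "\r"):
            continue  # ordinary whitespace that a TSV is expected to contain
        category = unicodedata.category(ch)
        # Zs: space separators other than the plain space (e.g. NO-BREAK SPACE)
        # Cf: format characters (e.g. ZERO WIDTH SPACE, SOFT HYPHEN)
        # Cc: control characters
        if category in ("Zs", "Cf", "Cc"):
            yield col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")

# Assumed example: a no-break space hiding after the accented letter.
sample = "William Douglas Car\u00f6\u00a0e (1857-1938; ODNB)"
for hit in find_invisible(sample):
    print(hit)
```

Running the same function over each line of the TSV, with the line number added, would give a list of places to inspect by hand.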

DurHHHI commented 6 years ago

@peterwebster Reopening (temporarily) as do not know where to place this comment. Still finding some instances where accented characters not appearing correctly. See tagged persons alongside Ella's annotation as example.

[Screenshot: 2018-07-20, 12:50:51 pm]

peterwebster commented 6 years ago

Passing this on to @nomoregrapes

nomoregrapes commented 6 years ago

Fixed.

KPalmerHeathman commented 6 years ago

Just spotted some odd characters in the organisation annotation for 'The Seven': Henson's clerical club in the St Alban's diocese, forerunner of the Brotherhood before Henson's translation to Westminster.ÊAs well as Henson, its members includedÊThomas Marsden, Richard Swallow, John Russell, John Pelham, William Dawson, and Henry Lake.

peterwebster commented 6 years ago

@nomoregrapes is this just an instance of nonsense characters that have ended up in the TSV, and so just need to be deleted manually in the GUI? Or is there something else going on?

nomoregrapes commented 6 years ago

org4

Yep, it seems to be in the TSV as its members includedÊThomas Marsden so just needs to be deleted manually. I'll do that on the current website now.

peterwebster commented 6 years ago

Thanks @nomoregrapes : once you've done that, assign to @DurHHHI @KPalmerHeathman as it will need to be fixed in the master TSV file to avoid it being reingested later.

KPalmerHeathman commented 6 years ago

Gah how did this stuff get in the TSV? Just found quite a few more in the annotations online when looking around for other checks.

nomoregrapes commented 6 years ago

I've not been able to work it out, and annoyingly me and Peter are unable to perform a search across files to find them. (I just had another go at a Regex pattern)

It could be that the characters are some formatting message from Word/Windows that appear invisible, but when the text is decoded as UTF-8 they are considered to be these visible characters. If all our text editors/viewers are set to UTF-8 then they should be more likely to get spotted before ingest.
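One way a character can be invisible in one tool and visible in another is a mismatch of encodings. The sketch below is an illustration under an assumption, not a diagnosis of this TSV: in Mac OS Roman the no-break space is the byte 0xCA, and the same byte read as Latin-1 is Ê, the character seen in 'The Seven' annotation. The pattern `STRAY` and the helper `find_stray` are hypothetical.

```python
import re

# One byte, two readings. Illustration only, not a diagnosis of this TSV.
nbsp = "\u00a0"                   # NO-BREAK SPACE: looks like a normal space
raw = nbsp.encode("mac_roman")    # Mac OS Roman stores it as the byte 0xCA
misread = raw.decode("latin-1")   # Latin-1 reads 0xCA as E with circumflex
print(raw, repr(misread))

# Hypothetical heuristic: that letter glued between a lowercase letter or
# punctuation mark and another letter is unlikely to be intentional.
STRAY = re.compile(r"(?<=[a-z.,;:])\u00ca(?=[A-Za-z])")

def find_stray(text: str):
    """Return the character offsets of suspicious matches in text."""
    return [m.start() for m in STRAY.finditer(text)]

sample = ("translation to Westminster.\u00caAs well as Henson, "
          "its members included\u00caThomas Marsden")
print(find_stray(sample))
```

A pattern like this could be run over every file in a folder. It is a heuristic, so each hit would still need checking by eye before anything is deleted.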

nomoregrapes commented 6 years ago

It would be easier if I could remove everything that isn't a-Z, but then you wouldn't get all of Henson's Greek letter usage, characters with accents, or his use of quotes etc.
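A possible middle ground is to filter by Unicode category rather than by an a-Z allow-list, so that Greek, accented letters and quotation marks survive. This is a rough sketch with a hypothetical helper name, `strip_invisible`. It would not catch a stray visible letter such as Ê, because that is a legitimate letter as far as Unicode is concerned.

```python
import unicodedata

def strip_invisible(text: str) -> str:
    """Drop control/format characters and normalise exotic spaces to a plain space."""
    out = []
    for ch in text:
        if ch in ("\t", "\n"):
            out.append(ch)        # keep the TSV's own structure
            continue
        category = unicodedata.category(ch)
        if category in ("Cc", "Cf"):
            continue              # drop control and format characters
        if category == "Zs":
            out.append(" ")       # any space separator becomes a plain space
            continue
        out.append(ch)            # letters (Greek, accented), digits, quotes, etc.
    return "".join(out)

# Assumed sample: zero-width space inside a name, Greek, curly quotes, no-break space.
sample = "Car\u00f6\u200be, \u03bb\u03cc\u03b3\u03bf\u03c2, \u2018quoted\u2019\u00a0text"
print(strip_invisible(sample))
```

Because this rewrites the text, it is probably safer to run it on a copy and compare the result with the original before replacing the master file.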

DurHHHI commented 6 years ago

If that is the case, do you think we could raise this at next week's meeting? Maybe omitting the accents and Greek characters is the better way to go? This would also, I imagine, apply to the hundreds of accented 'née's that have been used with married women.

peterwebster commented 6 years ago

On my list for next week is to finish ticket #113 which should help. These are usually the things that show up as little question marks in Excel, and have come across either from Word or from some online source and don't show in GDrive. To solve it, we'd need to be editing in something other than GDrive, which upheaval is probably too much to contemplate, at least any time soon. I don't think losing the accents and the Greek is likely to be acceptable. @DurHHHI @KPalmerHeathman @nomoregrapes

peterwebster commented 6 years ago

Adding this ticket now to the K&H return label, as another thing to fix once we have access to the editing. @DurHHHI @KPalmerHeathman

peterwebster commented 6 years ago

@KPalmerHeathman @DurHHHI I think this is now covered by #147 so closing this one down.