In the LCM data model, TsStrings can have multiple runs. Each run can be tagged with styles like bold or italic, or with language tags, so that you can say "This part in the middle of the English sentence is actually Greek, so it should be displayed with a Greek font". LfMerge represents this by converting any run in another writing system to `<span ws="xyz">text in the xyz language</span>`. What we didn't account for is the case where the writing system tag changes in FieldWorks.
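As a rough illustration of that representation, here is a minimal Python sketch (LfMerge itself is C#). The `Run` type and `runs_to_span_str` function are hypothetical stand-ins, not LfMerge or LCM APIs, and HTML-escaping the text is an assumption of the sketch. Runs in the field's own writing system stay plain text; every other run is wrapped in a span.

```python
import html
from dataclasses import dataclass


@dataclass
class Run:
    """One run of a simplified, hypothetical TsString: text plus its writing system tag."""
    text: str
    ws: str


def runs_to_span_str(runs: list[Run], field_ws: str) -> str:
    """Render runs as LfMerge-style span markup (sketch only).

    Runs in the field's own writing system become plain text; a run in any
    other writing system is wrapped in <span ws="...">. Escaping is assumed.
    """
    parts = []
    for run in runs:
        text = html.escape(run.text, quote=False)
        if run.ws == field_ws:
            parts.append(text)
        else:
            ws = html.escape(run.ws, quote=True)
            parts.append(f'<span ws="{ws}">{text}</span>')
    return "".join(parts)


runs = [Run("The word ", "en"), Run("λόγος", "el"), Run(" means word", "en")]
print(runs_to_span_str(runs, "en"))
# The word <span ws="el">λόγος</span> means word
```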
For example, if we have `<span ws="en">foo</span>` in LfMerge but the writing system tag changes to `en-Latn-US` in FieldWorks, the current code will look up the "en" writing system, find that it's not present (so the lookup returns 0), and try to insert text with a writing system of 0 into the FieldWorks TsString. Since 0 is not a valid writing system ID, FieldWorks throws an error saying "The specified writing system code is invalid" from the `SIL.LCModel.Core.Text.TsPropsBldr.SetIntPropValues` method.
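To find data that will hit this error, one option is to list every span `ws` tag that the FieldWorks project no longer defines; per the behavior described above, these are the tags for which the lookup returns 0. A minimal Python sketch, assuming spans are written exactly as `<span ws="...">` with double quotes; `unknown_ws_tags` is a hypothetical helper, and a real implementation might prefer an XML parser over a regex.

```python
import re

# Matches the ws attribute of LfMerge-style spans, e.g. <span ws="en">.
# Assumption: the attribute is always written as ws="..." with double quotes.
SPAN_WS = re.compile(r'<span\s+ws="([^"]*)"')


def unknown_ws_tags(span_str: str, project_tags: set[str]) -> set[str]:
    """Return every span ws tag that the project does not (or no longer) define."""
    return {tag for tag in SPAN_WS.findall(span_str) if tag not in project_tags}


# The project now uses "en-Latn-US" instead of "en", so "en" is reported.
print(unknown_ws_tags('<span ws="en">foo</span>', {"en-Latn-US", "fr"}))
# {'en'}
```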
What needs to happen is that LfMerge's `SpanStrToTsString` code handles the case where `GetWsFromStr` returns 0 by trying several valid variants of the writing system tag. For example, if `en` is not found, try `en-Latn` and then `en-Latn-US` (and also `en-US`), using the current data from `langtags.json` to determine the default script(s) and region(s) to try. If none of those match, the `SpanStrToTsString` code should "punt" and use the project's main writing system, which will more often than not be correct anyway. (Most of the time this happens because the data is in a Notes field that looks something like this: "An alternate spelling is `<span ws="xyz">blahblah</span>`, but this is rarely encountered except in books from a century ago".)
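The fallback could look roughly like the following Python sketch. It assumes a lookup that returns 0 on a miss (as described for `GetWsFromStr`) and a hypothetical `DEFAULTS` table of default script and region(s) per language, standing in for data read from `langtags.json`; the real `langtags.json` schema is not modeled. `candidate_tags` and `resolve_ws` are illustrative names, and only `language[-Script][-REGION]` tags are handled.

```python
from typing import Callable, Optional

# Hypothetical defaults per language: (default script, default regions).
# In practice these would come from langtags.json; its schema is not modeled here.
DEFAULTS: dict[str, tuple[Optional[str], list[str]]] = {
    "en": ("Latn", ["US"]),
}


def candidate_tags(tag: str, defaults) -> list[str]:
    """List tags to try, in order: the tag itself, then script/region fill-ins.

    Only language[-Script][-REGION] tags are handled; variant and
    private-use subtags are out of scope for this sketch.
    """
    lang, *rest = tag.split("-")
    script = next((p for p in rest if len(p) == 4 and p.isalpha()), None)
    region = next((p for p in rest if len(p) == 2 and p.isalpha()), None)
    default_script, default_regions = defaults.get(lang.lower(), (None, []))
    scripts = [script] if script else ([default_script] if default_script else [])
    regions = [region] if region else list(default_regions)

    candidates = [tag]
    for s in scripts:
        if not region:  # don't drop a region the original tag already had
            candidates.append(f"{lang}-{s}")                # e.g. en-Latn
        candidates.extend(f"{lang}-{s}-{r}" for r in regions)  # e.g. en-Latn-US
    candidates.extend(f"{lang}-{r}" for r in regions)          # e.g. en-US
    return list(dict.fromkeys(candidates))  # remove duplicates, keep order


def resolve_ws(tag: str, lookup: Callable[[str], int], defaults, default_handle: int) -> int:
    """Return the first nonzero handle among the candidates, else punt."""
    for candidate in candidate_tags(tag, defaults):
        handle = lookup(candidate)
        if handle != 0:  # 0 means "not found"
            return handle
    return default_handle  # the project's main writing system


project = {"en-Latn-US": 1, "fr": 2}


def lookup(t: str) -> int:
    return project.get(t, 0)


print(candidate_tags("en", DEFAULTS))  # ['en', 'en-Latn', 'en-Latn-US', 'en-US']
print(resolve_ws("en", lookup, DEFAULTS, default_handle=2))   # 1
print(resolve_ws("xyz", lookup, DEFAULTS, default_handle=2))  # 2
```

Trying the original tag first means data that still matches is unaffected; the fill-ins are only tried after a miss, and the project's main writing system is used only when every candidate misses.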