opoudjis closed this issue 3 years ago.
@opoudjis relaton uses the `Date` class to format dates before storing them in a bibitem.
For example, a scraped date could be the string "February 2012". relaton converts it to the string "2012-02" using `Date.strptime` and `Date#strftime`, and stores that in the bibitem.
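A minimal Ruby sketch of that conversion, assuming the scraped string always has the shape "Month YYYY". The `"%B %Y"` format string is an assumption for illustration, not taken from relaton's source. Note that `strptime` is a class method (`Date.strptime`), while `strftime` is an instance method:

```ruby
require "date"

# Parse a scraped "Month YYYY" string; %B matches the full month name.
# The day is missing from the input, so strptime fills it in as the 1st.
scraped = "February 2012"
date = Date.strptime(scraped, "%B %Y")

# Keep only year and month, joined by a plain hyphen-minus (U+002D).
stored = date.strftime("%Y-%m")
puts stored # prints "2012-02"
```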
There are other methods that use `Date` to parse and format strings stored in a bibitem.
`Date.strptime`, `Date#strftime`, and `Date.parse` use the hyphen-minus `\u002d` as the date delimiter, so dates in relaton's bibitem are stored with a single `\u002d` and no surrounding spaces.
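For illustration (plain Ruby standard library, not relaton code), `Date.parse` accepts a human-readable date, and the resulting `Date` is written back out with single hyphens and no spaces:

```ruby
require "date"

# Date.parse accepts many human-readable formats.
parsed = Date.parse("15 February 2012")

# Date#to_s and Date#iso8601 both produce YYYY-MM-DD with single hyphens.
puts parsed.to_s    # prints "2012-02-15"
puts parsed.iso8601 # prints "2012-02-15"

# A custom format contains a hyphen only because the format string asks for one.
puts parsed.strftime("%Y-%m") # prints "2012-02"
```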
As I understand it, we need to render XML by replacing the double dashes ` -- ` and `--` with `\u2013` in dates, right?
We don't need to replace a single hyphen `\u002d`, do we?
Why don't we use `\u002d` instead of `\u2013`?
There are two kinds of dash involved here.
The delimiter dash should indeed remain a hyphen. But the delimiter dash is not `--` in ISO input anyway.
The `--` is used in ISO input to indicate a range, as in "2016--2017". That dash needs to be converted into `\u2013` for conventional typography, instead of me doing it inconsistently (because in ISO, `--` means `\u2014` in all other contexts).
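To illustrate the two readings of ISO's `--`, here is a sketch with a hypothetical helper, `interpret_iso_double_dash`, that picks the en dash in a date context and the em dash elsewhere. Dropping the spaces around the en dash follows the request in this issue; leaving the spacing untouched around the em dash is an assumption:

```ruby
EN_DASH = "\u2013" # ranges such as 2016-2017 in conventional typography
EM_DASH = "\u2014" # ISO's "--" in every other context

# Hypothetical helper: interpret ISO's "--" according to context.
# In a date the double hyphen marks a range, so it becomes an en dash and any
# surrounding whitespace is dropped. Elsewhere it becomes an em dash and the
# surrounding spacing is left as it was (an assumption for this sketch).
def interpret_iso_double_dash(text, date_context:)
  if date_context
    text.gsub(/\s*--\s*/, EN_DASH)
  else
    text.gsub("--", EM_DASH)
  end
end

puts interpret_iso_double_dash("2016 -- 2017", date_context: true)
puts interpret_iso_double_dash("Part 1 -- General", date_context: false)
```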
So yes, we replace `--` with `\u2013`; and no, we are not using `\u002d`, because that is already correctly rendered in ISO output as a single `-`.
I don't see how relaton-iso can encounter ' -- ' or '--'. All date inputs go through `Date.parse` and are output with a single hyphen. Date ranges like "2016--2017" are stored in separate fields (`from: 2016`, `to: 2017`).
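A sketch of the point about separate fields, using a hypothetical `DateRange` struct rather than relaton's own classes: when the range is held as two values there is no `--` left to substitute, and an en dash can be added at render time instead.

```ruby
# Hypothetical stand-in for a bibitem date that keeps a range in two
# separate fields instead of one "2016--2017" string.
DateRange = Struct.new(:from, :to) do
  # Render a single year when there is no end point; otherwise join the
  # two end points with an en dash (U+2013) and no surrounding spaces.
  def to_s
    to.nil? ? from.to_s : "#{from}\u2013#{to}"
  end
end

range = DateRange.new("2016", "2017")
puts range.to_s
```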
Follow-on from https://github.com/metanorma/metanorma.com/issues/395
In relaton data scraped from ISO, `--` is interpreted as an em dash in most of the text, but as an en dash in the context of dates. Until now that interpretation has been done in metanorma, but we are streamlining the processing of text substitution, so it should instead be done upstream: it is an idiosyncrasy peculiar to ISO, not generic behaviour. So, when you have an instance of `--` (with or without surrounding spaces) in dates, and only in dates, in ISO data, please replace it in the output with `–` (with no surrounding spaces).
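A minimal sketch of the requested substitution, restricted to date fields. The record is modelled as a plain hash, and the field names in `DATE_FIELDS` are hypothetical; this issue does not say which fields relaton-iso treats as dates:

```ruby
EN_DASH = "\u2013"

# "--" with optional whitespace on either side, as described in the issue.
DOUBLE_DASH = /\s*--\s*/

# Hypothetical names of the fields that hold dates.
DATE_FIELDS = %i[published updated].freeze

# Return a new hash in which only date fields have "--" replaced by an
# en dash with no surrounding spaces; all other fields are copied as-is.
def normalize_iso_dates(record)
  record.map do |key, value|
    if DATE_FIELDS.include?(key) && value.is_a?(String)
      [key, value.gsub(DOUBLE_DASH, EN_DASH)]
    else
      [key, value]
    end
  end.to_h
end

record = { title: "Part 1 -- General", published: "2016 -- 2017" }
normalized = normalize_iso_dates(record)
puts normalized[:published]
puts normalized[:title] # prints "Part 1 -- General" (unchanged)
```

Non-date text such as the title keeps its `--`, leaving the em-dash interpretation to whatever processes ordinary text.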