softcite / software-mentions

Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
Apache License 2.0

Just typo fixes #7

Closed (caifand closed this 4 years ago)

caifand commented 4 years ago

The diff seems to include additional whitespace lines in the corpus file, perhaps because the text editor I am using reformatted the file automatically. Not sure if that will be an issue?

kermitt2 commented 4 years ago

Thank you @caifand !

The extra spaces between the <tei> elements are due to the pretty-formatting: the indentation is done with those extra spaces. I think it is fine like that.

However, we should indeed not have space+EOL within a paragraph. I think we need a space and no EOL there: for the TEI encoding an EOL is equivalent to a space, and an actual end of line is marked with an explicit line break tag <lb/>. In any case we should not have both, which would amount to a "double space".
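
To illustrate the space+EOL point, here is a minimal sketch in Python of how such runs could be collapsed into a single space inside paragraph text. The function name `collapse_space_eol` is hypothetical and is not part of this repository; it assumes it is applied to the text content of a paragraph only, not to the whitespace between elements.

```python
import re


def collapse_space_eol(text: str) -> str:
    """Hypothetical helper: collapse whitespace around a line break into one space.

    Assumption: `text` is the content of a single paragraph, where an EOL
    counts as a space, so "space + EOL" would otherwise read as a double space.
    """
    # [^\S\n]* matches spaces/tabs (and \r) before the line break, \n is the
    # line break itself, and \s* absorbs any whitespace that follows it.
    return re.sub(r"[^\S\n]*\n\s*", " ", text)


print(collapse_space_eol("software by \nHowison Lab"))  # software by Howison Lab
```

Whitespace runs that contain no line break are left untouched, so this only addresses the space+EOL case discussed here.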

I am going to update the file manually myself, following your PR, because in the meantime I have already modified some of the TEI encoding (following some feedback I received!); otherwise it would give hundreds of conflicts.

kermitt2 commented 4 years ago

Ah no, sorry, there are some extra spaces/EOLs between the <tei> elements that are indeed useless! OK, cleaning :)

caifand commented 4 years ago

The clarification is great to know! Essentially the PR just changes "Howinson" to "Howison", and our lab is named "Howison Lab", with no first name included :) Thanks again for cleaning!