molovol / MoloVol

MoloVol is free, cross-platform scientific software for volume and surface computations of single molecules and crystallographic unit cells.
https://molovol.com
MIT License

[Solved] Issue recognising space group from imported CIF file #114

Closed jmaglic closed 2 years ago

jmaglic commented 2 years ago

Problem

For some CIF files, the space group was not correctly imported.

Solution

This issue was solved in commit d9111d0f6f7743a7a2e515ed7110fae4ccc14e4e. It was caused by the presence of the carriage return character \r, which stems from the different end-of-line conventions on macOS and Windows. Debugging was complicated by how \r is handled when printing to the console. Any future code that imports files may run into similar issues, so I decided to record and explain the problem.

When a line is read using getline(), the newline character at its end is stripped. Different platforms use different end-of-line (EOL) sequences to mark a new line: Unix-like systems use \n, while other systems, most prominently Windows, use \r\n. When a document that follows the Windows convention is imported on macOS, getline() strips only \n, and \r remains at the end of the line. The \r character has to be removed manually to avoid bugs.

Specifically, the issue arose while importing the space group: the code removed excess whitespace and then added ticks (single quotes) to the left and right. The resulting string was compared against known strings to identify the space group. However, due to the carriage return character, a line such as:

_spacegroup    CC     \r\n

was imported (after removal of whitespace and addition of ticks) as:

std::string str = "'CC\r'";

This turned out to obscure the issue. Printing this string using std::cout results in:

'CC

because \r acts like the carriage return on a typewriter. In other words, \r moves the cursor back to the beginning of the line, and the characters that follow overwrite what was already printed. Since the character after \r (the closing tick) is identical to the first character of the line (the opening tick), it looks as if the right tick simply disappeared. Here's an example that shows more clearly what happens:

std::cout << "ABCD\rE" << std::endl;
EBCD
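Because the console hides \r in this way, it can help to make control characters visible before printing a suspicious string. The following is a small sketch of such a debugging aid; the function name showControlChars is hypothetical and is not part of MoloVol.

```cpp
#include <string>

// Hypothetical debugging helper: returns a copy of `s` in which common
// control characters are replaced by visible escape sequences.
inline std::string showControlChars(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\r': out += "\\r"; break;  // carriage return
            case '\n': out += "\\n"; break;  // line feed
            case '\t': out += "\\t"; break;  // tab
            default:   out += c;     break;  // everything else unchanged
        }
    }
    return out;
}
```

Passing the imported space group string through this helper before printing it would show the \r as a literal backslash followed by r between CC and the closing tick, instead of letting the terminal act on it.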

Take-home Message

Always make sure to remove all EOL characters, including \r, when importing text files.