Open jchodera opened 2 years ago
If you want to convert them to a single file, that would be fine.
One point to keep in mind, of course: a major purpose of this repository is to memorialize exactly how we created the dataset. If we replace the files and change the script accordingly, they will no longer match how we created the dataset. Granted, the existing script only works on Linux, but that's the script the dataset was created with.
Of course. We've memorialized that in the release that was cut. That's the record of what we used to create the dataset.
Can we document this as a known bug in the release notes and avoid this practice in the future? If we intend to keep adding to this repo, we should also fix the bug, or else we will keep hitting this error.
The osx default filesystem (HFS+) is case-insensitive, which means the decision to use filename case in naming the des370k/SDFS/ files, and to write out individual files named by SMILES strings instead of a single multi-molecule SDF, causes filename collisions, and the repository cannot be properly checked out.

As a resolution, I repeat my previous suggestion that this should be a single multi-molecule SDF file where all SDFs are collated and titled appropriately within the file.
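To make the suggestion concrete, here is a minimal sketch in Python of the two steps involved: finding names that differ only by case, and collating per-molecule records into one multi-molecule SDF. The function names `find_case_collisions` and `collate_sdf` are hypothetical and not part of this repository, and the sketch assumes each input file holds a single molecule whose title should become the first line of its record.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by case.

    Such names map to the same path on a case-insensitive filesystem.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def collate_sdf(records):
    """Join (title, sdf_text) pairs into one multi-molecule SDF string.

    Assumption: each sdf_text holds a single molecule. Its first line
    (the molfile title line) is replaced by the given title, and each
    record is terminated by the '$$$$' delimiter line.
    """
    out = []
    for title, text in records:
        lines = text.splitlines()
        # Cut at the first record terminator so it is not emitted twice.
        ends = [i for i, line in enumerate(lines) if line.strip() == "$$$$"]
        if ends:
            lines = lines[:ends[0]]
        if not lines:
            continue  # skip empty inputs
        lines[0] = title
        out.append("\n".join(lines) + "\n$$$$\n")
    return "".join(out)
```

The collision check only needs names, so it can be fed the output of `git ls-files` without checking anything out. The collation step has to run on a case-sensitive filesystem where every file is actually present, with records built along the lines of `(path.stem, path.read_text())` for each file; using the filename stem as the title is an assumption about how the files are named.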