phetsims / build-a-molecule

"Build a Molecule" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0

Import new data set from PubChem #158

Open Denz1994 opened 4 years ago

Denz1994 commented 4 years ago

This sim requires that all possible molecules and molecule structures be defined before they can be built. This data is stored in js/data and was derived from PubChem. Looking at js/data/, we see the current data set comprises:

The tools used to generate this data set have not yet been completely ported from Java and would require additional documentation. This includes filtering out any molecules not desired for this sim. During the design meeting on 01/31/20, it was decided to postpone this work until after publication of this sim.

Assigning to @ariel-phet for prioritization and assignment.

Denz1994 commented 4 years ago

Additional Details:

The approach to importing the data set in the legacy version required a parser, some filters, and a post-processor.

The parser (MoleculeSDFCombinedParser) is responsible for importing all the data from PubChem using an SDF file. It generates two text files of molecule data (collection-molecules.txt and other-molecules.txt). The file collection-molecules.txt contains molecule data for the collection boxes, while other-molecules.txt holds data for the other molecules that can be built in the sim. See https://github.com/phetsims/build-a-molecule/issues/153#issuecomment-580072079 for details on how to read these entries.
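
For anyone unfamiliar with SDF, here is a minimal Python sketch of reading one. It is not the logic of MoleculeSDFCombinedParser, only an illustration of the file layout, and it assumes V2000 records: three header lines, a counts line, fixed-width atom and bond blocks, then `> <TAG>` data items, with `$$$$` closing each record. The names `SdfRecord`, `parse_sdf`, and `SAMPLE` are hypothetical, and the sample record is hand-written rather than taken from a real PubChem download.

```python
from dataclasses import dataclass, field


@dataclass
class SdfRecord:
    """One molecule from an SDF file (hypothetical container, not a PhET type)."""
    atoms: list[str]                    # element symbols, in file order
    bonds: list[tuple[int, int, int]]   # (first atom, second atom, bond order), 1-based
    properties: dict[str, str] = field(default_factory=dict)


def _parse_record(lines: list[str]) -> SdfRecord:
    # Line 4 of a V2000 molfile is the counts line: atom and bond counts, 3 chars each.
    n_atoms = int(lines[3][0:3])
    n_bonds = int(lines[3][3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # Atom lines hold x, y, z in three 10-char fields, a space, then the symbol.
    atoms = [ln[31:34].strip() for ln in atom_lines]
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9])) for ln in bond_lines]

    # Data items look like "> <TAG>" followed by value lines and a blank line.
    properties: dict[str, str] = {}
    i = 4 + n_atoms + n_bonds
    while i < len(lines):
        ln = lines[i]
        if ln.startswith(">") and "<" in ln:
            start = ln.index("<")
            tag = ln[start + 1:ln.index(">", start)]
            i += 1
            value = []
            while i < len(lines) and lines[i].strip():
                value.append(lines[i])
                i += 1
            properties[tag] = "\n".join(value)
        else:
            i += 1
    return SdfRecord(atoms, bonds, properties)


def parse_sdf(text: str) -> list[SdfRecord]:
    """Split the text on '$$$$' delimiter lines and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if len(current) >= 4:
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Hand-written sample record for water (illustrative only).
SAMPLE = "\n".join([
    "962",
    "  hypothetical-program",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
    "> <PUBCHEM_COMPOUND_CID>",
    "962",
    "",
    "> <PUBCHEM_IUPAC_NAME>",
    "oxidane",
    "",
    "$$$$",
])

records = parse_sdf(SAMPLE)
```

Fixed-width slicing is used instead of splitting on whitespace because adjacent coordinate fields in a molfile can run together when a value fills its whole column.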

At this point, we will need to filter out molecules that we don't want to build (for either pedagogical or memory reasons). MoleculeKitFilterer and MoleculeDuplicateNameFilter handle this for us.
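
A rough Python sketch of what this kind of filtering could look like. The actual criteria used by MoleculeKitFilterer and MoleculeDuplicateNameFilter are not described here, so everything below is an assumption inferred from the class names: the kit filter keeps a molecule only if some kit holds enough atoms of each element, and the name filter keeps the first molecule seen for each case-insensitive name. `Candidate`, `fits_any_kit`, `drop_duplicate_names`, and the kit contents are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A molecule that survived parsing (hypothetical shape)."""
    name: str
    cid: int
    atoms: tuple[str, ...]


def fits_any_kit(molecule: Candidate, kits: list[Counter]) -> bool:
    """True if at least one kit has enough atoms of every element needed.

    Assumption: a molecule is buildable when a single kit can supply it.
    """
    needed = Counter(molecule.atoms)
    return any(
        all(kit.get(element, 0) >= count for element, count in needed.items())
        for kit in kits
    )


def drop_duplicate_names(molecules: list[Candidate]) -> list[Candidate]:
    """Keep the first molecule for each name, compared case-insensitively.

    Assumption: which duplicate to keep is a policy choice; the legacy
    filter may use a different rule.
    """
    seen: set[str] = set()
    kept: list[Candidate] = []
    for molecule in molecules:
        key = molecule.name.lower()
        if key not in seen:
            seen.add(key)
            kept.append(molecule)
    return kept


# Made-up kits and candidates, for illustration only.
kits = [Counter({"H": 2, "O": 1}), Counter({"H": 4, "N": 2})]
candidates = [
    Candidate("water", 1, ("O", "H", "H")),
    Candidate("Water", 2, ("H", "H", "O")),       # same name, different entry
    Candidate("methane", 3, ("C", "H", "H", "H", "H")),  # no kit has carbon
    Candidate("ammonia", 4, ("N", "H", "H", "H")),
]

buildable = [m for m in candidates if fits_any_kit(m, kits)]
filtered = drop_duplicate_names(buildable)
```

Running the kit check first keeps the duplicate-name pass from holding on to an entry that could never be built anyway.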

The last step involves MoleculePreprocessing, which generates the structural format for our molecules in a serialized form. See Structure.txt

Action Items:

Denz1994 commented 4 years ago

Here is a zip file of the BAM legacy source code with the relevant content described above: build-a-molecule-java.zip