IJCK reviewer 2 comments

kyleniemeyer commented 6 years ago

My main concern with the creation of a publically available database, formatted with ChemKED, is that it seems that the creation of the files has to be done manually, i.e. susceptible for errors. This is especially true if the experiment comprises a big set of datapoints and/or hundreds of measured species. Furthermore, the YAML data serialization format is indeed human-readable, but it does not provide an easy summary/overview of the experimental data, such as an excel or CSV file would give, and therefore, people might be not willing to put the extra effort.

[x] 1. How are the authors planning on ‘forcing’ the experimentalists to publish their data in this new ChemKED file format?
[x] 2. Am I correct to state that ChemKED is a YAML file with a predefined structure?
[x] 3. I think the article would be more appealing if it would have at least one figure as an example.

Abstract and introduction

[x] 4. In the abstract, it is mentioned that the Python-based package PyKED creates ChemKED- formatted files. However in the article there is no evidence that PyKED aides in creating ChemKED-formatted files.
[x] 5. Page 1, line 31: The future development plans include support for jet stirred reactors, however, this is not mentioned in section 5, Conclusions and future work, nor is there any reference to it on the GitHub wiki roadmap of PyKED.
[x] 6. I don’t completely agree with the second part of the statement on page 2 line 16-18: “Second, the XML format is intended to be a machine-readable mark-up language rather than a data format, and its lack of human readability presents a barrier to creating and working with database files.”

In my opinion the YAML format resembles the XML format strongly. XML format is human-readable, however, because of the html-tags and ‘<>’ characters, the YAML format might be easier to read. If XML would lack human readability as it is stated in the article, it would not be such a popular format, and I don’t think PrIMe and ReSpecTh would be using it.

[x] 7. An initial version of the ChemKED format was introduced by Niemeyer, as is mentioned in the introduction. The mentioned article describes a Python-package called PyTeCK. How does this PyTeCK package relate to the PyKED package that is presented in the submitted article?
[x] 8. Page 3, line 10: “Our motivations are similar to those of ”, ‘to’ is missing.

Overview of ChemKED format

[x] 9. A reference should not be a required mapping of an experimental dataset. If the intention is to make the community to adopt the ChemKED file structure, it might be easier to do so if experimentalists can use the structure internally as well, i.e. to read, store, and share non- published internal data.
[x] 10. Page 4, line 32: “The second section of the file encodes the experimental data”. Perhaps it is easier to follow if this is briefly mentioned before the authors explain the details of the first section of the ChemKED format.
[x] 11. Why do both atomic-composition and elemental-composition exist as synonyms? Wouldn’t it be clearer if only one of the two can be used as a valid input?
[x] 12. Is it possible to create for example two common-properties composition blocks A and B and refer to them in the DataPoint keys?
[x] 13. Page 6, line 12 describes the initial efforts of building an open repository of ChemKED files, on GitHub. Is this the most efficient way for making a publically available database? One issue with this is the fact that adding files is done via pull requests, which have to be verified and approved by the owners of the repository. Therefore a certain delay is inevitable.
[x] 14. Page 6, line 18 states that a converter exists from ReSpecTh to ChemKED. But isn’t this feature added when going from v0.1.6 to v0.2.0, which makes page 3 line 21: v0.1.6 not correct.
[x] 15. Page 6, line 21, is the ‘,’ necessary after fork?

PyKED architecture

[x] 16. The Python package Cerberus is used to validate the format and content of the given ChemKED file. How does this package handles/reports the inconsistencies with the schemas? As far as my local test went with giving a corrupted YAML file to the ChemKED initiator, the package just spits out a Python error, which might be not the most user-friendly solution.
[x] 17. A recurring issue with actively developing codes is backwards compatibility. How are the authors planning on addressing this issue?
[x] 18. Page 7, line 12, mole-percent, mole-fraction, and mass-fraction include a dash in their name, whereas this is not the case for the example on page 5 line 48 and in the description of the composition element on page 4, line 47.
[x] 19. Same issue as #18 holds for page 7, line 17.
[x] 20. I think it would be more clear if volume_history, compression_time, compressed_temperature, and compressed_pressure would be combined. The line: ‘if the ChemKED file encodes an RCM experiment’ is unnecessarily repeated.
[x] 21. Page 7, line 39: “Future versions of PyKED may support this feature via, e.g. online lookup”. There are two main caveats with online lookups: a) It will make the code slower if the internet connection is not strong, b) an internet connection is necessary for the code to be executed.
[x] 22. Page 7, line 40: Why is it interesting to create the YAML file again with the write_file() function. Creating a summary excel/CSV file would be more useful.

Usage examples

RCM modelling with varying reactor volume

[x] 23. Page 10, line 29-30: “A user might want to load the information ” Avoid the term ‘might’. It’s one of the main goals of using ChemKED and its associated Python-package PyKED to load the information necessary for simulating the experiment.
[ ] 24. The Python code snippets are not always initiated in the text.
[x] 25. Page 10, line 55: add more commentary text. It is not clear that those lines create the necessary instances for Cantera, rather than ChemKED.
[x] 26. Page 11, line 7: Where is this ‘air.xml’ file coming from. This is not clear from the text.

Shock tube modelling with constant volume

[x] 27. The table overlapping page 11 and 12 can be reduced by truncating the DataPoints, similarly as is done with the table starting at page 9, line 31 and ending at page 10, line 27.

Python code snippet on page 13:
- [x] a. Add more commentary text.
- [x] b. Changing the example to enable multithreading might make it more appealing to start using ChemKED as a data format in addition with Cantera as a simulation tool.

Conclusions and future work

[x] 29. No conclusion is mentioned for the usage examples.

kyleniemeyer commented 6 years ago

OK, there's a lot here... helpfully, they numbered the comments for us!

Beginning a few thoughts:

General: reviewer 1 wanted a shorter article, while reviewer 2 wants more in various places... I'm

Any ideas for a figure? I agree that it would be nice to have one, though we do have a lot of code snippets and such. Maybe we could add plotting of calculated ignition delays to the example at the end, and actually show such a figure? Or, we could include some sort of workflow diagram? The first half of the paper is probably what needs one the most, just to break up text, but I don't have any good ideas.
We do have the converters, but we don't (yet) have anything that helps create files from scratch... perhaps we should? The GUI work can be ongoing, but just something that prompts for entries or something (with the ability to specify a CSV file for volume history, e.g.).
I can add mention of the future plans to the Conclusions section, and we should also add that to the roadmap here... wow this reviewer is really looking into everything!
Both reviewers seem to be taking a very literal definition of "human-readable", in that XML is written in ASCII rather than binary... however this one does agree that YAML is more readable, which of course is our point. I think we can emphasize that we mean understandable rather than technically readable, and that in relative terms YAML is better here.
I agree with the reviewer here—reference shouldn't be required. We already started discussing some related topics in https://github.com/pr-omethe-us/PyKED/issues/69 and privately; my suggestion is that we drop it as a required field, but validate it as we do now if given. I've started an issue on that here: https://github.com/pr-omethe-us/PyKED/issues/76
Do we need both atomic-composition and elemental-composition, or could we just pick one?
Regarding two composition blocks inside common-properties, I think this would be possible, as long as field are given different YAML tags. Perhaps we should add a test for this.
While I agree that submitted PRs on GitHub is less efficient than just uploading files willy-nilly, I think that there still needs to be some sort of human curation involved. However, one major benefit is that we have a tool that checks for basic errors, and also the submission happens in the open.
We've already talked about the less-than-useful error messages that Cerberus gives; I'm not sure if there's anything we can do here other than address the issue and say we plan to improve these in the future.
We already somewhat-address backwards compatibility by putting ChemKED versions in the files, but don't actually do anything with this yet... right? Also, the older versions of PyKED are all archived and still available.
I agree that we should be consistent about using hyphen, underscore, or nothing in fields with multiple words. I like hyphens.
I'm not sure that we gain any readability by combining volume_history and the compressed conditions, other than grouping RCM-specific stuff. Any thoughts?
Online lookup could slow things down a bit, if that's important—perhaps we should mention the "hidden" disable_validation feature we added, when performance is important? Also, we can mention that our validation steps that require online lookup fail gracefully with no internet.
I think there are some valid use cases for creating another file, for example when you have manipulated a ChemKED object and want to write it. However, what do think about this idea of printing summary statistics?

23--26 I should be able to fix.

(b) I agree we could improve the example here, or point to the additional examples in the documentation.

kyleniemeyer commented 6 years ago

Ohh, I think for

I think it would be more clear if volume_history, compression_time, compressed_temperature, and compressed_pressure would be combined. The line: ‘if the ChemKED file encodes an RCM experiment’ is unnecessarily repeated.

the reviewer meant just in the text, since those are all conditional on the file being for an RCM experiment—I don't think it's a suggestion on combining in ChemKED itself.

Maybe we can separate the RCM-specific stuff into a separate list, with some explanatory text? Otherwise we could lump into a single bullet, but I don't think that improves clarity.

kyleniemeyer commented 6 years ago

Reviewer #2 done!

pr-omethe-us / pyked-papers