editorialbot commented 1 year ago

Submitting author: !--author-handle-->@jjmccollum@majensen<!--end-editor-- Reviewers: @tresoldi, @SimonGreenhill Archive: 10.5281/zenodo.7455638

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/e0a813f4cdf56e9f6ae5d555ce6ed93b"><img src="https://joss.theoj.org/papers/e0a813f4cdf56e9f6ae5d555ce6ed93b/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e0a813f4cdf56e9f6ae5d555ce6ed93b/status.svg)](https://joss.theoj.org/papers/e0a813f4cdf56e9f6ae5d555ce6ed93b)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@tresoldi & @SimonGreenhill, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @majensen know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @SimonGreenhill

📝 Checklist for @tresoldi

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.05 s (918.9 files/s, 121045.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          15            269            476           1559
XML                              4              0             30           1373
SVG                              4              0              0            985
reStructuredText                 9            220            189            438
TeX                              1             41              0            226
Markdown                         4             74              0            223
YAML                             7              7             16            213
TOML                             1              7              0             45
DOS Batch                        1              8              1             26
make                             1              4              7              9
Bourne Shell                     2              1              1              7
-------------------------------------------------------------------------------
SUM:                            49            631            720           5104
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1969

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/sysbio/46.4.590 is OK
- 10.1111/cla.12160 is OK
- 10.1093/molbev/msaa015 is OK
- 10.1093/sysbio/sys029 is OK
- 10.1007/978-94-011-0325-1_2 is OK
- 10.1038/29667 is OK
- 10.1023/B:CHUM.0000009290.14571.59 is OK
- 10.1628/978-3-16-153324-2 is OK
- 10.16995/dscn.291 is OK
- 10.1038/s41586-020-2649-2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

SimonGreenhill commented 1 year ago

Review checklist for @SimonGreenhill

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/jjmccollum/teiphy?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@jjmccollum) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

SimonGreenhill commented 1 year ago

Evaluation

This looks like a nicely written tool that will be very useful. The paper explains the use-case nicely, and the code is clear, very well-documented (both in the docs and in the code), and follows modern python best-practices.

The main issue I see is that the example in the paper is dealt with only cursorily rather than completely. This has the side effect of hiding the power of this program. And, while reading the paper, I got interested in the UBS dataset and wanted to know what the analysis told me, so some brief explanation of what the phylogeny shows would round the paper out nicely.

Other than that, I only have very minor recommendations which I enumerate below.

Nice work!

Comments on the Paper

[ ] I would like more detail on the example, preferably with worked through command line steps. This is already in the example section of the docs, so please add this into the paper too (especially the step of generating a nexus file, showing and explaining the command line arguments).
[ ] Figure 1, and discussion: can you please describe what this tree show us (e.g. do these groupings make sense, do they tell us something interesting about this document?)
[ ] L16: please define what the acronym "TEI XML" means.
[ ] L37 "since then, phylogenetic methods have quickly evolved (Felsenstein, 2004),". Please reword -- this reads as if phylogenetic methods were driven by textual analyses, whereas, these methods have been in continuous development since the 70s and 80s, and not for textual analyses.
[ ] L52 "great chasm .. fixed". Please edit - it's a bit over the top, and/or the verb "fixed" is awkward. Perhaps "exists" or "hinders the use of TEI XML in phylogenetic contexts".
[ ] L60: TNT's claims of 'remarkable performance' may have been true in 2008 but now there are plenty of genomic-scale methods out there that can outcompete TNT (something like RAXML-NG probably blows it out of water for example). A better argument for TNT is that it has a comprehensive of morphological state models that may be appropriate for digital humanities data. You later go on to use tools like IQ-Tree and MrBayes, so it'd be worth broadening this section to mentioning these tools up front too. Don't let people think your program is innately tied to "cladistic" tools and approaches (and I mean "cladistic" here in the pejorative sense).
[ ] L72: Might be worth noting that this provides an immediate pathway into all the machine learning tools available in python (e.g. scikit-learn, tensorflow etc).
[ ] L75-76: Can you briefly explain what 'lacunae' and 'retroversions' are (this also strengthens the case for your program). Another term 'corrector' is used later in the m.s. without explanation as well.
[ ] L79: what do you mean by 'soft decision'?

Comments on the code

[ ] It would be useful to have a version number, and expose this via --version or similar.
[ ] pyproject.toml could include some more useful classifiers to help people find the package on PYPI e.g.:

    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3.5",  # etc -- which versions do you support?

[ ] have you thought about adding support for phy or fasta formats? these are pretty common in bioinformatics these days and are the native format of tools like raxml. This may be out of the scope of your program so feel free to disagree, but the formats themselves are pretty trivial to write ("> taxon\n>sequence\n\n")
[ ] please add documentation for how to run tests into readme.rst

jjmccollum commented 1 year ago

@SimonGreenhill Thanks for your thorough and insightful comments! I'll open a branch or two for revisions to the paper and additions to the code. I don't think it should be too hard to add support for phy and fasta, and it would definitely be within the scope of the project to do so.

rbturnbull commented 1 year ago

Thanks for all your feedback @SimonGreenhill - that's all really helpful

jjmccollum commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/sysbio/46.4.590 is OK
- 10.1111/cla.12160 is OK
- 10.1093/molbev/msaa015 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/sysbio/sys029 is OK
- 10.1371/journal.pcbi.1006650 is OK
- 10.16995/dscn.291 is OK
- 10.1007/978-94-011-0325-1_2 is OK
- 10.1038/29667 is OK
- 10.1023/B:CHUM.0000009290.14571.59 is OK
- 10.1628/978-3-16-153324-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.25080/Majora-92bf1922-00a is OK

MISSING DOIs

- None

INVALID DOIs

- None

jjmccollum commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jjmccollum commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/sysbio/46.4.590 is OK
- 10.1111/cla.12160 is OK
- 10.1093/molbev/msaa015 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/sysbio/sys029 is OK
- 10.1371/journal.pcbi.1006650 is OK
- 10.16995/dscn.291 is OK
- 10.1007/978-94-011-0325-1_2 is OK
- 10.1038/29667 is OK
- 10.1023/B:CHUM.0000009290.14571.59 is OK
- 10.1628/978-3-16-153324-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.25080/Majora-92bf1922-00a is OK

MISSING DOIs

- None

INVALID DOIs

- None

jjmccollum commented 1 year ago

@SimonGreenhill We've made your suggested revisions to the paper and the code, and both should now be accessible in the GitHub repo. I've also had editorialbot regenerate the paper PDF here for easy access. If you have any further comments, they are appreciated!

SimonGreenhill commented 1 year ago

Wonderful, looks all good to me, @jjmccollum. Nice work!

danielskatz commented 1 year ago

@majensen - what's the status of this submission?

majensen commented 1 year ago

Apologies @jjmccollum @danielskatz have been AWOL. @tresoldi can you update us on your review please?

tresoldi commented 1 year ago

Hello, so sorry for the delay. I will finish by tomorrow noon (31/11, European time).

tresoldi commented 1 year ago

Review checklist for @tresoldi

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/jjmccollum/teiphy?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@jjmccollum) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

tresoldi commented 1 year ago

Evaluation

The paper presents a high quality software that, as described by the authors, fills a very important void in phylogenetic studies for textual traditions. I can see their work becoming part of a "best practice pipeline" for such analyses, and I will surely start to recommend it.

Not addressing the topics already pointed by @SimonGreenhill :

Comments on the software

[ ] While I understand the decision from a philological perspective and for the symmetry with other options (such as for ambiguous states), when considering the most expected usages for the output it would make sense to have the NEXUS format defaulting to "states present".
[ ] The authors briefly mention how some phylogenetic software (like IQTree) can automatically perform ascertainment correction (usually with the Lewis MK model), but an --ascertainment flag would be very welcome for the NEXUS output format in general (or even for all formats). Not only some software expects the user to perform the correction themselves by adding the states to the alignment, but in some cases we might prefer to have it in the model even when the software offers it as an option.

Comments on the paper

[ ] the format for Carlson's STEMMA is described as "unique" (lines 13 and 74), which I assume means "exclusive to". The adjective seems superfluous, not only because the format is not "unique" but also because the audience of the paper will most likely know about the format.
[ ] There is no dispute that critical texts are fundamental to classical phylology (line 35), but the authors use "digital humanities" in a stricter sense than the one adopted by most people, and more as "the on-going digitalization of philological work". This is a common usage in the field of which I am also guilty, but as it has been pointed to me in the past it might be worth talking about "digital turn in philology" or something alike, reserving DH for the sense of "intersection of computing and humanities in general"
[ ] In line 53, it is possible to say that no currently available phylogenetic software expects input in TEI XML, stressing the value of this contribution
[ ] In line 54, it would be worth mentioning that the format was conceived not only for versatility, but also for promiting digital adoption in stemmatology
[ ] In line 64, PAUP* is not cited
[ ] In line 85, from experience there are cases where researchers might prefer or need a long table format, transposing the table (it is not practical for human inspection in most cases, but I would not dismiss it)
[ ] Installations instructions are given in the repository README file and in the documentation, but not in the draft. It is not an issue, given that it is documented and the procedure is easy and intuitive, but explict instructions are missing.

And, once more, great work @jjmccollum !

jjmccollum commented 1 year ago

@tresoldi Thank you for these comments! These changes seem straightforward and shouldn't take too long to implement (though I might need @rbturnbull's help on the --ascertainment option for NEXUS outputs). I'll open a branch or two for revisions to the paper and additions to the code.

Regarding the citation of PAUP*, I wasn't able to find a paper describing it, but it looks like the website just suggests citing the software as follows:

Swofford, D. L. 2003. PAUP. Phylogenetic Analysis Using Parsimony (and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

If you had a paper in mind that would be more appropriate to cite, just let us know; otherwise, I will cite the software above.

jjmccollum commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

jjmccollum commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/sysbio/46.4.590 is OK
- 10.1111/cla.12160 is OK
- 10.1093/molbev/msaa015 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/sysbio/sys029 is OK
- 10.1371/journal.pcbi.1006650 is OK
- 10.16995/dscn.291 is OK
- 10.1007/978-94-011-0325-1_2 is OK
- 10.1038/29667 is OK
- 10.1023/B:CHUM.0000009290.14571.59 is OK
- 10.1628/978-3-16-153324-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.25080/Majora-92bf1922-00a is OK

MISSING DOIs

- None

INVALID DOIs

- None

tresoldi commented 1 year ago

@jjmccollum for ascertainment, only adding a uniform character with zeros should be enough. I have seen a small group of people working in computational linguistics experimenting with having a missing value (?) in the ascertainment character when the value is missing (so that the ascertainment character value has both zeros and questions marks), but it is not the common practice and would over-complicate things. Maybe for a future version, if somebody asks for it.

As for PAUP* that is the reference I see from time to time, that is perfect.

jjmccollum commented 1 year ago

@tresoldi Oh, that makes things much easier! Thanks for clarifying that. I should be able to add that feature for most or all output formats (although I don't know if every phylogenetic program we test in our workflows will use the ascertainment character).

rbturnbull commented 1 year ago

Hi @jjmccollum - there are helpful explanations about correcting for ascertainment bias with a dummy character in the Lewis 2001 paper and in this paper: Felsenstein, J. (1992), PHYLOGENIES FROM RESTRICTION SITES: A MAXIMUM-LIKELIHOOD APPROACH. Evolution, 46: 159-173. https://doi.org/10.1111/j.1558-5646.1992.tb01991.x

I'm happy to work on this.

rbturnbull commented 1 year ago

Hi @tresoldi - I've been reading papers on correcting for ascertainment bias and looking through implementations in phylogenetic software programs. I now think that it is actually out of scope for the teiphy tool because there are different ways of doing it that are bound up with the phylogenetic method that you are using. I think it's best for teiphy to just convert the TEI XML data and represent it in nexus, CSV etc and to leave it up to the user to correct for ascertainment bias in the phylogenetic software package that they use. Are you happy with that?

tresoldi commented 1 year ago

Hi @rbturnbull . The short answer is that yes, I am happy with it.

Longer answer. I was considering the issue this afternoon, and you are right that different models and tools (besides research decisions) would require different approaches for ascertainment bias (even more considering how correction has been used mostly for better dating in biology/linguistics, which works differently for textual traditions).

Having an option for the most common correction (a uniform character with absence) still sounds like a good addition, but not one upon which the paper acceptance would depend. I would be happier with a comment (in the documentation) explicitly saying that teiphy makes no such correction which left to the user/software, but this is not "mandatory".

As you mention, users can easily perform it from other formats, especially tabular ones. The output for such task could, however, be improved: it might be my fault, but teiphy seems to only emit tabular files with probabilities (e.g., 0.333 for the three values of B10K1V1U24-26 for manuscript 01), given that the --states-present option only applies to nexus output.

While parsing the latter is easy, I would like to ask you to investigate if, for users' convenience, it is possible to have a tabular output similar to the matrix of the nexus one when --states-present is set. It would be even better if you could provide the results in long table format, something like (optionally using the nexus character value indexes, along with question marks, instead of the actual reading):

Taxon,Character,Value
UBS,B10K1V1U24-26,εν εφεσω
P46,B10K1V1U24-26,om.
01,B10K1V1U24-26,
02,B10K1V1U24-26,εν εφεσω
03,B10K1V1U24-26,
04,B10K1V1U24-26,om.
UBS,B10K1V6U22-26,εν τω ηγαπημενω
P46,B10K1V6U22-26,εν τω ηγαπημενω
01,B10K1V6U22-26,εν τω ηγαπημενω
02,B10K1V6U22-26,εν τω ηγαπημενω
03,B10K1V6U22-26,εν τω ηγαπημενω
04,B10K1V6U22-26,εν τω ηγαπημενω υιω αυτου

Once more, this is not a request but a suggestion, as it goes beyond your initial scope and does not diminish the validity and importance of your submission. I would be happy to open it as an issue in the teiphy repository once the paper is published, where we could discuss in more detail.

jjmccollum commented 1 year ago

@tresoldi I think I could implement a --long-table option for tabular outputs (NumPy, Pandas DataFrame, CSV, TSV) like the one detailed above. To make things easier, I might just include both the reading symbol and the long form of the reading in consecutive columns:

Taxon,Character,State,Value
UBS,B10K1V1U24-26,0,εν εφεσω
P46,B10K1V1U24-26,1,om.
01,B10K1V1U24-26,1,om.
02,B10K1V1U24-26,0,εν εφεσω
03,B10K1V1U24-26,1,om.
04,B10K1V1U24-26,?,?
UBS,B10K1V6U22-26,0,εν τω ηγαπημενω
P46,B10K1V6U22-26,0,εν τω ηγαπημενω
01,B10K1V6U22-26,0,εν τω ηγαπημενω
02,B10K1V6U22-26,0,εν τω ηγαπημενω
03,B10K1V6U22-26,0,εν τω ηγαπημενω
04,B10K1V6U22-26,1,εν τω ηγαπημενω υιω αυτου

For this, I would probably have to coerce all ambiguous readings to look like missing data (i.e., ?), because duplicating rows for the same taxon and character seems more potentially problematic. How does this sound to you?

rbturnbull commented 1 year ago

Hi @tresoldi - thanks for your comment. That all sounds great. I've added a section in the docs about ascertainment bias here: https://jjmccollum.github.io/teiphy/advanced.html#ascertainment-bias

jjmccollum commented 1 year ago

@tresoldi Okay, I just finished implementing the --long-table option as requested! I'm using four columns instead of three, as I described in my last comment, and because of how I loop through the entries in the collation, the rows are ordered by taxon first and character second, but both of these features can be easily adjusted in an editor like Excel.

If all of this looks satisfactory to you, then I'm happy with passing this back up to @majensen! If the paper proceeds to publication, then I will plan to bump the repository (and PyPI entry) to a new release with these features at the same time.

tresoldi commented 1 year ago

@jjmccollum @rbturnbull wonderful, thank you! No further issues from my side. :)

tresoldi commented 1 year ago

@majensen I recommend the paper for publication, it is a great addition to the TEI ecosystem and to digital stemmatology in general.

majensen commented 1 year ago

@tresoldi @SimonGreenhill thanks so much for your thoughtful and detailed reviews. @jjmccollum I will run through the paper with my editorial hat on, and will correct, rebuke, and encourage. After this, I will provide some instructions about creating an open archive prior to publication.

jjmccollum commented 1 year ago

@majensen Since we're getting close to the holiday break, I just wanted to let you know in advance that I will be on vacation (and without access to a computer) from December 21 to January 3. So if you send me your feedback during this time and notice that I am taking a while to respond, that is the reason. Thanks!

majensen commented 1 year ago

@jjmccollum thanks for the heads up. I've been delinquent - I will get this done this week and hope we can wrap up well before you take off.

majensen commented 1 year ago

@editorialbot generate pdf

majensen commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/sysbio/46.4.590 is OK
- 10.1111/cla.12160 is OK
- 10.1093/molbev/msaa015 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/sysbio/sys029 is OK
- 10.1371/journal.pcbi.1006650 is OK
- 10.16995/dscn.291 is OK
- 10.1007/978-94-011-0325-1_2 is OK
- 10.1038/29667 is OK
- 10.1023/B:CHUM.0000009290.14571.59 is OK
- 10.1628/978-3-16-153324-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.25080/Majora-92bf1922-00a is OK
- 10.1080/106351501753462876 is OK

MISSING DOIs

- None

INVALID DOIs

- None

majensen commented 1 year ago

@jjmccollum the paper reads very well and I enjoyed it. It occurred to me that STEMMA's ability to detect contamination is exactly analogous to recombination. There is a large literature on "reticulate evolution", which is just accounting for the fact that recombination pulls in different histories for parts of the same gene. I suppose that using multiple sources to create a given manuscript is essentially this problem. The application I see in this work would be to check out algorithms that can detect and display these mixtures and see if TEIphy's encoding could generate a reticulation in the history of the three contaminated witnesses. I could go on because I find this pretty fascinating, but I won't. Check this ref I found https://www.sciencedirect.com/science/article/pii/S2666166722000715 which is methods-oriented and also contains a number of interesting references.

majensen commented 1 year ago

HOWEVER, don't do that for this paper. I think we are ready to go. Can you make an open archive of the codebase using Zenodo, FigShare or similar (see this tutorial https://github.com/openjournals/joss-reviews/issues/1839#issuecomment-560048729). If you would report the DOI of the resulting archive here that would be great.

rbturnbull commented 1 year ago

Thanks @majensen - this is a big area of research for us. We need to keep exploring ways of modelling contamination/reticulation. Thanks for the link for the article!

rbturnbull commented 1 year ago

@jjmccollum do you want me to create an archive of the project and put it on the University of Melbourne's FigShare platform? Or you can do it with Zenodo or FigShare yourself.

jjmccollum commented 1 year ago

Thanks so much, @majensen! @rbturnbull, I'll create a new release of the repo to connect to Zenodo now; this release should also include the changes we made in response to the other reviewers' comments.

rbturnbull commented 1 year ago

Terrific. It'd be good to release to PyPI as well.

openjournals / joss-reviews

[REVIEW]: teiphy: A Python Package for Converting TEI XML Collations to NEXUS and Other Formats #4879

Status

Reviewer instructions & questions

Checklists

Review checklist for @SimonGreenhill

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Evaluation

Comments on the Paper

Comments on the code

Review checklist for @tresoldi

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Evaluation

Comments on the software

Comments on the paper