Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @johanneswerner, @jcmcnch it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.7287/peerj.preprints.27295v1 is OK
- 10.1038/s41564-018-0176-9 is OK
- 10.1038/s41467-017-02342-1 is OK
- 10.1101/2020.06.30.180687 is OK
- 10.5281/zenodo.1476236 is OK
- 10.1051/0004-6361/201629272 is OK
- 10.1051/0004-6361/201322068 is OK
MISSING DOIs
- 10.1142/s0219720012500151 may be a valid DOI for title: Metagenomic taxonomic classification using extreme learning machines
- 10.1038/ncomms11257 may be a valid DOI for title: Fast and sensitive taxonomic classification for metagenomics with Kaiju
- 10.1111/1755-0998.13147 may be a valid DOI for title: A metagenomic assessment of microbial eukaryotic diversity in the global ocean
- 10.1038/ismej.2015.30 may be a valid DOI for title: Metatranscriptomic census of active protists in soils
- 10.1038/nrmicro.2016.160 may be a valid DOI for title: Probing the evolution, ecology and physiology of marine protists using transcriptomics
- 10.1093/database/baaa051 may be a valid DOI for title: SAGER: a database of Symbiodiniaceae and Algal Genomic Resource
- 10.1111/jpy.12529 may be a valid DOI for title: Robust Dinoflagellata phylogeny inferred from public transcriptome databases
- 10.1093/database/baaa051 may be a valid DOI for title: SAGER: a database of Symbiodiniaceae and Algal Genomic Resource
- 10.1016/j.tim.2018.10.009 may be a valid DOI for title: Are we overestimating protistan diversity in nature?
- 10.1093/nar/gks1160 may be a valid DOI for title: The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy
- 10.1016/j.tree.2014.03.006 may be a valid DOI for title: The others: our biased perspective of eukaryotic genomes
- 10.1371/journal.pbio.2005849 may be a valid DOI for title: EukRef: Phylogenetic curation of ribosomal RNA to enhance understanding of eukaryotic diversity and distribution
- 10.1093/gigascience/giy158 may be a valid DOI for title: Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
- 10.1007/978-3-319-61510-3_4 may be a valid DOI for title: Functional analysis in metagenomics using MEGAN 6
- 10.1007/978-1-4939-3369-3_13 may be a valid DOI for title: MG-RAST, a metagenomics service for analysis of microbial community structure and function
- 10.1016/j.gpb.2015.08.003 may be a valid DOI for title: The Tara Oceans project: new opportunities and greater challenges ahead
- 10.1038/sdata.2017.203 may be a valid DOI for title: The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans
- 10.1093/bioinformatics/btw445 may be a valid DOI for title: SWORD—a highly efficient protein database search
- 10.1038/nmeth.3176 may be a valid DOI for title: Fast and sensitive protein alignment using DIAMOND
- 10.1101/2020.06.30.180687 may be a valid DOI for title: EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotic life
- 10.1371/journal.pone.0016342 may be a valid DOI for title: How and why DNA barcodes underestimate the diversity of microbial eukaryotes
- 10.1038/ncomms12860 may be a valid DOI for title: Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes
- 10.1111/gcb.12983 may be a valid DOI for title: Bridging the gap between omics and earth system science to better understand how environmental change impacts marine microbes
- 10.1098/rstb.2015.0331 may be a valid DOI for title: Censusing marine eukaryotic diversity in the twenty-first century
- 10.1007/978-3-030-38281-0_12 may be a valid DOI for title: Eukaryotic Pangenomes
- 10.1038/nature12221 may be a valid DOI for title: Pan genome of the phytoplankton Emiliania underpins its global distribution
- 10.1128/aem.01541-09 may be a valid DOI for title: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities
INVALID DOIs
- None
:wave: @jcmcnch, please update us on how your review is going.
:wave: @johanneswerner, please update us on how your review is going.
Very interesting software package for the analysis of eukaryotes in metagenomes and metatranscriptomes. I like the focus of this tool and the well-written article and documentation, especially the very comprehensive documentation including all explanations and citations.
I have a few comments that might still be addressed.
installation
documentation
documentation and :ref:Parameters are not working in running-eukulele.rst
databaseandconfig.rst: there are four, not three, databases
minimal working example:
EUKulele --config curr_config.yaml
Running EUKulele with entries from the provided configuration file.
No BUSCO file specified/found; using argument-specified organisms and taxonomy for BUSCO analysis.
Setting things up...
Found database folder for reference_DIR in current directory; will not re-download.
Creating a diamond reference from database files...
Aligning to reference database...
['samples_MAGs/sample_2.faa', 'samples_MAGs/sample_1.faa', 'samples_MAGs/sample_0.faa']
Aligning sample sample_2...
Aligning sample sample_1...
Aligning sample sample_0...
Diamond process exited for sample sample_2.
Diamond process exited for sample sample_1.
Diamond process exited for sample sample_0.
Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing taxonomic assignment steps...
Performing BUSCO steps...
Configuring BUSCO...
Running busco with 2 simultaneous jobs...
BUSCO error log:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/EUKulele/bin/busco_configurator.py", line 15, in <module>
for line in open(sys.argv[1]):
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.local/bin/../config/config.ini'
sed: can't read test_out_23July/busco/config_sample_1.ini: No such file or directory
sed: can't read test_out_23July/busco/config_sample_1.ini: No such file or directory
sed: can't read test_out_23July/busco/config_sample_1.ini: No such file or directory
ERROR: Config file test_out_23July/busco/config_sample_2.ini cannot be found
ERROR: BUSCO analysis failed !
ERROR: Check the logs, read the user guide, and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues
BUSCO output log:
python3 busco_configurator.py /home/ubuntu/.local/bin/../config/config.ini test_out_23July/busco/config_sample_1.ini
INFO: ***** Start a BUSCO v4.1.4 analysis, current time: 11/17/2020 10:19:54 *****
INFO: Configuring BUSCO with test_out_23July/busco/config_sample_2.ini
BUSCO error log:
ERROR: Config file test_out_23July/busco/config_sample_1.ini cannot be found
ERROR: BUSCO analysis failed !
ERROR: Check the logs, read the user guide, and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues
BUSCO output log:
INFO: ***** Start a BUSCO v4.1.4 analysis, current time: 11/17/2020 10:19:54 *****
INFO: Configuring BUSCO with test_out_23July/busco/config_sample_1.ini
BUSCO error log:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/EUKulele/bin/busco_configurator.py", line 15, in <module>
for line in open(sys.argv[1]):
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.local/bin/../config/config.ini'
sed: can't read test_out_23July/busco/config_sample_0.ini: No such file or directory
sed: can't read test_out_23July/busco/config_sample_0.ini: No such file or directory
sed: can't read test_out_23July/busco/config_sample_0.ini: No such file or directory
ERROR: Config file test_out_23July/busco/config_sample_0.ini cannot be found
ERROR: BUSCO analysis failed !
ERROR: Check the logs, read the user guide, and check the BUSCO issue board on https://gitlab.com/ezlab/busco/issues
BUSCO output log:
python3 busco_configurator.py /home/ubuntu/.local/bin/../config/config.ini test_out_23July/busco/config_sample_0.ini
INFO: ***** Start a BUSCO v4.1.4 analysis, current time: 11/17/2020 10:19:55 *****
INFO: Configuring BUSCO with test_out_23July/busco/config_sample_0.ini
[] is what is in BUSCO directory
BUSCO initial run did not complete successfully.
Please check the BUSCO run log files in the log/ folder.
pytest tests/ returns one failed test
______________________________________________________ ERROR collecting tests/setupanddownload/test_database.py ______________________________________________________
ImportError while importing test module '/data/EUKulele/tests/setupanddownload/test_database.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/home/ubuntu/miniconda3/envs/EUKulele/lib/python3.6/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/setupanddownload/test_database.py:8: in <module>
from EUKuleleconfig import *
E ModuleNotFoundError: No module named 'EUKuleleconfig'
code quality
pylint $(git ls-files '*.py') on the repository returns a score of 3.07. I would try to get the pylint score to >=8 (and most warnings are easy to fix).
minor comments
a few comments about the manuscript
databaseandconfig.rst)
Great - thanks @johanneswerner!
Can you please respond to these comments when you get the chance @akrinos.
@jcmcnch - can you let us know how you are getting on please?
@johanneswerner Thank you so much for the very helpful review!
I will respond to what I have responses for thus far and update as additional comments are addressed.
conda installation indeed is quite slow - I am hoping to go through the process of adding it to the bioconda channel after publishing the paper, and am hopeful that that will provide a speedup over my user channel.
Other Questions
phylodb actually includes prokaryotes, so that is one option, but if prokaryotes were your group of interest, you would probably want to include your own database that is more complete. Beyond that, though, EUKulele should work fine on such a sample, although it has specific things built in (e.g. the databases we've chosen) tailored towards eukaryotes
EukZoo was a recent addition; it is not tested on Travis yet, so apologies for the inconsistencies in where it is included!
@whedon generate pdf
PDF failed to compile for issue #2817 with the following error:
Error reading bibliography file paper.bib: (line 461, column 3): unexpected "b" expecting space, ",", white space or "}"
Looks like we failed to compile the PDF
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
As an update to the above: pylint $(git ls-files '*.py') is now 8.26/10.
We are working on benchmarking and addressing BUSCO-related issues. Thank you for your patience!
@johanneswerner I was able to reproduce the error you have been getting from BUSCO when trying to run the sample_EUKulele example folder.
It is indeed related to BUSCO version 4.1.4, in which the BUSCO configuration file is stored in a different location than it was previously. I have implemented a patch, now deployed to the conda build of EUKulele, that searches for the configuration file in a different way. This solves the issue of the initial BUSCO run, but the storage location of the final BUSCO sequences will still be different due to the version change. For now, EUKulele can be run with BUSCO version 4.0.6 and Biopython 1.77. I will sort out the final versioning issues on the pip install version such that both versions should work; for now at least the BUSCO run itself functions.
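A lookup that tolerates a relocated configuration file can be sketched as a search over candidate locations, tried in order. This is a minimal illustration only, not the patch deployed in EUKulele: the function name `first_existing_file` and the candidate paths in the usage lines are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_file(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical locations; a real install may keep config.ini elsewhere.
    guesses = [
        Path.home() / ".local" / "config" / "config.ini",
        Path("config") / "config.ini",
    ]
    found = first_existing_file(guesses)
    print(found if found is not None else "no config.ini found in the guessed locations")
```

Returning None rather than raising lets the caller decide whether a missing file is fatal or whether a default configuration should be generated instead.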
Thanks again for your patience!
@johanneswerner we have provided a graphic below using the DIAMOND alignment tool for various sizes of sequence files. Times are in minutes and cover the full EUKulele run. Note that this is for metatranscriptomic (MET) sequences with or without using the TransDecoder tool for translation (and colored by two different database selections, the MMETSP and PhyloDB).
We have also uploaded pip and conda-installable revisions of EUKulele which address the issue you encountered with the latest BUSCO version. In recently-uploaded version 1.0.1, these issues should be resolved, and you should be able to fully execute the small test example that you reported on above.
Thank you!
Dear @akrinos
thank you for your updates. I think I checked the above checkboxes that are taken care of (if I forgot something, please let me know).
Unfortunately, I still encountered errors with the minimum example (run.log) and some of the tests also throw errors on my virtual instance (tests.log). Could you please have a look at them?
Thank you very much for your effort, especially the benchmarking is very interesting.
Hmm, it looks like you're still getting the same error, which is most likely the cause for the failed tests as well (although I haven't looked carefully at each failure). Did you reinstall via conda @johanneswerner? From the error, it looks like it is defaulting to the BUSCO install that you have locally, rather than a BUSCO install via conda. Could you please try running EUKulele --version? The problem is also in the included scripts run_busco.sh and concatenate_busco.sh, so printing the result of cat $(which run_busco.sh) and cat $(which concatenate_busco.sh) to a file would also help me verify that the fix that I have added is present in the files that your install is pulling. One problem I had was needing to remove prior installs.
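The same check can be sketched in Python for anyone who prefers a single file containing both the resolved paths and the script contents. This is only an illustrative equivalent of the `cat $(which ...)` commands; the helper names `resolve_scripts` and `dump_scripts` are hypothetical and are not part of EUKulele.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, Optional


def resolve_scripts(names: Iterable[str],
                    search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each script name to the full path found on PATH, or None."""
    return {name: shutil.which(name, path=search_path) for name in names}


def dump_scripts(names: Iterable[str], out_file: str,
                 search_path: Optional[str] = None) -> None:
    """Write the resolved location and the contents of each script to out_file."""
    resolved = resolve_scripts(names, search_path)
    with open(out_file, "w") as out:
        for name, location in resolved.items():
            out.write(f"==> {name}: {location}\n")
            if location is not None:
                # Only text scripts are expected here, so read them as text.
                out.write(Path(location).read_text())
                out.write("\n")
```

Calling `dump_scripts(["run_busco.sh", "concatenate_busco.sh"], "busco_scripts.txt")` inside the activated environment would record which copies are picked up first; a path outside the conda environment would point to a leftover prior install.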
If this continues to be an issue, I suppose we should move to another thread per the guidelines. Thanks for your persistence!
My apologies @akrinos, I pulled the git repository for the tests, but I forgot to reinstall via conda. Thank you for looking into it. :-)
Test dataset runs accordingly after reinstallation with conda, and the tests also pass. I marked the respective checkboxes above.
Thank you @johanneswerner! Did you still have one failed test as above (checkbox in initial review)? With regard to the last two remaining checkboxes: for the analysis of prokaryotes, as mentioned in 729306579, we have one default database that includes prokaryotes, and users can generally curate their own datasets including prokaryotes; we have just tailored the tool to eukaryotes. As for other software to compare ours to, one other tool I found was CCMetagen, published earlier this year. This tool identifies eukaryotes in metagenomic samples, but is not for metatranscriptomes and only uses the NCBI database. It might be useful to point out how our approach is different from this one, which also compares itself to MEGAN. If it helps, I could include both of these explanations in either the text or the documentation, whichever seems more helpful. I think other than that, everything from your review has been addressed.
Thanks again!
Thank you for your comprehensive review @johanneswerner - this is shaping up nicely.
Pinging @jcmcnch - are you still able to review this submission? Please let us know either way ASAP
Hi @will-rowe @akrinos sorry for not getting back to you both sooner with this. I have been busy until recently and had unsubscribed from notifications (because I was getting about a dozen notifications from JOSS daily from unrelated reviews - perhaps something can be done by JOSS to prevent this). I am back on the case now, and will provide my comments ASAP, by the end of this week at the latest.
@akrinos, I just ran the test suite by providing the yaml file as you describe, and everything seems to work: the errors mentioned above by Johannes seem to be fixed, except that BUSCO generates no output. Is this expected? I checked the logs as recommended by the text printed to the screen but they were all empty. Here's the output I got:
(EUKulele) jesse@kraken:~/EUKulele-review/sample_EUKulele$ EUKulele --config curr_config.yaml
Running EUKulele with entries from the provided configuration file.
No BUSCO file specified/found; using argument-specified organisms and taxonomy for BUSCO analysis.
Setting things up...
Found database folder for reference_DIR in current directory; will not re-download.
Creating a diamond reference from database files...
Aligning to reference database...
Aligning sample sample_2...
Aligning sample sample_0...
Aligning sample sample_1...
Diamond process exited for sample sample_2.
Diamond process exited for sample sample_1.
Diamond process exited for sample sample_0.
Performing taxonomic estimation steps...
Performing taxonomic visualization steps...
Performing taxonomic assignment steps...
Performing BUSCO steps...
Configuring BUSCO...
Running busco with 1 simultaneous jobs...
[] is what is in BUSCO directory
BUSCO run either did not complete successfully, or returned no matches for sample sample_2 . Check busco_run log for details.
BUSCO run either did not complete successfully, or returned no matches for sample sample_0 . Check busco_run log for details.
BUSCO run either did not complete successfully, or returned no matches for sample sample_1 . Check busco_run log for details.
No BUSCO matches found for any sample. Check BUSCO run log for details. Exiting...
EUKulele run complete
Hi @jcmcnch - sorry it has taken me a bit to get back to you. I have been trying to reproduce this, and haven't been able to. You are using the sample_EUKulele directory, right? This is what I expect to be printed:
Running busco with 1 simultaneous jobs...
['logs', 'short_summary.specific.eukaryota_odb10.sample_1.txt', 'run_eukaryota_odb10'] is what is in BUSCO directory
At least one BUSCO present in sample sample_1 but 250 missing.
At least one BUSCO present in sample sample_0 but 241 missing.
At least one BUSCO present in sample sample_2 but 245 missing.
Could you tell me (1) what EUKulele --version returns and (2) the contents of your sample directory (it should be samples_MAGs if you're using the tutorial) and BUSCO directory in the output folder via ls? Thanks!!
Hi @akrinos and other coauthors, again sorry for the delay in replying. I've had some time to properly "test drive" EUKulele, and now feel comfortable summarizing my impressions as part of the review. I've noticed from your interactions with Johannes that this process seems quite interactive, so I do hope we can discuss further in this thread. As I mentioned to Will at the beginning of this review process, I'm somewhere closer to the naive end user and less of a software developer, so I will concentrate more on how I see your tool being used. These comments come from someone who is very interested in, but less knowledgeable about, these "unsung" EUKs, so it's kind of an outsider's view.
From the perspective of microbial oceanography, there seems to be a really strong cultural divide between people studying PROKs and people studying EUKs, despite the fact that the organisms in question interact in a larger system. So any effort to try and bridge this divide is really worthwhile scientifically and I think your tool and approach is a promising way to begin this effort. From my own perspective I had known about the MMETSP but was less confident to find/download the data. With EUKulele, having the database automatically downloaded is already very helpful and on top of that knowing that I'm getting a high-quality curated version of the MMETSP from experts in the field is really reassuring. I also really appreciate all the work that has gone into making EUKulele useful with multiple databases, bioinformatic methods, and providing visualizations.
My main feedback falls into two areas - 1) clarity of writing, code, and visualizations and 2) caveats applying your approach to mixed PROK/EUK metatranscriptomes. For 1), I will provide detailed comments further below, but a more general comment is that some broader concepts can be clarified for the benefit of those less familiar with eukaryotic work. For example, it took me a bit of time to understand what you mean by transcriptome. From the PROK side of the fence, transcriptomes are most often just something you map to your MAGs/contigs to get at expression but I recognize this is something quite different for EUKs - it's basically your metagenome. Clarifying this subtle distinction in the text might help people less familiar with your field understand this. I did see your warning in the readthedocs documentation about using EUK metagenomes which alludes to this issue but I think it could be further clarified and explained in a more prominent location. Otherwise, I think your already extensive documentation could be improved by some re-organization and re-focusing which I'll try to detail in the sections below. Also, I noted that metagenomes/MAGs were used somewhat interchangeably so this can also be clarified to explain what you mean.
Point 2) is a bit more getting at the real-world usage of the software, and a potential pitfall I see when your workflow is employed by a naive end user. It's related to Johannes' question about mixed PROK/EUK communities. To test this, I downloaded a transcriptome assembly from this paper, which can be found here if you want to play with it yourself. Data were downloaded from IMG. From the phyloDB results generated by EUKulele, this is clearly a mixed PROK/EUK transcriptome:
My main concern is this is not reflected in the MMETSP results. Things that are clearly bacterial contigs (e.g. scaffold_10004_c1 which is a roseobacter) are annotated as EUKs (in this case, as a diatom). The tabular output (i.e. output/taxonomy_estimation/*taxonomy.out) would not give a user an idea that this is the case - the column for "max_pid" says 86.5% in this contig's case (100% for the phyloDB output file), so I wouldn't have assumed there was an issue unless I knew a priori that this sample would be a mixed EUK/PROK transcriptome assembly. How would you address this with your pipeline or in the paper/documentation? Do you mostly work with poly-A tailed transcriptomes in your own work where this would be less of an issue? Or is there a way you could think to address this? Could there be a pre-filtering step to split PROK and EUK? Or could another column be provided to the user to identify these potential issues?
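The "another column" idea can be sketched as a cross-check between two annotation runs of the same contigs: one against a eukaryote-only database and one against a database that also contains prokaryotes. This is a minimal sketch under stated assumptions, not EUKulele functionality: the function name, the dictionary inputs, and the set of prokaryotic labels are all hypothetical, and the contig names and labels are toy values rather than real output.

```python
from typing import Dict, List, Set

# Assumed top-level labels that indicate a prokaryotic assignment.
PROKARYOTE_LABELS: Set[str] = {"bacteria", "archaea"}


def flag_possible_prokaryotes(
    eukaryote_only_calls: Dict[str, str],
    mixed_db_calls: Dict[str, str],
) -> List[str]:
    """Return contigs annotated by the eukaryote-only database that the
    mixed database places in a prokaryotic group."""
    flagged = []
    for contig in eukaryote_only_calls:
        top_level = mixed_db_calls.get(contig, "")
        if top_level.strip().lower() in PROKARYOTE_LABELS:
            flagged.append(contig)
    return flagged


# Toy input, not real EUKulele output.
mmetsp_like = {"contig_a": "Stramenopiles", "contig_b": "Alveolata"}
phylodb_like = {"contig_a": "Bacteria", "contig_b": "Eukaryota"}
print(flag_possible_prokaryotes(mmetsp_like, phylodb_like))  # ['contig_a']
```

A flag of this kind would not replace a pre-filtering step, but it would give a user a visible warning even when max_pid alone looks acceptable.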
This proceeded well after updating conda, but as I mentioned in the bug report you may consider providing mamba as an alternative install method since it's much faster in general.
The test suite worked without errors, except the BUSCO error I raised above. FYI I didn't install with pip as mentioned in the readthedocs section, but ran it as recommended:
EUKulele --config curr_config.yaml
In response to your question @akrinos :
Version:
(EUKulele) jesse@kraken:~/EUKulele-review/sample_EUKulele$ EUKulele --version
Running EUKulele with command line arguments, as no valid configuration file was provided.
The current EUKulele version is 1.0.1
Contents of directory:
(EUKulele) jesse@kraken:~/EUKulele-review/sample_EUKulele$ ls
busco_286409437.log busco_327173586.log EUKulele-env.yaml path_test.txt samples_MAGs
busco_2930947595.log busco_downloads free.csv reference_DIR tax-cutoffs.yaml
busco_3040169547.log curr_config.yaml output-test.txt references_bins test_out_23July
Things were quite smooth. Database downloads proceeded as expected, and ran without errors on real-world MT data described above. I tested it with phyloDB and MMETSP using default settings on the MBARI bloom metatranscriptome assembly mentioned above.
Specific suggestions:
EUKulele --help is a little messy; I suggest reorganizing to move like parameters together (e.g. CPU and RAM usage), checking for typos, and removing subroutines if it's not being used
Other than that, great job, thanks for sharing this powerful tool and your expert knowledge with the whole community. Looking forward to discussing more about the PROK/EUK mixed transcriptome assembly issue and how you see this being potentially addressed.
Jesse
Thanks for another comprehensive review @jcmcnch - there is plenty for you to mull over @akrinos! I'm particularly interested in the second of their points re. (mis)reporting of prokaryotic annotations. @jcmcnch has given several helpful suggestions to address this; at a minimum I'd like a comment in the documentation. In my view, it would make sense for the user to first bin their MAGs (or alternative input) into EUK and PROK before running EUKulele - but this functionality would be great to have in your software.
Please let us know your responses to the second review. We are definitely well on track here. I also have a couple of things for you to address in the paper:
Only the first of these points is a requirement from me!
Cheers,
Will
Thank you both for your very helpful comments! We are working on addressing the eukaryote/prokaryote mislabeling issue by generating a default database that contains the MMETSP taxonomy (the taxonomy we feel is better to use as a default in most cases for eukaryotic organisms) and also prokaryotic sequences and a domain level to distinguish between the two. We will include the MMETSP alone as an option, as we are a bit biased towards poly-A-selected samples for which you would likely only need to quickly check with a database like PhyloDB for whether contigs were preferentially mapping to bacteria, after which point the MMETSP would be the database of choice.
We will certainly adjust the organization of the summary section somewhat! For the affiliations, is adding the city/state/country names sufficient beyond what we have? It looks like that is what is included in the JOSS papers I looked at. More comments to come later today in regard to addressing more of the housekeeping issues with the repository as well. Thanks again!
Thanks for the speedy response @akrinos! Yes - just add the city and country please.
Keep us posted on when you are ready for us to take another look at the submission.
@jcmcnch thanks so much again for your very thorough and informative review! I have responded to some of the points you raised below, and will complete the process of addressing them and provide a new release tomorrow.
In response to the major issue with the use of MMETSP resulting in prokaryotic sequences being labeled as eukaryotic, I have added a new database to the default options, a combination of MarRef and the MMETSP. This is to enable prokaryotes to be identified, but also to use our preferred eukaryotic sequences. Here is a comparison of the output for each database, MMETSP, PhyloDB, and MarRef, using the sample dataset that you provided:
This also involves adding a separate "Domain" level to the MMETSP and MarRef. For now, I have made the software flexible, such that it will accept a number of labeling options from your database (Domain, Supergroup, Kingdom, Phylum, Class, Order, Family, Genus, Species, with the potential for more to be added), and arrange them based on expected taxonomic ordering. In the future, I plan to relabel the top level of PhyloDB to be "Domain" as well, since it makes more sense to use that label for the highest taxonomic level in PhyloDB, rather than using the MMETSP's "Supergroup" name.
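Arranging whatever labels a database provides into the expected taxonomic order can be sketched as a sort against a fixed ranking. The ranking below simply follows the order of the labels listed in the paragraph above; the function name `order_levels` and the choice to reject unknown labels are assumptions made for illustration, not a description of the EUKulele code.

```python
from typing import Iterable, List

# Broad-to-specific ordering, as listed in the paragraph above.
EXPECTED_ORDER = [
    "domain", "supergroup", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
]


def order_levels(labels: Iterable[str]) -> List[str]:
    """Sort database column labels from broadest to most specific level."""
    rank = {name: position for position, name in enumerate(EXPECTED_ORDER)}
    labels = list(labels)
    unknown = [label for label in labels if label.strip().lower() not in rank]
    if unknown:
        # Assumption: fail loudly rather than guess where a new level belongs.
        raise ValueError(f"Unrecognized taxonomic levels: {unknown}")
    return sorted(labels, key=lambda label: rank[label.strip().lower()])


print(order_levels(["Species", "Domain", "Genus", "Phylum"]))
# ['Domain', 'Phylum', 'Genus', 'Species']
```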
So, the issue persists if you use the MMETSP on prokaryotic sequences - there tend to be spurious matches at the supergroup level, since the percent identity cutoff is quite low at this broad level, but we think it's important to retain the MMETSP option if you wish to only consider eukaryotic matches. However, we will add an additional warning to the documentation, and MarRef-MMETSP has become the default, to avoid this occurring naively. Note that when using the MMETSP reference, many more are "unclassified" at more specific taxonomic levels than when using PhyloDB.
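Why a permissive cutoff at the broadest level produces spurious labels can be illustrated with a small sketch that picks the most specific level whose percent identity cutoff a hit satisfies. The cutoff values and the level list below are invented for illustration and should not be read as EUKulele's defaults; the function name is hypothetical as well.

```python
from typing import Dict, List, Optional

# Hypothetical cutoffs (minimum percent identity per level), illustration only.
ILLUSTRATIVE_CUTOFFS: Dict[str, float] = {
    "species": 95.0,
    "genus": 80.0,
    "family": 65.0,
    "order": 50.0,
    "class": 30.0,
    "supergroup": 10.0,
}

# Levels from most specific to broadest.
SPECIFIC_TO_BROAD: List[str] = [
    "species", "genus", "family", "order", "class", "supergroup",
]


def most_specific_level(pid: float,
                        cutoffs: Dict[str, float] = ILLUSTRATIVE_CUTOFFS
                        ) -> Optional[str]:
    """Return the most specific level whose cutoff the hit meets, else None."""
    for level in SPECIFIC_TO_BROAD:
        if pid >= cutoffs[level]:
            return level
    return None


print(most_specific_level(86.5))  # genus
print(most_specific_level(25.0))  # supergroup
print(most_specific_level(5.0))   # None
```

With cutoffs shaped like these, a weak hit from a prokaryotic contig still receives a supergroup label when only eukaryotic references are available, which is the behaviour described above.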
Installation and testing
mamba to install EUKulele on the installation page
conda?
Software usage
PhyloDB, which is currently hosted on Google Drive and has not been formally published) and (2) using Zenodo requires that an additional dependency, zenodo-get, be included, which somewhat complicates the conda install. However, the newly-added database should work fine with Zenodo as of now.
log folder?
Thanks again, and more soon!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Again, thank you @jcmcnch and @will-rowe for your help!
As a follow-on:
Writing (paper and readthedocs)
EUKulele" that contains this warning explicitly saying that this is particularly important due to the presence of introns in eukaryotes. I have also added a link to this page of the documentation on the EUKulele landing page, such that users arriving at the documentation can be immediately aware of where the caveats are
EUKulele, if a user was not interested in further invocation details. So we consider the Quick Start to be a separate resource from any tutorials or from the installation and about pages
EUKulele, I think that further specifics about YAML files beyond the beginner level are a little confusing to include in the main documentation. However, since users would need to create/modify a YAML file for the taxonomic cutoffs, I've clarified the documentation for that
As far as the issue of providing metadata for each of the databases, for now I am writing a file with each invocation of EUKulele that contains
I have just pushed a new release to both PyPI and conda containing all of the relevant code changes. Please let me know what you think of the various changes!
Hi @akrinos , thanks for your reply, this all looks great. I have just re-downloaded EUKulele using conda (and yes, with the BUSCO issue it was from a conda install but I will try again to make sure I get the same behaviour as before). I do have a few quick questions though:
EUKulele --version returns 1.0.1 which is the same as before so I'm not sure I'm getting the changes you've implemented
Hi @jcmcnch, thanks so much for the feedback!
conda side, but did on the pip side. I will fix that! But also, the easiest way to check is either to run conda update, or to check the Anaconda Cloud page. At the time that I'm writing this, our page indicates that 1.0.1 was updated ~2 days ago, so that's probably the easiest way to check the version correspondence...but this is my fault for not relabeling as 1.0.2 on conda.
mamba section. The update on YAML and the introns were just 1-3 lines or so.
Thanks again!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Hi all. I hope everyone who had a break for new year had a good one.
Things are looking good here. @jcmcnch has now ticked all the required review boxes. If we could ask @johanneswerner to do the same, we can then mark this as provisionally accepted and start the ball rolling for publication.
@will-rowe I checked off the missing boxes - those were taken care of before already.
Perfect - thanks @johanneswerner and sorry for the box ticking exercise!
@whedon generate pdf
@whedon check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.7287/peerj.preprints.27295v1 is OK
- 10.1038/s41564-018-0176-9 is OK
- 10.1038/s41467-017-02342-1 is OK
- 10.1101/2020.06.30.180687 is OK
- 10.5281/zenodo.1476236 is OK
- 10.1142/S0219720012500151 is OK
- 10.1038/ncomms11257 is OK
- 10.1111/1755-0998.13147 is OK
- 10.1038/ismej.2015.30 is OK
- 10.1038/nrmicro.2016.160 is OK
- 10.1093/database/baaa051 is OK
- 10.1111/jpy.12529 is OK
- 10.1038/s41564-019-0502-x is OK
- 10.1016/j.tim.2018.10.009 is OK
- 10.1093/nar/gks1160 is OK
- 10.1016/j.tree.2014.03.006 is OK
- 10.1371/journal.pbio.2005849 is OK
- 10.1093/gigascience/giy158 is OK
- 10.1093/bioinformatics/btv351 is OK
- 10.17226/4901 is OK
- 10.1007/978-3-319-60156-4_18 is OK
- 10.1101/gr.229202 is OK
- 10.1016/j.gpb.2015.08.003 is OK
- 10.1038/sdata.2017.203 is OK
- 10.1093/bioinformatics/btw445 is OK
- 10.1038/nmeth.3176 is OK
- 10.1101/2020.06.30.180687 is OK
- 10.1371/journal.pbio.1001889 is OK
- 10.1371/journal.pone.0016342 is OK
- 10.1016/j.cub.2017.01.017 is OK
- 10.1038/ncomms12860 is OK
- 10.1111/gcb.12983 is OK
- 10.1098/rstb.2015.0331 is OK
- 10.1007/978-3-030-38281-0_12 is OK
- 10.1038/nature12221 is OK
- 10.1038/nmeth.4197 is OK
- 10.1128/AEM.01541-09 is OK
MISSING DOIs
- 10.1007/978-3-319-61510-3_4 may be a valid DOI for title: Functional analysis in metagenomics using MEGAN 6
INVALID DOIs
- https://doi.org/10.1093/nar/gkx1036 is INVALID because of 'https://doi.org/' prefix
- https://doi.org/10.1186/s13059-020-02014-2 is INVALID because of 'https://doi.org/' prefix
Hi @akrinos
Can you please check/fix those references? It looks like the MEGAN one might not need the _4.
Once you have done this, please can you tag a new release and then archive it (with zenodo or similar). Then report back here with the DOI and version.
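The two INVALID entries in the summary above are rejected only because of the 'https://doi.org/' prefix. A minimal sketch of stripping it before the value goes into the bib file might look like this. The helper name `bare_doi` and the list of prefixes are assumptions for illustration, not part of whedon or of the JOSS tooling.

```python
# Resolver prefixes to strip; this list is an assumption for illustration.
DOI_URL_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def bare_doi(value: str) -> str:
    """Return the DOI without a leading resolver URL, leaving bare DOIs unchanged."""
    cleaned = value.strip()
    lowered = cleaned.lower()
    for prefix in DOI_URL_PREFIXES:
        if lowered.startswith(prefix):
            # Slice the original string so the DOI keeps its own casing.
            return cleaned[len(prefix):]
    return cleaned


for entry in (
    "https://doi.org/10.1093/nar/gkx1036",
    "https://doi.org/10.1186/s13059-020-02014-2",
    "10.1038/nmeth.4197",
):
    print(bare_doi(entry))
```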
@whedon check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1093/nar/gkx1036 is OK
- 10.1186/s13059-020-02014-2 is OK
- 10.7287/peerj.preprints.27295v1 is OK
- 10.1038/s41564-018-0176-9 is OK
- 10.1038/s41467-017-02342-1 is OK
- 10.1101/2020.06.30.180687 is OK
- 10.5281/zenodo.1476236 is OK
- 10.1142/S0219720012500151 is OK
- 10.1038/ncomms11257 is OK
- 10.1111/1755-0998.13147 is OK
- 10.1038/ismej.2015.30 is OK
- 10.1038/nrmicro.2016.160 is OK
- 10.1093/database/baaa051 is OK
- 10.1111/jpy.12529 is OK
- 10.1038/s41564-019-0502-x is OK
- 10.1016/j.tim.2018.10.009 is OK
- 10.1093/nar/gks1160 is OK
- 10.1016/j.tree.2014.03.006 is OK
- 10.1371/journal.pbio.2005849 is OK
- 10.1093/gigascience/giy158 is OK
- 10.1007/978-3-319-61510-3_4 is OK
- 10.1093/bioinformatics/btv351 is OK
- 10.17226/4901 is OK
- 10.1007/978-3-319-60156-4_18 is OK
- 10.1101/gr.229202 is OK
- 10.1016/j.gpb.2015.08.003 is OK
- 10.1038/sdata.2017.203 is OK
- 10.1093/bioinformatics/btw445 is OK
- 10.1038/nmeth.3176 is OK
- 10.1101/2020.06.30.180687 is OK
- 10.1371/journal.pbio.1001889 is OK
- 10.1371/journal.pone.0016342 is OK
- 10.1016/j.cub.2017.01.017 is OK
- 10.1038/ncomms12860 is OK
- 10.1111/gcb.12983 is OK
- 10.1098/rstb.2015.0331 is OK
- 10.1007/978-3-030-38281-0_12 is OK
- 10.1038/nature12221 is OK
- 10.1038/nmeth.4197 is OK
- 10.1128/AEM.01541-09 is OK
MISSING DOIs
- None
INVALID DOIs
- None
Hi @will-rowe (and thank you @johanneswerner!) - I have fixed the DOI issues listed above, and published a release to Zenodo here, with DOI 10.5281/zenodo.4419894 for version 1.0.2, which is also fully updated on Anaconda Cloud and PyPI. I left in the _4 for MEGAN, as that is the most specific DOI for the paper. Thank you so much for your help!
Good work - thanks @akrinos. I'm afraid one more thing is needed from my end - can you make sure the zenodo release has an author list that matches the author list in your paper?
Hi @will-rowe, thanks and no problem! I couldn't figure out how to edit the author list before. I ended up having to modify it to be release 1.0.2b on Zenodo here; hopefully that's okay!
@whedon set 10.5281/zenodo.4422091 as archive
OK. 10.5281/zenodo.4422091 is the archive.
Submitting author: @akrinos (Arianna Krinos)
Repository: https://github.com/AlexanderLabWHOI/EUKulele
Version: v1.0.2b
Editor: @will-rowe
Reviewer: @johanneswerner, @jcmcnch
Archive: 10.5281/zenodo.4422091
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@johanneswerner & @jcmcnch, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @will-rowe know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @johanneswerner
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @jcmcnch
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper