Closed Robaina closed 1 year ago
Thank you @Robaina for submitting Pynteny to pyOpenSci. As you noted, we have already discussed on #65 and you have done extensive development to prepare for submission.
Here are the initial editor checks:
YAML
header of the issue (located at the top of the issue template).
The author has done a ton of work to get ready for submission. Pynteny is looking really good. We will need a contributing.md before the review finishes but that's the only thing I really see missing right now.
Thank you again @Robaina.
We will get an editor and reviewers as soon as possible.
I need to confirm timelines with @lwasser but I would expect it might take a bit longer because of the holidays, my guess is two weeks.
Will let you know for sure by the end of this week, but wanted to make sure you knew we have seen your submission and we are beginning the review.
Hi @NickleDave,
thanks for accepting the submission! I have included a "CONTRIBUTING.md" in the root directory of the repo, also moved instructions on how to set a developer environment to that file. Looking forward to the review process -- Btw, I think I did fill out the initial onboarding survey, please confirm.
Best, Semidán
Perfect, thank you Semidán!
Following up on my last message: we do expect to get review started within the next two weeks. People are off-line because of the holidays but we will get the ball rolling now.
Dear @Robaina, nice to meet you!
I just wanted to say hi 👋🏼 and inform you that I am pleased to be the editor for this review. I see that you and @NickleDave already did great work during the pre-submission phase! We are recruiting the reviewers now, and I will update you once they are onboard.
Hi @arianesasso, nice to meet you too!
Awesome. Thanks for letting me know and for being the editor. Looking forward to the review! (und guten Rutsch ins neue Jahr)
Dear Robaina,
I hope you had a great start to the year as well! I am sorry for the delay. We are still looking for our second reviewer.
However, I would like to introduce you to @Batalex, who is already on the job as the first reviewer 😄 . We have previously talked about the package, and he already has access to the reviewer's guide, the reviewer template, and the Python packaging standards detailed in the packaging guide. So, we will be able to continue moving 🚀 .
And please feel free to ping me at any time if you need help or have questions!
Hi @arianesasso,
no problem, thanks for the update :) Hi @Batalex, thanks the review!
Hey there! I am almost done with my review. No promises, but I think I can deliver this weekend. Looking forward to discussing it with you! 🙌
Here comes a big one! @Robaina @arianesasso
I want to thank both the author and the editors from pyOpenSci for the opportunity to tackle this package review. The following comments reflect my experience as I maintain private packages, and all remarks come with pynteny's best interest at heart.
Some comments are regrouped at a "general" level if they concern the project structure or several files. If I encountered a situation that I had already pointed out on another file, I did not comment again. However, I encourage taking into account the remarks from one file to the others.
Several comments I made are only valid for the submitted version (0.0.5) and were fixed in the meantime. Nonetheless, I will mark off what has been done later.
With that being said, let us get started!
I used the black cat🐈⬛ to highlight my personal thoughts in the review template below.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
🐈⬛ The submitted version uses a personal conda channel in README, which is something I would not recommend. However, it uses bioconda in the main branch, so it is all good.
🐈⬛ There are some improvements to be made regarding running the examples locally, see below.
🐈⬛ In Pynteny 0.0.5, contributions guidelines are included in the README file. A more detailed CONTRIBUTING file has been added since then.
pyproject.toml
file or elsewhere.🐈⬛ I noticed that two of the three links to the Pynteny repo use the HTTP scheme, and the third uses HTTPS.
Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
🐈⬛ Additional badges have been added since the 0.0.5 release.
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
🐈⬛ I ran the package in WSL since Apple silicon is not supported. The installation went smoothly, except for the really long resolution time. I am glad to see that the main branch encourages to use mamba.
🐈⬛ As far as I can tell, it seems to do what it claims.
🐈⬛ NA, no performance claim to be found in the documentation.
🐈⬛ Tests covers 45% of pynteny. The coverage is heterogenous: the application and CLI parts of the package are less tested than the rest.
🐈⬛ Unit tests did not run on CI as of 0.0.5. This has changed since then. The documentation website is built and deployed using CI, nice!
🐈⬛ The code base does not fully follow PEP 8. See below for more information.
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: ~8 hours
snake_case
to name functions and methods rather than camelCase
.editorconfig
file. There again, this is a simple way to enforce consistency in the code base by defining a few options to control the files’ format. This is especially true for Python projects since mixing tabs and spaces would crash the program. There are other benefits as well, e.g., removing trailing spaces and adding new lines at the end of files.argparse.ArgumentParser
instances to CommandArgs
before using themThe defaults paths are quite inconvenient if I run the cli from the installed package rather than the source directory, the config file as well as the downloaded database ends up in /home/<user>/miniconda3/envs/Pynteny/lib/python3.10/site-packages/
I recommend adding "Installation" & "Getting Started" sections in the documentation. The rationale is as follows: any part of the project may be a potential user's first contact. This includes the README file, the package description on Pypi (not applicable here) / a conda channel, and the documentation.
I would advise adding instructions on how to locally build the documentation in CONTRIBUTING.md in the section 3
The following mention is incorrect in the example notebook:
To follow this example, you don't need to download _E. coli's_ genome, since it has been already downloaded during Pynteny's installation.
The MG1655.gb file is stored in the tests
folder, which is not included in the package. Therefore, when the package is installed using conda, it is not downloaded.
import *
are usually frowned upon in Python, but IMO this is ok in this caseimportlib.metadata
is only available in the standard library from 3.8 and onwards (3.7 is technically allowed in 0.0.5). This is partially fixed in the main
branch. For good measure, I would increase the minimum Python version in the pyproject.toml
file as well, even if poetry is only used as a build backend.Command
class to avoid a linter warning in the update
method:class Command():
"""Parent class for Pynteny command
"""
_args: CommandArgs
__init__
methods are not necessary, the smaller the code base the better!_repr_html_
and cite
None
default value are correctly typed as optional
Path(__file__).parent
in another Path()
in the getTestDataPath
method since it returns a Path
object as well. This comment applies to utils.py as well.args
is not used in the Pynteny.app
method, so there is no need to parse it.getHelpStr
could be simpler by writing to an io.StringIO
instead of a temp file.required
and pynteny
assignmentspecified
hmm_order_dict
is not needed in SyntenyPatternFilters
init method.strand_map = dict(neg=-1, pos=1) # could be set as a module-wide constant
self.strand_order_pattern = [strand_map.get(strand, 0) for strand in parsed_structure["strands"]]
contains_hmm_pattern
, contains_distance_pattern
, and contains_strand_pattern
could use Python’s idioms and return boolean values instead of integers, this is not an issue with their use in pandas rolling method. Boolean values can be aggregated numerically in pandas. This would simplify those methods. In contains_strand_pattern
, there is a True
equality comparison.inplace=True
in pandas operation is not recommended and could be deprecated: https://github.com/pandas-dev/pandas/issues/16529.size
method instead of a filter with a lambda in getAllHMMhits
, but this is a matter of preference.filterHitsBySyntenyStructure
could be made on a pandas’ groupby object. This is not a prominent feature, but one may do the following:for group_key, group_df in df.groupby(key):
pass
len(contig_hits.hmm.unique())
could be simplified to contig_hits.hmm.nunique()
iterrows
is quite inefficient if one only need to loop over the index. The author may use for i in matched_rows.index
getSyntenyHits
, a public attribute would work just fine.itertuples
instead of iterrows
in addHMMmetaInfoToHits
. itertuples
is much faster if you need to iterate over data frames one row at a timeisinstance
rather than type equality in filterFASTAbySyntenyStructure
. isinstance
is a more robust (and marginally faster) solution because it allows nominal subtyping: any subtype of str
or list
would pass the test conditions. Using isinstance
is therefore considered more pythonic than type equality./
or Path.joinpath
instead of os.path.join
usecols
argument for pd.read_csv
try-except
block needed in getHMMnamesByGeneSymbol
? (+ bare except is considered a code smell). The code would work without this. The same goes for getHMMgeneID
, with a little twist to return None
, and for getMetaInfoForHMM
preprocessing.py
does isLegitSequence = RecordSequence.isLegitPeptideSequence
try-except
block in LabelParser.parse
. I do not know what could be failing (a split
, an int cast?), and the exception caught is too broad. Tip: I suggest extracting meta.split("_")
in a variable since it appears in four lines in a rowtuple[str]
means “a tuple of exactly one str.” This means that splitStrandFromLocus
is incorrectly typed_input_file_str
a read-only property.unique_records
was defined outside the removeDuplicates
method. I understand that the purpose is to generate the records lazily, which could still work as intended.number_prodigal_record_fields
could be a global constantfromGenBankData
, I think the three inner functions would be better off defined outside the method. Contrary to the previous time, they do not use self
and are even easier to extract. This change would reduce fromGenBankData
complexity and make it even easier to debug.GeneAnnotator
if args.logfile is None:
args.logfile = Path(os.devnull)
elif not Path(args.logfile.parent).exists():
Path(args.logfile.parent).mkdir(parents=True)
logging.basicConfig(format='%(asctime)s | %(levelname)s: %(message)s',
handlers=[
logging.FileHandler(args.logfile),
logging.StreamHandler(sys.stdout)
],
level=logging.NOTSET)
logger = logging.getLogger(__name__)
In addition, this would limit the complexity of synteny_search
wget
is quite old and does not seem to be active. How about replacing it with a more maintained alternative? e.g. httpx, requests~/.pynteny
Hey @Batalex ,
thanks so much for your very detailed review! I quickly run through your comments and I see that they will be very useful. I'll address them carefully in the following days. But wanted to thank you now.
@Batalex, what a beautiful and thoughtful review! I really enjoyed the 🐈⬛ haha and, of course, all the meaningful recommendations you made! Thank you for all your effort 🙏🏼.
@Robaina, hopefully, we will have the second reviewer onboarded soon. I will keep you posted!
Hi @Robaina, I want to introduce you to our second reviewer @c-thoben. She will be starting her first review with us 🥳 . Keep tuned for more updates soon!
Hi @arianesasso, great, thanks for letting me know. Hi @c-thoben, thanks for offering to review!
Hey @Robaina, if you would like, I can free some time in case you want a reviewer for your upcoming PRs on Pynteny. This may be a little easier to discuss the suggested changes.
Hi @Batalex ,
thanks for the offer! Will do that for the ones I'm currently working on.
Great job @Robaina! Thank you also @Batalex, for offering more time to help with the PRs 🙏🏼
We are onboarding @c-thoben and soon you will have your second review 🥳 .
Hi @Batalex,
thanks again for your review. I noticed that some of your comments were not addressed in the previous PRs. I'll clarify some of them here (preparing a PR for the remaining):
There are a few discrepancies in types, some of them could be fixed by converting argparse.ArgumentParser instances to CommandArgs before using them
I understand what you mean here: Subcommand functions in pynteny.subcommands
currently accept both an argparse.ArgumentParser
object or a pynteny.utils.CommandArgs
object. This flexible type in the argument was made on purpose so subcommand could be called both from the CLI and from within Python. I don't understand why this is undesirable. I could, like you pointed out transform argparse objects to CommandArgs, but find it unnecessary (unless there is a good reason behind that I'm missing).
Additionally, type hints in pynteny.subcommands
could be fixed using typing's union
What do you think?
The defaults paths are quite inconvenient if I run the cli from the installed package rather than the source directory, the config file as well as the downloaded database ends up in /home//miniconda3/envs/Pynteny/lib/python3.10/site-packages/
This has been changed already in this PR.
Since the author is using pathlib, they might use / or Path.joinpath instead of os.path.join
This was changed in this PR
I have not profiled the code to see if there is a performance bottleneck there, but I would use itertuples instead of iterrows in addHMMmetaInfoToHits. itertuples is much faster if you need to iterate over data frames one row at a time
This was changed in this PR
(Will get rid of getattr
in the next PR)
My personal recommendation is to define the text encoding whenever a text file descriptor is opened. Even if this is not feasible at the moment, this is a tiny step toward Windows compatibility, and it removes a little bit of uncertainty.
Alright, will add that in the new PR.
The MG1655.gb file is stored in the tests folder, which is not included in the package. Therefore, when the package is installed using conda, it is not downloaded.
Right, will modify this in the new PR.
Thanks, @Robaina; I needed to make a second pass at my comment, but you have already taken care of it!
Aside from the remaining getattr
to take care of, I am pleased to see that you found interest in my comments.
While we wait for the second review, can I make some PR to add a few quality tools to ease the maintenance in the future?
I am thinking of:
Sure @Batalex, go ahead with this(these) PR(s). I'll be glad to help. These changes should make the codebase very robust and pretty.
Here's my review! @Robaina @arianesasso
Thanks for onboarding me :) I am not that experienced, but I enjoyed looking into the review process. You did some really nice work @Robaina and I hope my comments are helpful.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
Readme file requirements The package meets the readme requirements below:
[x] Package has a README.md file in the root directory.The README should include, from top to bottom:
[x] The package name
[x] Badges for:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole.Package structure should follow general community best-practices. In general please consider whether:
The package contains a paper.md matching JOSS's requirements with:
*Estimated hours spent reviewing: 10 h
Thank you so much for your review @c-thoben! I really appreciate the time you have taken to do this review.
Find below my answers to your comments:
- When trying to install the software, I had some issues with anaconda resolving the package dependencies if i used the recommended command "conda install -c pynteny". I got some help from the community and with "conda create -n pynteny -c conda-forge -c bioconda python pynteny" the installation worked fine. To test Pynteny, I used a VM with Ubuntu 22.04.1 LTS. The anaconda version was 23.1.10 and python3 inside the pynteny environment 3.10.6. If you need furhter information I am happy to provide them.
The recommended way to install pynteny is via conda, in its own environment:
conda create -n pynteny
conda activate pynteny
conda install -c bioconda pynteny
as indicated in the README. It looks like this is what you did at the end. Python should be included in the environment by default.
I have to update the package in bioconda. There have been a lot of changes in pynteny's codebase since then, but did not want to bother the bioconda community with further updates until this review was over.
The new features can be tested installing the developers version, following instructions in the contributing file.
- I have seen that if the PGAP database is downloaded, it is marked in the configuration file and the download is skipped at future exections. This can lead to problems if e.g. the user has deleted the database locally and wants to redownload it. Maybe a force option can be helpful.
I had the exact same thought. This feature is already included in the newer version of Pynteny (not yet in bioconda).
- All tests passed for me. However, I have seen that in the scripts "test_database_build.py" and "test_integration_search.py", the unittest main is not called so the tests are not executed if the module is run locally. Is this right?
Right, I missed that. Thanks! It's been changed.
- I enjoyed the quotes in the CLI :)
I'm glad you enjoyed the quotes :D
Hey @Robaina, thanks for your answer! It seems that most of my comments are already resolved. Onto the installation issue. I had the installation problems when I followed the recommended way in the README:
conda create -n pynteny
conda activate pynteny
conda install -c bioconda pynteny
The error I got is due to conda not being able to resolve the dependencies, I am also able to reproduce it. The user who helped me had the same incompatibility on fedora37 shipping python3.11 with conda environments with python 3.10, 3.9 and 3.8. She found out that the installation works if the dependencies are resolved by conda at the same time:
conda create -n pynteny
conda activate pynteny
conda install -c conda-forge -c bioconda python pynteny
If you need more informations I am happy to provide them!
Hi @c-thoben,
thank you so much for spotting and clarifying this issue. I have added some lines to README to point out this issue: try your solution and also install with mamba instead of conda (which is usually much faster to resolve dependencies). This should be enough for now. However, I'm thinking of further restricting dependency versions in the next update on bioconda, this should also help reduce installation time / solving dependencies.
@Robaina i'm using a pure mamba forge installation and this install is not working for me on a MAC in a clean environment. something is up with the dependencies that i think needs to be resolved in a way that users can understand.
@c-thoben had a difficult time installing on linux as well and her points need to be addressed!
can you explain a few things to us?
why do you need to specify both conda-forge and bioconda channels when installing? are there some specific dependencies that are getting in the way?
Pynteny can be install with the following command:
conda install -c bioconda pynteny
If conda takes a long time to solve the dependencies (e.g., more than a few minutes) or fails to solve it, you may try:
conda install -c conda-forge -c bioconda python pynteny
Or install it with [mamba](https://github.com/mamba-org/mamba) instead of conda:
mamba install -c bioconda pynteny
finally there is a personal channel in your environment here. . we want to ensure that authors are in fact using public channels. can please explain why a personal channel is listed in the environment?
Many thanks. It's critical that you address this installation issue as so many of us are running into confusion installing this package. While i understand the complexity of dealing with specific channels and complex installs, our goal here it so support usability in all of our vetted packages.
Hi @lwasser,
thanks for your comments.
@Robaina i'm using a pure mamba forge installation and this install is not working for me on a MAC in a clean environment. something is up with the dependencies that i think needs to be resolved in a way that users can understand.
I agree. For what @c-thoben and you tell me, it seems like something is wrong with the package (now outdated) version in bioconda. Pynteny is built and installed without problems in the tests GitHub Action, both in Ubuntu and Mac.
can you explain a few things to us?
- conda install -c conda-forge -c bioconda python pynteny
why do you need to specify both conda-forge and bioconda channels when installing? are there some specific dependencies that are getting in the way?
bioconda channel needed for pynteny, conda-forge channel needed for python (solution suggested by @c-thoben). This was meant as a provisional fix until the new version was ready in bioconda. I have no trouble installing pynteny from bioconda in my Ubuntu 20.04 machine, so really could not know that this fix wouldn't work in other settings.
- the instructions in the readme are confusing because there are now 3 installation options as follows: there should be a single way to install a package that works.
Pynteny can be install with the following command: conda install -c bioconda pynteny If conda takes a long time to solve the dependencies (e.g., more than a few minutes) or fails to solve it, you may try: conda install -c conda-forge -c bioconda python pynteny Or install it with [mamba](https://github.com/mamba-org/mamba) instead of conda: mamba install -c bioconda pynteny
I agree. Ideally only conda install -c bioconda pynteny
, perhaps suggesting usage of mamba only to accelerate installation. Will update pynteny on bioconda and try installing it in the GitHub Action (Ubuntu and Mac) directly from bioconda once updated as a way to check installation on MacOS (I don't have access to a Mac machine)
finally there is a personal channel in your environment here. . we want to ensure that authors are in fact using public channels. can please explain why a personal channel is listed in the environment?
Thanks for spotting this. This channel was a remanent of a previous version in which pynteny was hosted in my personal conda channel. It was requested for me to upload pynteny to bioconda prior to review. Already removed.
Many thanks. It's critical that you address this installation issue as so many of us are running into confusion installing this package. While i understand the complexity of dealing with specific channels and complex installs, our goal here it so support usability in all of our vetted packages.
Of course. I totally agree. Thanks for raising these issues. I will update pynteny in bioconda, test installation again, and come back to you all.
@lwasser
Sorry, I just realized. There is still one dependency, Streamlit-Aggrid, that is hosted in my conda channel. That's why my channel was still listed in the environment.
This is a non-critical dependency, it is employed to make an interactive results table in the web application. While installable from pip, I had to upload it to conda to be able to list it as a dependency in bioconda (pip dependencies not allowed).
Since this dependency is not hosted in a public channel, I will first try to contact Streamlit-Aggrid developer to suggest uploading it to bioconda, if unsuccessful, I will remove this dependency altogether and make a static results table in the web application.
no worries @Robaina -Many thanks for your prompt reply. i know that this review has been a LOT of work and you've been incredibly committed throughout the process! I just want us to figure out the install items and we can do that together if need be! just to be sure i've pinged a colleague who works on conda-forge to help us sort this out as well. but it ALSO sounds like part of this issue are things you've fixed that haven't been released yet which is very understandable.
i have seen packages (such as geopandas) that did have a dependency that you had to install manually. but i think it's not ideal for users. perhaps there is some way for you to provide instructions for using the interactive approach (if you wish) if you can't find a way to have this on conda.
however, If the package is on pypi already, it's actually quite simple to move it to conda forge using greyskull. please let us know if you need resources to share with the authors. i think this is quite doable and might then be less work for you assuming the authors are ok with it. then every time they deploy to pypi they also merge a pr on conda forge a few hours later. Then it's in both places. i've done this with several packages and it's easy to maintain.
Thank you for the support, @lwasser. I'm sure we will figure this out.
Alright, so I went to Streamlit-Aggrid's repo and found out that I had already contacted the authors in October 2022 (so many things going on, I totally forgot). Would be nice if we could set the conda-forge pipeline if the author agrees (contacted him again). I let you all know.
ok awesome! yup, we will figure this out! i'll follow your issue referenced above as well. it looks like the maintainers have a lot of issues and not enough time to address them all which is very understandable and common in this space!
Hi @lwasser,
I've been thinking about removing the Streamlit web application from pynteny and making it a separate pip-installable package pynteny-app
which would contain this functionality. This package would work as an optional add-on to pynteny (which would not be submitted to PyOpenSci)
This change would solve the issue with Streamlit-Aggrid, and probably also help solve dependencies by conda. All the functionality would still be accessible via CLI and within Python as a module (which are the main ways to access pynteny, anyway)
Would this change be acceptable with respect to the status of the current submission?
Thanks!
@Robaina i think so. but let's check in with @arianesasso and our reviewers @Batalex @c-thoben so they are aware of the update. i see that this dependency is really not core to your functionality so it could be a nice add on as you mention and would streamline your installation significantly! Ari, Alex and Corinna - does this seem reasonable? i don't want to take over this review i just want to help ensure things are installable. many thanks. i can stop complicatin gthe review once we have decided :)
hey @Robaina it looks like that package may get added to conda-forge by a maintainer. i can keep you posted here if that will save you some time.
it actually was already there - https://anaconda.org/conda-forge/streamlit-aggrid @Robaina can you confirm that this is what you need? if so it should just be a simple environment update to make everything work.
Yes, awesome, thank you @lwasser !
Hi @arianesasso, @Batalex, @c-thoben and @lwasser,
after testing the new packages Streamlit and Streamlit-Aggrid in conda-forge, I am now facing a new issue with Streamlit, basically this one.
While I like the Streamlit app, I would like to prioritize other issues regarding core functionality of pynteny (e.g. adding support for the PFAM database). Therefore, I am now considering the idea previously commented of removing the web interface from pynteny and instead create a new addon pip package containing it.
Please, confirm that this would be an acceptable change to this submission. Here is the branch without the app.
Thanks for all the help :)
On the one hand, I think that the streamlit app is orthogonal to the package itself. I mean that removing it does not change the package's "elevator pitch."
On the other hand, the web app is mentioned in the paper.md
(https://github.com/Robaina/Pynteny/blob/main/ms/paper.md) submitted to JOSS.
I would be +1 on moving it to another package, if it means that a larger pool of users can use the core package, but I wonder if it could be an issue with JOSS.
Thanks @Batalex or maybe just removing it from the JOSS paper or mentioning it as a plugin / add on? it sounds like this won't impact our review at all - i think we can just retest installation once all of this is added on our end. And then on the joss end you can always edit the paper to clarify that component.
Great, I agree. I will edit the paper. I should be fine since it is not even submitted yet. I'll let you know once I test installation after these new changes.
Hi there @arianesasso, @c-thoben @Batalex ,
I have updated pynteny on bioconda, and tested installation directly from bioconda within a GitHub Action (under ubuntu-lastest and macos-latest).
I also updated setup instructions in the README, and modified python version, since now pynteny depends on python 3.10.
Please, could you verify that you can install pynteny in your settings? (perhaps @lwasser could also try installing it in MacOS?)
Thanks!
sure thing i'll try to install on macOS @c-thoben can you please try to install again too following the new readme instructions? @Robaina thank you for your dedication and promptness in getting all of these things sorted out and fixed! it's much appreciated.
I am not able to install on MacOS (M2)
> mamba create -n pynteny -c bioconda -c conda-forge python=3.10 pynteny
Looking for: ['python=3.10', 'pynteny']
bioconda/osx-arm64 129.0 B @ 296.0 B/s 0.4s
bioconda/noarch @ 2.3MB/s 2.0s
conda-forge/noarch @ 4.2MB/s 3.2s
conda-forge/osx-arm64 @ 1.7MB/s 3.5s
Could not solve for environment specs
Encountered problems while solving:
- nothing provides requested pynteny
The environment can't be solved, aborting the operation
i can confirm the same issue and response on my mac!
whats odd is i think the error is it can't find pynteny on bioconda :
PackagesNotFoundError: The following packages are not available from current channels:
- pynteny
but it is there!
@Robaina do you need help with this? i think we can get you some help. can i invite you to our slack? we can talk there? if so please email me at leah @ pyopensci.org and i'll send you an invite so we can try to figure this out together.
Sure thing @lwasser, will send you an email for that.
I think the issue might be related to the "new" ARM64 processor architecture in MAC. The machine in which @Batalex tried to install pynteny seems to have one (and all the new apple machines starting November 2020, I think).
Several dependencies of pynteny don't provide support for ARM64, e.g. hmmer, prodigal, pyfastx. Thus, pynteny itself doesn't have support for ARM64 architectures as of now. (To compare, here is another dependency psutil which does have ARM64 support on bioconda).
If that's the case, there is a workaround, basically telling conda to construct an environment using x86 architecture on ARM64 machines to install dependencies not yet supporting ARM64. Here is a blog post with instructions, for what I read, something like:
CONDA_SUBDIR=osx-64 conda create -n pynteny_x86 python=3.10
conda activate pynteny_x86
conda config --env --set subdir osx-64
and then, I guess:
(pynteny_x86) conda install -c bioconda pynteny
That's something new I learned today... I have no way to test if this workaround will be sufficient, though. Will send you the email now, anyway.
Thanks!
@Robaina Thank you for going the extra mile and making all those changes!
The joys of novelty 😬
@Robaina I am able to install Pynteny with the README instructions on my Linux VM and also on MacOS without problems. As I have a 2018 MacBook, this may support the ARM64 processor theory. Thanks for looking into this issue @Robaina and thanks for your support @lwasser!
Awesome! then perhaps I could add that workaround solution to the setup instructions for the people working on ARM64 machines.
Ok i just installed in my M1 - the sequence below worked!
CONDA_SUBDIR=osx-64 conda create -n pynteny_x86 python=3.10
conda activate pynteny_x86
conda config --env --set subdir osx-64
conda install -c bioconda pynteny
it did take it a moment to import into python but it did import successfully NOTE: i talked with a conda forge maintainer about this - IF those three packages on bioconda could be also added to conda-forge then they can actually be added to a list where a bot tries to build the appropriate arch wheels (arm64 for osx in this case). of course there might be some reason that they aren't already there so this could be a non trivial task. but i think for our purposes if the package installs on everyone's computer who is here in this review, i think we'd done due diligence. and all that is needed is a documentation update for how to install.
Excellent. Thanks for consulting this with conda-forge @lwasser . I have added these extra lines to README and also the documentation. This issue can always be updated in future releases.
Thanks so much everyone for the review!
Submitting Author: Semidán Robaina (@Robaina) All current maintainers: @Robaina Package Name: Pynteny One-Line Description of Package: Query sequence database by HMMs arranged in predefined synteny structure Repository Link: https://github.com/Robaina/Pynteny Version submitted: v0.0.5 Editor: @arianesasso Reviewer 1: @Batalex Reviewer 2: @c-thoben
Archive: JOSS DOI: Version accepted: 1.0.0 Date accepted (month-day-year): 03-10-2023
Description
Pynteny is Python tool to search for synteny blocks in (prokaryotic) sequence data through HMMs of the ORFs of interest and HMMER. By leveraging genomic context information, Pynteny can be employed to decrease the uncertainty of functional annotation of unlabelled sequence data due to the effect of paralogs. Pynteny can be accessed (i) through the command line, (ii) as a Python module, or (iii) as a (locally served) web application.
Scope
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
Pynteny's main objective is to provide a means to query NGS (unannotated) sequence databases, such as metagenomic/metatranscriptomic datasets using syntenic blocks (i.e. spatial arrangements of genes) rather than single target genes/protein domains. In this sense, I would classify Pynteny within Data Extraction.
Pynteny was designed to be used by researchers working with large, unannotated sequence databases, such as those typically encountered in metagenomic analyses. It can be accessed through a command line interface or easily integrated into pipelines as a Python package. Pynteny can also be used through a graphical interface running locally in the browser, which is more suitable for educational purposes.
To the extent of my knowledge, there isn't any Python package that provides the functionality provided by Pynteny.
@tag
the editor you contacted: https://github.com/pyOpenSci/software-review/issues/65Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
Publication options
I had submitted this package for publication at JOSS prior to pyOpenSci. The submission is currently under consideration for scope in this issue: https://github.com/openjournals/joss-reviews/issues/4978
JOSS Checks
- [x] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. - [x] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. - [x] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. - [x] The package is deposited in a long-term repository with the DOI: [![DOI](https://joss.theoj.org/papers/10.21105/joss.05289/status.svg)](https://doi.org/10.21105/joss.05289) *Note: Do not submit your package separately to JOSS*Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text-based review. It will also allow you to demonstrate addressing the issue via PR links.
Code of conduct
Please fill out our survey
P.S. *Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
[Editor and review templates can be found here]: https://www.pyopensci.org/software-peer-review/how-to/editors-guide.html and https://www.pyopensci.org/software-peer-review/how-to/reviewer-guide.html