mobiusklein / glycresoft

An LC-MS/MS glycan and glycopeptide search engine
https://mobiusklein.github.io/glycresoft/
Apache License 2.0

Cannot install all dependencies in python 2.7 virtual env #13

Closed ksachikonye closed 2 years ago

ksachikonye commented 2 years ago

I seem to be running into a problem installing the software. I tried both make and pip to install all the dependencies, including those in external-requirements.txt, and I still cannot install it. Is it possible for me to get a standalone executable for use with Python 3.9?

mobiusklein commented 2 years ago

Could you share the error you're getting, please?

While the codebase is mostly Python 3 compatible, I haven't tested building a PyInstaller binary with it yet. Which OS are you using?

ksachikonye commented 2 years ago

The make installation process is not working:

make: *** [makefile:31: install-dependencies] Error 1

I then tried installing via requirements.txt. I need Microsoft Visual C++ 9.0, which I can only get from http://aka.ms/vcpython27, and that is no longer available for download. All the other packages install with no issue except for ms_peak_picker. I have used ms_peak_picker before in other packages with no issues, and I do not understand why the installation is not working in this instance.

When I try to install PyInstaller, there is suddenly a syntax error on line 63 of setup.py:

File "setup.py", line 63
file=sys.stderr)
^
SyntaxError: invalid syntax

Also, for "from distutils.command.build_ext import build_ext", build_ext doesn't exist.

mobiusklein commented 2 years ago

Okay. Can you tell me which version of Python you got that syntax error on, and which OS you want the pre-built program for, please? I would need to know which OS to compile it against. I also have a Docker container, if that works for you.

ksachikonye commented 2 years ago

I am using Python 3.9 and Windows 10 Home. It was way easier for me to install glypy, and this raises a question: I am trying to make a multiomics mass spectrometry processing pipeline. I realised that all I need to do is build a database structure, as shown in your glypy examples, and search against that database. That still limits my pipeline to glycomics, and I wanted to know the best way to integrate glycopeptides. How would one go about searching against a database when there are two different kinds of molecule (peptides and glycans)? How do glycopeptides work?

Have you also thought about including clinical and other metadata in your pipeline ;) Let's close this and resort to emails, if that works for you.

mobiusklein commented 2 years ago

Apologies for the long delay in response. I've been trying to check each of my dependencies for missing wheels for your platform plus version. I think I have a wheel for ms_peak_picker on PyPI for Py3.9 64-bit on Windows. Are you getting the error message from PyInstaller itself, or during the installation of a particular library?
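For anyone following along, a binary wheel has to match the interpreter version, its bitness, and the OS, so it helps to ask the interpreter that pip actually runs under. The snippet below is a minimal sketch using only the standard library; the helper name describe_interpreter is made up for illustration. A SyntaxError pointing at file=sys.stderr is typically what Python 2 reports when it meets the print function, so this check can also reveal that the wrong interpreter is being picked up.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect the facts a binary wheel has to match (hypothetical helper)."""
    return {
        # Major.minor version of the interpreter running this script.
        "python": "%d.%d" % (sys.version_info[0], sys.version_info[1]),
        # Pointer size in bytes times 8 gives 32 or 64 on common platforms.
        "bits": struct.calcsize("P") * 8,
        # "Windows", "Linux", "Darwin", ...
        "os": platform.system(),
    }


info = describe_interpreter()
print("Python {python}, {bits}-bit, {os}".format(**info))
```

Running it with the same interpreter that pip uses (for example via "python -m pip" and "python this_script.py" with the same "python") avoids mixing up two installations.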

Attempting to write a universal search engine is pretty tricky, because you over-complicate your program and may find that shared data structures become harder and harder to change. My approach was to create completely separate search algorithms and dispatch to them depending upon the requested problem (search-glycans vs. search-glycopeptides). This makes my codebase pretty hairy, and leaves a lot of dead code lying around.
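The dispatch idea can be sketched in a few lines. This is only an illustration of the pattern, not how glycresoft is actually organized: the command names search-glycans and search-glycopeptides come from the comment above, while the function names, the SEARCHERS table, and the placeholder return values are assumptions.

```python
from typing import Callable, Dict, List


def search_glycans(spectra: List[str]) -> str:
    # Placeholder body: a real search would match glycan compositions.
    return "glycan search over %d spectra" % len(spectra)


def search_glycopeptides(spectra: List[str]) -> str:
    # Placeholder body: a real search would score glycopeptide candidates.
    return "glycopeptide search over %d spectra" % len(spectra)


# Hypothetical registry: one fully separate algorithm per requested problem.
SEARCHERS: Dict[str, Callable[[List[str]], str]] = {
    "search-glycans": search_glycans,
    "search-glycopeptides": search_glycopeptides,
}


def dispatch(command: str, spectra: List[str]) -> str:
    """Look up the algorithm for the requested command and run it."""
    try:
        searcher = SEARCHERS[command]
    except KeyError:
        raise ValueError("unknown command: %r" % command)
    return searcher(spectra)


print(dispatch("search-glycans", ["scan1", "scan2"]))
```

Keeping the algorithms separate means neither has to bend its data structures to suit the other; the cost is duplicated plumbing, which matches the trade-off described above.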

As to how glycopeptides work, that's what a lot of the code here is dealing with. On disk, I represent a glycopeptide as a union of one peptide with one or more glycans at specific sites, but in memory they are complex entities of their own. See glycopeptidepy for more of the behaviors there.

Glycopeptides fragment differently depending upon the instrument method being used, so the search algorithm chosen needs to be appropriate to the type of fragmentation expected. If you have CID or HCD, you need to look for peptide b/y ions with and without the reducing end monosaccharide(s), and hopefully the intact peptide sequence with the attached glycan(s) undergoing neutral losses as well. With ExD methods, you need to look for peptide c/z ions with and without the glycan(s), either intact or partially dissociated. EThcD mixes all of that together.

Depending upon the method, you might favor scoring some fragments over others, leading to your algorithm further branching and specializing. Efficiently handling these features is the neat bit. Happy to discuss this more over email.
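As a rough illustration of that branching, the expected fragment types can be written down as a small lookup keyed by dissociation method. This is a sketch under stated assumptions, not glycresoft's or glycopeptidepy's data model: FragmentPlan, PLANS, merge, and plan_for are hypothetical names, and the table only encodes what the comment above says.

```python
from typing import Dict, NamedTuple, Tuple


class FragmentPlan(NamedTuple):
    peptide_series: Tuple[str, ...]  # backbone ion series to look for
    glycan_states: Tuple[str, ...]   # what may stay attached to those ions
    intact_peptide_losses: bool      # intact peptide + glycan neutral losses


# CID/HCD: b/y ions with and without the reducing end monosaccharide(s).
COLLISIONAL = FragmentPlan(
    ("b", "y"), ("none", "reducing-end monosaccharide(s)"), True
)
# ExD: c/z ions with and without the glycan, intact or partially dissociated.
# The comment above does not mention the neutral-loss series here, so this
# sketch leaves it off (an assumption).
ELECTRON = FragmentPlan(
    ("c", "z"), ("none", "partially dissociated", "intact"), False
)


def merge(a: FragmentPlan, b: FragmentPlan) -> FragmentPlan:
    """Combine two plans, keeping first-seen order and dropping repeats."""
    return FragmentPlan(
        a.peptide_series + b.peptide_series,
        tuple(dict.fromkeys(a.glycan_states + b.glycan_states)),
        a.intact_peptide_losses or b.intact_peptide_losses,
    )


PLANS: Dict[str, FragmentPlan] = {
    "CID": COLLISIONAL,
    "HCD": COLLISIONAL,
    "ExD": ELECTRON,
    "EThcD": merge(COLLISIONAL, ELECTRON),  # mixes both behaviours
}


def plan_for(method: str) -> FragmentPlan:
    try:
        return PLANS[method]
    except KeyError:
        raise ValueError("no fragment plan for method %r" % method)


print(plan_for("EThcD").peptide_series)
```

A scoring function could then weight each series differently per method, which is where the further specialization mentioned above would come in.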

As for storing additional metadata, there's a facility for it already built into the underlying data model, but it's of limited use to the average user, so I haven't invested in exposing it. What kinds of information do you want to store? Right now from the CLI, I have an extra tool to add information to an analysis:

glycresoft tools update-analysis-parameters path/to/analysis.db 1 -p "Subject ID" "MY_FANCY_INTERNAL_ID"

But this isn't actively read anywhere else in the program; it'd just be for other software reading that file.

mobiusklein commented 2 years ago

The codebase is now Py3 only as it starts to incorporate type annotation syntax.