Closed jspaezp closed 1 year ago
Not that it is a bad thing to do (I do it in other libraries, specifically with Unimod and other controlled vocabularies in psims
), but both the in-memory and SQLite3-based Unimod interfaces let you pass the Unimod
type object at a path or file-like object containing the database's XML representation. If load-time is a concern, you can store a copy of the database locally and/or packaged with your other code.
In the case of the SQLite3-based interface, this is encouraged since it's all for the persistent on-disk representation.
I'd like to hear opinions: do we need to better document the possibility of using a local copy of Unimod, or is it not convenient enough?
Hey there!
Going back to the docs to see what I missed I realized 2 main things.
I missed the documentation where passing a path is possible to the unimod class instantiation. (in my defense, it is really deep between a lot or other text) here: https://pyteomics.readthedocs.io/en/latest/api/unimod.html#pyteomics.mass.unimod.Unimod
On the other hand, I was using it as part of the proforma parser, which uses the UnimodResolver
object, which internally loads an Unimod
object that does not take any arguments.
https://github.com/levitsky/pyteomics/blob/608ae01e111093b36f8ab22c8d0aefcb74afc01a/pyteomics/proforma.py#L401-L408
so ... Would I need to subclass the resolver, over-write the load_database
method so the instance of the unimod object reads from disk? (this feels overly complicated).
On the other hand, I am having a hard time thinking of an use case where one would not want the database to be available in disk instead of downloading it in every instantiation.
So I guess, ... to summarize it ...
let me know what you think! as always thank you very much for the help and reply!
Yes, the ProForma
interface doesn't support overriding the location. It uses psims
for everything but Unimod
. The workaround currently would be to do something like this:
from pyteomics import mass, proforma
unimod = mass.Unimod("path/to/unimod.xml")
proforma.UnimodModification.resolver._database = unimod
If we used the psims
implementation of Unimod, everything would route through psims.controlled_vocabulary.obo_cache
which you could enable at the start of your script (which if start-up time is a concern and you're using the other ontologies, you might wish to do anyway).
To keep using mass.Unimod
, we can add a proforma.set_unimod_path
method which will do exactly what that snippet above does.
Ohhh I see!! Thank you very much for the code snippet of the workaround!
I do not believe I totally understand what yo mean with the psims implementation of unimod (I am still becoming familiar with the API).
Regardless, Thank you very much for the help so far! Best, Sebastian
It looks to me like we need a documented method to instantiate the resolver with a local copy (item 3 from Sebastian's comment), as well as better documentation (item 2).
As for item 1, I'm not sure how much sense it makes to not ship a bundled copy of unimod, but I always tended to keep "strict" dependencies to a minimum, to allow the installation to be as slim as possible. This is obviously a matter of trade-off, and we can include a unimod XML file, but it doesn't mean we should discard the functionality allowing to specify an alternative location.
Thanks to @mobiusklein, #85 implements the API for working with a local copy of Unimod when parsing ProForma: https://pyteomics.readthedocs.io/en/latest/api/proforma.html#cv-disk-caching
Personally I feel that this solves the core issue. @jspaezp what do you think?
Yes i believe it does! This is an amazing addition, thank you very much
I have noticed that when using the unimod interface, the default is to download the database. I would like to propose to include the database with the release.
It would greatly improve the speed in using the functionality.
let me know what you think! best Sebastian