lazear / sage

Proteomics search & quantification so fast that it feels like magic
https://sage-docs.vercel.app
MIT License

Support for tims tof data #73

Open tomthun opened 1 year ago

tomthun commented 1 year ago

Hi Michael,

Is there a plan to support timsTOF data (.d format) in general, as well as ion mobility?

Best,

Tom

lazear commented 1 year ago

I am interested in adding support for Bruker data/IM eventually... but I wouldn't say there is a plan yet. This represents a fairly large amount of work (writing a native .d parser, re-writing LFQ to support the IM dimension, etc.), and I have no personal Bruker or TIMS data in house, but more Thermo data than you can shake a stick at. As such, there is no timeline for when (or if) this feature might be added.

If anyone is interested in collaborating on this, please reach out.

tomthun commented 10 months ago

The new Sage documentation site is great! Thanks for the great overview. Also, I am eager to hear if there are any updates regarding this issue. ;)

lazear commented 10 months ago

Thanks Tom.

Stay tuned, I hope to have a very exciting update for you soon

jspaezp commented 8 months ago

Hey @lazear! Along these lines, I worked on a prototype that uses IMS predictions during the prediction stage (branch diffs: https://github.com/lazear/sage/compare/master...jspaezp:sage:feature/ims_model). The model gives only a very modest change in ID numbers (<1% most of the time, even when the model has R² > 0.95), but the branch also implements the ion mobility field in the spectra and its extraction from .mzml data.

Let me know if you would like a PR that adds that to sage: either only the preservation of the IMS data, or that in conjunction with the ion mobility model (which I would polish a bit and do some minor feature engineering on).

I could also wait for the merge of the bruker branch/use that branch as a base so both .mzml and .d files provide the mobility data.

Best, Sebastian
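For readers unfamiliar with the R² figure quoted above, here is a minimal sketch of how a coefficient of determination between observed and predicted ion mobility values can be computed. It assumes the common definition 1 - SS_res / SS_tot; the comment does not say which definition was used, and the function name `r_squared` is hypothetical, not taken from the sage code base.

```rust
/// Coefficient of determination (R^2) between observed and predicted values,
/// using the common definition 1 - SS_res / SS_tot (an assumption here).
/// Returns None if the slices are empty, differ in length, or the observed
/// values have zero variance (R^2 is undefined in that case).
fn r_squared(observed: &[f64], predicted: &[f64]) -> Option<f64> {
    if observed.is_empty() || observed.len() != predicted.len() {
        return None;
    }
    let mean = observed.iter().sum::<f64>() / observed.len() as f64;
    // Total sum of squares: spread of the observations around their mean.
    let ss_tot: f64 = observed.iter().map(|y| (y - mean).powi(2)).sum();
    // Residual sum of squares: spread of the observations around the predictions.
    let ss_res: f64 = observed
        .iter()
        .zip(predicted)
        .map(|(y, p)| (y - p).powi(2))
        .sum();
    if ss_tot == 0.0 {
        return None;
    }
    Some(1.0 - ss_res / ss_tot)
}
```

Calling `r_squared(&observed, &predicted)` with two equal-length slices returns `Some(1.0)` for a perfect prediction and `Some(0.0)` when the prediction is no better than always guessing the mean. A high R² says the mobility values are well predicted; it does not by itself say how many extra identifications the feature yields, which matches the modest gain reported above.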

lazear commented 8 months ago

@jspaezp this looks pretty good - let's wait to merge in the bruker branch first. I'm going to try and start reviewing it this week

tomthun commented 5 months ago

Hey, sorry to bother once again. Are the changes regarding the ims_model already merged? Thanks! :)

jspaezp commented 5 months ago

@tomthun not quite yet!

There is a pretty good discussion at https://github.com/lazear/sage/pull/98 on where to go with the feature (long story short, the utility is less than I was expecting in most datasets). On the other hand, support for searching .d files is already implemented (but LFQ is not).

lazear commented 5 months ago

The ims_model feature has been merged and is included in the newly released v0.14.6!

tomthun commented 4 months ago

So .d directories are now natively supported, right (see https://github.com/lazear/sage/issues/117#issuecomment-1928514516)? I just ran some quick tests on some test data and got the following:

(base) PS D:\Data\tools\SAGE> sage .\current_config.json
[2024-02-09T11:03:06Z INFO sage] generated 111120583 fragments, 5992882 peptides in 7285ms
[2024-02-09T11:03:06Z INFO sage] processing files 0 .. 1
thread 'main' panicked at C:\Users\runneradmin.cargo\registry\src\index.crates.io-6f17d22bba15001f\timsrust-0.2.0\src\file_readers\common\sql_reader.rs:30:62:
called Result::unwrap() on an Err value: SqliteFailure(Error { code: Unknown, extended_code: 1 }, Some("incomplete input"))
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

I use the current_config.json
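The panic message above is what Rust prints when `Result::unwrap()` is called on an `Err` value. The sketch below illustrates the difference between unwrapping and handling such an error. `QueryError`, `run_query`, and `describe` are hypothetical stand-ins invented for this example; they are not timsrust's actual code, and the failure condition is arbitrary.

```rust
use std::fmt;

/// Hypothetical error type standing in for a database failure.
#[derive(Debug)]
struct QueryError {
    message: String,
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "query failed: {}", self.message)
    }
}

impl std::error::Error for QueryError {}

/// Hypothetical stand-in for a metadata query. The rule used to decide
/// success or failure is arbitrary and only serves the illustration.
fn run_query(sql: &str) -> Result<usize, QueryError> {
    if sql.trim_end().ends_with(';') {
        Ok(sql.len())
    } else {
        Err(QueryError {
            message: "incomplete input".to_string(),
        })
    }
}

/// Handles both outcomes explicitly instead of calling `unwrap()`,
/// so a failed query becomes a readable message rather than a panic.
fn describe(sql: &str) -> String {
    match run_query(sql) {
        Ok(n) => format!("ok ({} bytes)", n),
        Err(e) => format!("error: {}", e),
    }
}
```

Here `describe("SELECT * FROM")` returns an error string, whereas `run_query("SELECT * FROM").unwrap()` would abort the program with a "called `Result::unwrap()` on an `Err` value" panic like the one in the log.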

lazear commented 4 months ago

Is this ddaPASEF? The error suggests that this is an issue in Bruker's timsrust library. Can you please open an issue there and share your data with them?

https://github.com/MannLabs/timsrust/issues
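Before filing a report, it may be worth ruling out an incomplete copy of the acquisition. The sketch below assumes the usual TDF layout, in which a .d directory holds an `analysis.tdf` SQLite file and an `analysis.tdf_bin` binary file. The helper name `missing_tdf_files` is hypothetical, and this check is only a quick sanity test; nothing in the log shows that missing files caused the panic.

```rust
use std::fs;
use std::path::Path;

/// Files a TDF-style Bruker .d directory is generally expected to contain
/// (an assumption about the layout, not a complete list).
const EXPECTED_FILES: [&str; 2] = ["analysis.tdf", "analysis.tdf_bin"];

/// Returns the names of expected files that are missing, are not regular
/// files, or are empty inside `d_dir`.
fn missing_tdf_files(d_dir: &Path) -> Vec<String> {
    let mut missing = Vec::new();
    for name in EXPECTED_FILES {
        let usable = match fs::metadata(d_dir.join(name)) {
            Ok(meta) => meta.is_file() && meta.len() > 0,
            Err(_) => false, // not found or not readable
        };
        if !usable {
            missing.push(name.to_string());
        }
    }
    missing
}
```

Calling `missing_tdf_files(Path::new("sample.d"))` (with a placeholder path) returns an empty vector when both files are present and non-empty. A non-empty result points at the transfer or copy step rather than at the reader library.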

tomthun commented 4 months ago

I tried another .d dataset with the same error. Hopefully someone will soon reply at #15.

lazear commented 4 months ago

I would strongly suggest sharing the actual file. It's going to be nigh-impossible for anyone to debug the issue otherwise.

tomthun commented 4 months ago

Updated with the data. Edit: Sorry, I had the wrong settings for the file, but it should work now.