vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.

Speclib format #1227

Open heejongkim opened 4 weeks ago

heejongkim commented 4 weeks ago

Hello,

Thanks for sharing such an amazing tool with the community. I've really enjoyed using it.

To get to the point: is there a way to get the .speclib binary schema or format (I remember reading that it's a C++ struct), so that I can parse, merge, and write libraries without first converting .speclib to .tsv via DIA-NN? With enough hints I could easily inspect a library and modify it as needed, for example by appending a few extra predicted entries to accommodate the experimental design.

Thank you.

best, heejong

vdemichev commented 4 weeks ago

Hi Heejong,

The .speclib format evolves continuously. I would therefore recommend converting to .parquet (supported since 1.9.1); you can then edit the library easily in R or Python.

Best, Vadim
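
As a rough illustration of the "edit in Python" route, here is a minimal sketch that appends extra entries to a library that has already been converted to .parquet. It assumes pandas with a parquet engine such as pyarrow is installed. The key columns used to detect duplicates are hypothetical placeholders, since the thread does not list the column names DIA-NN writes; inspect the columns of your own file first.

```python
import pandas as pd


def append_entries(library, extra, key_columns):
    """Append rows of `extra` to `library`; on a key collision keep the library row."""
    mismatch = set(library.columns) ^ set(extra.columns)
    if mismatch:
        raise ValueError(f"column mismatch between libraries: {sorted(mismatch)}")
    # Reorder the extra columns to match the library before concatenating.
    combined = pd.concat([library, extra[library.columns]], ignore_index=True)
    return combined.drop_duplicates(subset=key_columns, keep="first").reset_index(drop=True)


def merge_parquet_libraries(base_path, extra_path, out_path, key_columns):
    """Read two .parquet libraries, append the second to the first, write the result."""
    library = pd.read_parquet(base_path)
    extra = pd.read_parquet(extra_path)
    append_entries(library, extra, key_columns).to_parquet(out_path, index=False)
```

`key_columns` should identify one library row uniquely (for a fragment-level table, most likely a precursor identifier plus a fragment identifier); whether DIA-NN accepts the edited file unchanged is something to verify by loading it.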

heejongkim commented 4 weeks ago

Hi Vadim,

Thanks for your guidance. I have a few follow-up questions, as I primarily use the Linux command line and would like to know explicitly which parameters to use.

  1. How can I convert a .speclib file to .parquet? Would --lib with the .speclib file and --out-lib with the .parquet filename do the job?
  2. How can I convert back from .parquet to .speclib? Is that even possible?
  3. I tried supplying multiple --lib parameters to see whether a search can run with multiple libraries combined, but it seems to take only the first one. So are multiple --lib parameters valid only for merging libraries? If so, would the syntax be the same as in the first question?
  4. Unlike an actual search, the Linux version 1.9.2 doesn't seem to produce a log.txt file when only the in silico prediction is run separately (it only generates the .speclib file at the end).

Thank you so much!

best, heejong

vdemichev commented 4 weeks ago

Hi heejong,

In general, please just run things in the GUI with the correct settings and then copy the command printed at the top of the log to Linux; this is the best way to avoid accidental configuration errors.

  1. Input library: .speclib; output: .parquet. I think the commands are correct, but please check what the GUI does.
  2. This happens automatically when the library is loaded, creating a .skyline.speclib file.
  3. Works with all .parquet or all .tsv libraries.
  4. Thanks for spotting this, we will add it back in the next versions.

Best, Vadim
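
For command-line use, the conversion and merge calls discussed above can be sketched as a small helper that assembles the argument list. Only `--lib` and `--out-lib` come from this thread; the executable name `diann` is an assumption (it differs between installations), and the check that merged inputs are all .parquet or all .tsv simply mirrors answer 3. As advised above, compare the result with the command the GUI prints at the top of its log.

```python
import shlex
from pathlib import Path


def build_library_command(libs, out_lib, executable="diann"):
    """Return an argv list that loads `libs` and writes them to `out_lib`.

    `executable` is a placeholder; use the path of your own DIA-NN binary.
    """
    if not libs:
        raise ValueError("at least one input library is required")
    suffixes = {Path(lib).suffix.lower() for lib in libs}
    # Per the thread, merging works with all-.parquet or all-.tsv inputs.
    if len(libs) > 1 and suffixes not in ({".parquet"}, {".tsv"}):
        raise ValueError("merging expects all .parquet or all .tsv inputs")
    argv = [executable]
    for lib in libs:
        argv += ["--lib", str(lib)]
    argv += ["--out-lib", str(out_lib)]
    return argv


# Print a shell-quoted command line for a single .speclib to .parquet conversion.
print(shlex.join(build_library_command(["lib.speclib"], "lib.parquet")))
```

The list form can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell-quoting problems with paths that contain spaces.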

heejongkim commented 4 weeks ago

Hi Vadim,

Gotcha. I will give it a shot shortly and get back to you. Hopefully this thread can be a useful resource for other people who want to do spectral library wrangling.

You said that the .speclib format keeps evolving; does that mean 1.9.2 is incompatible with a .speclib generated by 1.9.1? At the least, it would be great to have a way to tell different .speclib versions apart via diann, or a compatibility matrix in the README.

Thank you.

best, heejong

vdemichev commented 3 weeks ago

It's backward compatible, but if I were to share the format, I would basically need to share the specification with each new DIA-NN release. There seems to be no need for that (.parquet is better for any editing or integration with other tools).

heejongkim commented 3 days ago

Hi Vadim,

I'd like to report one more thing that I noticed.

With 1.9.2, converting multiple libraries into a single .parquet file worked. For example, --lib 1.parquet --lib 2.parquet --lib 3.parquet --out-lib combined.parquet generates combined.parquet, but it also generates 1.parquet.skyline.speclib at the end:

[0:02] Spectral library saved to combined.parquet
[0:03] Saving the library to 1.parquet.skyline.speclib

Is that related to the answer to question 2 above (automatically creating a .speclib from .parquet)? If so, shouldn't the file be named combined.parquet.skyline.speclib rather than taking the first library's name?

Thanks!

best, heejong

vdemichev commented 3 days ago

Hi heejong,

There's no obvious way to combine several different names into one in a non-clumsy fashion, so it uses the first :)

Best, Vadim
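
For scripts that need to locate or clean up the automatically saved file, the naming reported above can be captured in a one-line helper. This is a sketch based only on the 1.9.2 log lines quoted in this thread, not on documented behaviour, and the function name is hypothetical.

```python
def expected_skyline_speclib(libs):
    """Guess the auto-saved .speclib name: the first --lib path plus a fixed suffix."""
    if not libs:
        raise ValueError("at least one input library is required")
    # Observed in this thread: the first input's name is used, not the --out-lib name.
    return f"{libs[0]}.skyline.speclib"
```

Since the name follows the first input rather than the `--out-lib` target, changing the order of the `--lib` arguments would presumably change which file is written.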