vdemichev / DiaNN

DIA-NN - a universal automated software suite for DIA proteomics data analysis.
Other

Native DiaNN support for Linux #633

Open rolivella opened 1 year ago

rolivella commented 1 year ago

Hello

We plan to use DiaNN to build a pipeline that regularly processes several hundred Thermo RAW files. For this, we think it's good practice to use our Linux HPC. However, there's a bottleneck: on Linux the RAW files must first be converted to mzML, which costs a lot of CPU time and extra storage. We'd like to suggest that DiaNN fully support the Linux command line with native Thermo libraries, to improve the scalability of this great tool.

Thank you!

vdemichev commented 1 year ago

Should be quick enough on most desktops too, even lib-free. On Linux you can run diann.exe under Wine, and you can also install MSFileReader under Wine, so there's no need for conversion. Implementing native Linux support for Thermo is unfortunately not straightforward.
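For a scripted pipeline, the Wine route can be wrapped so that each batch of RAW files becomes one command. The sketch below only assembles the argument list. It assumes Wine's default `Z:` drive mapping to `/`, and the helper names `to_wine_path` and `build_diann_cmd` and all paths are hypothetical. `--f`, `--lib`, `--out` and `--threads` are DIA-NN command-line options; check the DIA-NN documentation for the full set your search needs.

```python
def to_wine_path(posix_path):
    """Map an absolute Linux path onto Wine's Z: drive.

    Assumes the default Wine prefix, where Z: is mapped to "/".
    (`winepath -w` does the same conversion from the shell.)
    """
    p = str(posix_path)
    if not p.startswith("/"):
        raise ValueError(f"expected an absolute path, got {p!r}")
    return "Z:" + p.replace("/", "\\")


def build_diann_cmd(diann_exe, raw_files, library, out_report, threads=8):
    """Assemble a DIA-NN invocation to run under Wine (hypothetical helper)."""
    # Wine itself accepts a Linux path for the executable
    cmd = ["wine", str(diann_exe)]
    for raw in raw_files:
        # DIA-NN takes one --f flag per input file
        cmd += ["--f", to_wine_path(raw)]
    cmd += [
        "--lib", to_wine_path(library),
        "--out", to_wine_path(out_report),
        "--threads", str(threads),
    ]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would launch the search. Keeping command construction separate from execution makes it easy to hand one command per batch to an HPC scheduler instead.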

Best, Vadim

rolivella commented 1 year ago

I'm not sure it's good practice to leave a search running for days on a desktop PC when a Linux HPC cluster is available for this kind of computation. Maybe we can speed up the search with predicted libraries, but anyway, I understand your point.

Do you know if we can use "mono" instead of "wine"? Has any DiaNN user implemented this kind of configuration?

Best! Roger

rolivella commented 1 year ago

mono -> https://www.mono-project.com/

kevinkovalchik commented 1 year ago

Hello Roger,

Mono won't help DIA-NN directly interface with RAW files in Linux. DIA-NN relies on Thermo's MSFileReader library which is Windows-only through and through and isn't a .NET application.

However, RAW to mzML file conversion is easily possible on Linux and quite fast. ThermoRawFileParser (https://github.com/compomics/ThermoRawFileParser) does it natively if you have mono installed.
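On a cluster, conversion can be parallelised per file. The sketch below builds one ThermoRawFileParser call per RAW file using the `-i=`, `-o=` and `-f=` options shown in the project README, where `-f=2` selects indexed mzML. Confirm these against `--help` for the version you install. The executable path, output directory and helper names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def conversion_cmds(parser_exe, raw_files, out_dir, fmt=2):
    """One ThermoRawFileParser call per RAW file, run through mono.

    fmt=2 is assumed to mean indexed mzML, as documented in the README.
    """
    return [
        ["mono", str(parser_exe), f"-i={raw}", f"-o={out_dir}", f"-f={fmt}"]
        for raw in raw_files
    ]


def convert_all(cmds, workers=4):
    """Run the conversions concurrently, `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every job and re-raises the first
        # CalledProcessError from a failed conversion
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
```

Threads are sufficient here because each worker just waits on an external process. On a scheduler such as SLURM, the same command list could instead be submitted as an array job.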

Best, Kevin

rolivella commented 1 year ago

Thank you @vdemichev and @kevinkovalchik! I'll give Wine a try. But in my opinion, for very large datasets with hundreds of files, converting to mzML is not a desirable option at all, because it needs a lot of CPU and storage resources. In effect you duplicate the dataset when generating the mzML files (which are not small, by the way). And after the conversion it's risky to clean them up, because if you need to reprocess the whole dataset with different parameters, converting all the files again is a big waste of time and resources.

Best,

Roger

onurserce commented 4 months ago

Edited on 13.05.2024

Opened a new issue #1013