Some raw data could not be processed

chenliangyu18 commented 2 years ago

Dear Vadim, DIANN is really helpful for our proteomic research.

We recently ran into a problem. In one batch of over 300 consecutively acquired raw files, 12 could not be processed on our best server. When we tried them on another server, they were processed without issue.

This has caused some inconvenience. We have tried to figure out the cause, but the failures seem random.

If you have any idea about this problem, please help us!

Many thanks!

The log for the problematic data is attached below.

diann.exe --f "H:\jiangbei_R032_lung\2\ptr2204b0353.raw " --lib "D:\DIANN_py_v2\test\lib.predicted.speclib" --threads 32 --verbose 1 --out "H:\jiangbei_R032_lung\2_out\353\report.tsv" --qvalue 0.01 --matrices --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --report-lib-info
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Fri Sep 23 12:10:16 2022
CPU: AuthenticAMD AMD EPYC 7513 32-Core Processor
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2 SSE4a
Logical CPU cores: 64
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

1 files will be processed
[0:00] Loading spectral library D:\DIANN_py_v2\test\lib.predicted.speclib
[0:04] Library annotated with sequence database(s): D:\DIANN_py_v2/uniprot_ProSD.fasta
[0:04] Gene names missing for some isoforms
[0:04] Library contains 20407 proteins, and 20104 genes
[0:05] Spectral library loaded: 20407 protein isoforms, 29350 protein groups and 4290388 precursors in 1335784 elution groups.
[0:05] Initialising library

[0:08] File #1/1
[0:08] Loading run H:\jiangbei_R032_lung\2\ptr2204b0353.raw

DIA-NN exited
DIA-NN-plotter.exe "H:\jiangbei_R032_lung\2_out\353\report.stats.tsv" "H:\jiangbei_R032_lung\2_out\353\report.tsv" "H:\jiangbei_R032_lung\2_out\353\report.pdf"
PDF report will be generated in the background

diann.exe --f "H:\jiangbei_R032_lung\2\ptr2204b0353.raw " --lib "D:\DIANN_py_v2\test\lib.predicted.speclib" --threads 32 --verbose 1 --out "H:\jiangbei_R032_lung\2_out\353\report.tsv" --qvalue 0.01 --matrices --relaxed-prot-inf --smart-profiling --peak-center --no-ifs-removal --report-lib-info
DIA-NN 1.8.1 (Data-Independent Acquisition by Neural Networks)
Compiled on Apr 14 2022 15:31:19
Current date and time: Fri Sep 23 12:37:48 2022
CPU: AuthenticAMD AMD EPYC 7513 32-Core Processor
SIMD instructions: AVX AVX2 FMA SSE4.1 SSE4.2 SSE4a
Logical CPU cores: 64
Thread number set to 32
Output will be filtered at 0.01 FDR
Precursor/protein x samples expression level matrices will be saved along with the main report
Highly heuristic protein grouping will be used, to reduce the number of protein groups obtained; this mode is recommended for benchmarking protein ID numbers; use with caution for anything else
When generating a spectral library, in silico predicted spectra will be retained if deemed more reliable than experimental ones
Fixed-width center of each elution peak will be used for quantification
Interference removal from fragment elution curves disabled
DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition scheme.

1 files will be processed
[0:00] Loading spectral library D:\DIANN_py_v2\test\lib.predicted.speclib
[0:04] Library annotated with sequence database(s): D:\DIANN_py_v2/uniprot_ProSD.fasta
[0:04] Gene names missing for some isoforms
[0:04] Library contains 20407 proteins, and 20104 genes
[0:05] Spectral library loaded: 20407 protein isoforms, 29350 protein groups and 4290388 precursors in 1335784 elution groups.
[0:05] Initialising library

[0:09] File #1/1
[0:09] Loading run H:\jiangbei_R032_lung\2\ptr2204b0353.raw

DIA-NN exited
DIA-NN-plotter.exe "H:\jiangbei_R032_lung\2_out\353\report.stats.tsv" "H:\jiangbei_R032_lung\2_out\353\report.tsv" "H:\jiangbei_R032_lung\2_out\353\report.pdf"
PDF report will be generated in the background

vdemichev commented 2 years ago

As I understand it, some .raw files are processed correctly while others are not, on the same PC? I would try processing the failing ones from the command line, to see what error it prints.
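For a batch of several hundred files, one way to follow this suggestion is to launch DIA-NN once per file from a script and keep the exit code and console output of each run. The following is only a minimal sketch: `build_command` and `run_one` are hypothetical helper names, not part of DIA-NN, and the command uses only flags that already appear in the log above. The output file naming is an assumption made for illustration.

```python
import subprocess
from pathlib import Path


def build_command(exe, raw_file, lib, out_dir, threads=32):
    """Build a DIA-NN command line for a single raw file.

    Only flags that appear in the log above are used. The per-file
    report name is an illustrative choice, not a DIA-NN convention.
    """
    out_tsv = Path(out_dir) / (Path(raw_file).stem + ".report.tsv")
    return [
        str(exe),
        "--f", str(raw_file),
        "--lib", str(lib),
        "--threads", str(threads),
        "--verbose", "1",
        "--out", str(out_tsv),
        "--qvalue", "0.01",
    ]


def run_one(cmd):
    """Run one command and return (exit_code, combined console text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        errors="replace",          # tolerate non-UTF-8 console bytes
    )
    # Mask to 32 bits so a Windows crash code can be printed as hex,
    # for example 0xC0000005.
    return proc.returncode & 0xFFFFFFFF, proc.stdout
```

Calling `run_one(build_command(...))` for each raw file and writing out every file whose exit code is not 0, together with the last lines of its console text, would give a list of failing files and whatever error message was printed before the exit.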

chenliangyu18 commented 2 years ago

Dear Vadim, yes, it happened on the same PC. But when we ran the failed files on another PC, they worked!

We also found another problem that might be related. Some files could not be processed on any of the four PCs we tried. On one of the PCs, running Windows 7, the following error report was shown. We suspect it might be related to a "stack overflow". Does this mean that some signals are too high for the software?

Problem signature:
Problem Event Name: APPCRASH
Application Name: diann.exe
Application Version: 0.0.0.0
Application Timestamp: 62582239
Fault Module Name: StackHash_c9bd
Fault Module Version: 6.1.7601.24384
Fault Module Timestamp: 5c6e245d
Exception Code: c0000374
Exception Offset: 00000000000bf302
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 2052
Additional Information 1: c9bd
Additional Information 2: c9bd36cbb8551b0a73575d983c5446cf
Additional Information 3: f1d1
Additional Information 4: f1d1c3d242c9d01a28297f40b70aa465


chenliangyu18 commented 2 years ago

We checked the event log on a Windows 10 system and found the following error entry.

Faulting application name: diann.exe, version: 0.0.0.0, time stamp: 0x62582239
Faulting module name: fileio_x64.dll, version: 3.1.0.0, time stamp: 0x563818b6
Exception code: 0xc0000005
Fault offset: 0x00000000000af646
Faulting process id: 0x2c48
Faulting application start time: 0x01d8e5195775f6ce
Faulting application path: E:\DIA-NN\1.8.1\diann.exe
Faulting module path: C:\Program Files\Thermo\MSFileReader\fileio_x64.dll
Report Id: c36e3975-e8e7-4826-ae1b-84411fbc72a2
Faulting package full name:
Faulting package-relative application ID:
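The exception codes in the two crash reports above are standard Windows NTSTATUS values, so they can be looked up directly. The sketch below is a minimal illustration: `describe_exception` is a hypothetical helper, and the table is a small subset of the values defined in Microsoft's `ntstatus.h`.

```python
# Small illustrative subset of NTSTATUS values (Microsoft ntstatus.h);
# not a complete list.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def describe_exception(code):
    """Map an exception code (int or hex string) to its NTSTATUS name."""
    if isinstance(code, str):
        code = int(code, 16)  # accepts "c0000374" and "0xc0000005"
    code &= 0xFFFFFFFF        # normalise to an unsigned 32-bit value
    return NTSTATUS_NAMES.get(code, "unknown (0x%08X)" % code)


if __name__ == "__main__":
    for raw in ("c0000374", "0xc0000005"):
        print(raw, "->", describe_exception(raw))
```

Under this mapping, `c0000374` from the Windows 7 report is heap corruption and `0xc0000005` from the Windows 10 entry is an access violation. Neither is the stack overflow code `0xC00000FD`. As far as I know, a `StackHash_...` fault module name generally means Windows could not attribute the fault to a named module, and it does not by itself indicate a stack overflow. The Windows 10 entry does name a module: `fileio_x64.dll` from Thermo MSFileReader.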

chenliangyu18 commented 2 years ago

We have shared two raw files for you to test at the following link. File 011 is processed fine and 035 is not.

https://www.zconnect.cn/AppH5/share2/?nid=LIYDIMJQGEYDESSUGRLTQ&code=QTtneYMCcjQoLiCvqvYVDD7lgxWm2TYr7UCtIm20JxbkiIB4tXm32XBunQ464sNaSdO&mode=file&display=list

code: 6041

chenliangyu18 commented 2 years ago

We have also tried uploading the two files to OneDrive.

https://1drv.ms/u/s!Auwap57Rj6wShQ8pfau73EvyLigi?e=6je6Ke

https://1drv.ms/u/s!Auwap57Rj6wShQ6mCfsXXGbHYUvT?e=E4PZZ7

passcode:pt2022

vdemichev / DiaNN

Some raw data could not be processed #509