kevinkovalchik / RawTools

RawTools is an open-source and freely available package designed to perform scan data parsing and quantification, and quality control analysis of Thermo Orbitrap raw mass spectrometer files from data-dependent acquisition experiments.
Apache License 2.0

'parse' of very short acquisition files yields Unhandled Exception #2

Closed: colemathis closed this issue 5 years ago

colemathis commented 6 years ago (Oct 1, 2018)

I'm trying to use RawTools on Windows 10 to parse a .raw file output by a Thermo Orbitrap.

The provided example files work correctly, but my own file throws an error when I run the following:

C:\RawTools-1.2.0>RawTools.exe parse -f "C:\Program Files\RawTools-1.2.0\DBD169_LEK_2.raw" -x

The error message is:

Unhandled Exception: System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at RawTools.Data.Extraction.MethodData.ExtractMethodData(RawDataCollection rawData, IRawDataPlus rawFile) in C:\Users\Kevin\Documents\GSC\Projects\RawTools\RawTools\Data\Extraction.cs:line 484
   at RawTools.Program.DoStuff(ParseOptions opts) in C:\Users\Kevin\Documents\GSC\Projects\RawTools\RawTools\Program.cs:line 273
   at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
   at RawTools.Program.Main(String[] args) in C:\Users\Kevin\Documents\GSC\Projects\RawTools\RawTools\Program.cs:line 62

The file is attached for reference: DBD169_LEK_2.zip (https://github.com/kevinkovalchik/RawTools/files/2434070/DBD169_LEK_2.zip)

kevinkovalchik commented 6 years ago

Thanks for attaching the file. I'll look into it this morning.

I have a couple of questions to help with debugging:

  1. Was the acquisition DDA?
  2. If so, is it a standard MS2 or MS3 experiment, or something more exotic like BoxCar?

Kevin

kevinkovalchik commented 6 years ago

The acquisition is really short. Was this a direct injection or something?

The small number of scans is what is causing the problem. It is tripping things up when calculating percentiles, but I can address this.
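The percentile failure can be illustrated with a minimal Python sketch (RawTools itself is written in C#; `percentile` is a hypothetical helper, and linear interpolation between neighbouring values is an assumed definition). The point is the guard: with an empty list there is nothing to index, so the function returns `None` rather than reading outside the array.

```python
import math
from typing import Optional, Sequence


def percentile(values: Sequence[float], pct: float) -> Optional[float]:
    """Linear-interpolation percentile, pct in 0..100 (hypothetical helper).

    Returns None for an empty input instead of indexing past the end of an
    array, which is the failure mode a run with very few scans can trigger.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("pct must be between 0 and 100")
    if not values:
        return None  # too few scans: nothing to compute
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0  # fractional position in the sorted list
    lo, hi = math.floor(rank), math.ceil(rank)
    if lo == hi:
        return ordered[lo]
    # Interpolate between the two neighbouring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)
```

A single value is returned unchanged for any percentile, and an empty list yields `None`, so callers can decide to report a placeholder instead of crashing.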

chrishuges commented 6 years ago

It appears to be a run from a Nanomate source, so I guess it is not surprising that it is so short. The run is DDA, though, and the method parameters seem pretty standard.

Most of the MS1 scans are empty; I'm not sure if that is adding to the issue.

kevinkovalchik commented 6 years ago

Yeah. Because there is just the one peak in there, and it is only about three MS1 scans wide in the TIC, we end up with lots of empty lists and arrays, mostly from the precursor peaks, which RawTools expects to be at least three scans wide.

We can address this by adding an option to turn off the precursor peak calculations, or by adding functionality to fit the precursor peaks with a function (an exponentially modified Gaussian or similar) rather than interpolating, which would be desirable anyway.
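For reference, here is a Python sketch of the exponentially modified Gaussian mentioned above, using the standard textbook form (a Gaussian convolved with an exponential decay, whose mean is `mu + tau`). The function name and parameterization are illustrative assumptions, not RawTools code.

```python
import math


def emg(t: float, area: float, mu: float, sigma: float, tau: float) -> float:
    """Exponentially modified Gaussian peak model (illustrative).

    A Gaussian (centre mu, width sigma) convolved with an exponential decay
    (time constant tau), scaled so that it integrates to `area`. The tau term
    models the tailing typical of chromatographic peaks.
    """
    lam = 1.0 / tau
    exponent = (lam / 2.0) * (2.0 * mu + lam * sigma ** 2 - 2.0 * t)
    arg = (mu + lam * sigma ** 2 - t) / (math.sqrt(2.0) * sigma)
    return area * (lam / 2.0) * math.exp(exponent) * math.erfc(arg)
```

The model has four free parameters (`area`, `mu`, `sigma`, `tau`), so a fit needs at least four points, and in practice more; a nonlinear least-squares routine such as SciPy's `scipy.optimize.curve_fit` could do the fitting. The direct formula can overflow `math.exp` when `tau` is much smaller than `sigma`, so a production version would want a numerically stable rearrangement.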

I'll start with the first option since it will be quickest to get working with these files.

kevinkovalchik commented 6 years ago

Okay, here's a really quick fix. The main problem I ran into is that there weren't enough scans to calculate the precursor peak shapes, so when that happens RawTools now skips the calculation and reports zero for peak width and peak asymmetry. All the other metrics seem to be working fine (except those that depend on peak width and asymmetry, of course). I haven't tested QC with it yet. The compiled program with this fix is in the following zip file.

RawTools_20181001.zip

I'll work on a better solution and keep you posted.
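The quick fix described above can be sketched in Python as follows. Assumptions, none taken from RawTools: width and asymmetry are both measured at the same fraction of apex height (half height by default; the asymmetry factor is conventionally quoted at 10%), crossing times are linearly interpolated between scans, asymmetry is the trailing half-width divided by the leading half-width, and the hypothetical constant `MIN_SCANS = 3` mirrors the three-scan requirement. As in the fix, an unmeasurable peak reports zeros.

```python
from typing import List, Tuple

MIN_SCANS = 3  # assumption: mirrors the "at least three scans wide" requirement


def _crossing(t0: float, y0: float, t1: float, y1: float, level: float) -> float:
    """Time at which the straight line between two scans crosses `level`."""
    return t0 + (level - y0) * (t1 - t0) / (y1 - y0)


def peak_width_and_asymmetry(
    times: List[float], intensities: List[float], fraction: float = 0.5
) -> Tuple[float, float]:
    """Return (width, asymmetry) at `fraction` of apex height, or (0.0, 0.0)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    n = len(intensities)
    if n < MIN_SCANS or len(times) != n:
        return 0.0, 0.0  # too few scans: skip instead of indexing empty arrays
    apex = max(range(n), key=intensities.__getitem__)
    level = intensities[apex] * fraction
    if level <= 0.0:
        return 0.0, 0.0  # empty or flat signal
    # Walk outwards from the apex to the last scans still at or above `level`.
    left = apex
    while left > 0 and intensities[left - 1] >= level:
        left -= 1
    right = apex
    while right < n - 1 and intensities[right + 1] >= level:
        right += 1
    if left == 0 or right == n - 1:
        return 0.0, 0.0  # the peak never drops below `level` on one side
    t_left = _crossing(times[left - 1], intensities[left - 1],
                       times[left], intensities[left], level)
    t_right = _crossing(times[right], intensities[right],
                        times[right + 1], intensities[right + 1], level)
    leading = times[apex] - t_left
    trailing = t_right - times[apex]
    return t_right - t_left, trailing / leading
```

A peak only three scans wide, like the Nanomate run, is measurable only if the scans on both sides of the apex fall below the chosen height; otherwise the function falls back to zeros.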

JRKrieger37 commented 6 years ago (Oct 26, 2018)

Hi, love the tool. I seem to have an issue with some files, where I get these messages:

Processing /Users/jrkrieger/Dropbox/Transfer/Mital Raw Data for QC Report/E/Mital-GroupErpt_F11.raw
Indexing linked scan events: 100%
Extracting raw data: 76%
Unhandled Exception:
System.Collections.Generic.KeyNotFoundException: The given key '9139' was not present in the dictionary.
   at RawTools.Data.Extraction.AllData.ExtractAll (RawTools.Data.Collections.RawDataCollection rawData, ThermoFisher.CommonCore.Data.Interfaces.IRawDataPlus rawFile) [0x00113] in :0
   at RawTools.QC.QC.DoQc (RawTools.Data.Containers.QcParameters qcParameters) [0x002a7] in :0
   at RawTools.Program.DoStuff (RawTools.ArgumentParser.QcOptions opts) [0x0025d] in :0
   at RawTools.Program+<>c.b__0_1 (RawTools.ArgumentParser.QcOptions opts) [0x00000] in :0
   at CommandLine.ParserResultExtensions.WithParsed[T] (CommandLine.ParserResult`1[T] result, System.Action`1[T] action) [0x0001e] in :0
   at RawTools.Program.Main (System.String[] args) [0x000e1] in :0
[ERROR] FATAL UNHANDLED EXCEPTION: System.Collections.Generic.KeyNotFoundException: The given key '9139' was not present in the dictionary.
   at RawTools.Data.Extraction.AllData.ExtractAll (RawTools.Data.Collections.RawDataCollection rawData, ThermoFisher.CommonCore.Data.Interfaces.IRawDataPlus rawFile) [0x00113] in :0
   at RawTools.QC.QC.DoQc (RawTools.Data.Containers.QcParameters qcParameters) [0x002a7] in :0
   at RawTools.Program.DoStuff (RawTools.ArgumentParser.QcOptions opts) [0x0025d] in :0
   at RawTools.Program+<>c.b__0_1 (RawTools.ArgumentParser.QcOptions opts) [0x00000] in :0
   at CommandLine.ParserResultExtensions.WithParsed[T] (CommandLine.ParserResult`1[T] result, System.Action`1[T] action) [0x0001e] in :0
   at RawTools.Program.Main (System.String[] args) [0x000e1] in :0

It is part of a dataset with more than 600 samples, and all the other files seem to work fine; they are fractions of a larger study (link to the .raw file: https://www.dropbox.com/s/3adzonyf2cyc711/Mital-GroupErpt_F11.raw?dl=0). Any clues would be great. I also tried the RawTools_20181001.zip build attached above, for fun, but that didn't work either!

kevinkovalchik commented 6 years ago (Oct 26, 2018)

Hello,

Thanks for your interest! And thanks for the link to the raw file. Could you also send me the command line parameters you used? And which version of RawTools are you using? 1.3.1 was released today, though I'm not sure any changes in it would address the issue.

Kevin

JRKrieger37 commented 6 years ago

Hi Kevin,

Sure. I am using version 1.3.

My parameters were pretty simple:

Jonathans-MacBook-Pro:RawTools-1.3 jrkrieger$ mono RawTools.exe qc -d ~/Dropbox/Transfer/Mital\ Raw\ Data\ for\ QC\ Report/E -q ~/Dropbox/Tranasfer/GroupE_QC

For reference, this file is in the same directory: https://www.dropbox.com/s/s13817zxzvae6ko/Mital-GroupErpt_F1.raw?dl=0

It generates the QC data with no issues. There is not much in this file, as it is the first of 60 offline high-pH fractions, but there seems to be something inherently different between the raw files. I ran 8 other groups of 60 fractions with no issues whatsoever.

Thanks again!

Jonathan

kevinkovalchik commented 6 years ago (Oct 29, 2018)

I looked at these files this morning. The issue is that there is a single MS3 scan in the raw file which is not indexed as a dependent of another scan, and when the program gets to extracting the data for that scan, it fails because, according to the linked scan index, that scan doesn't exist. The scan in question is 9139.

It looks like there is something wrong with this scan. The trailer for the scan has very little data in it: no master scan index, no SPS ions, no HCD energy, no isolation width, etc. I searched the MS2 scans and found one with no dependent scan, scan 9137, so it is very likely that 9139 is the MS3 scan for 9137.

I don't know exactly what happened with the scan, but it seems likely that some sort of error during the acquisition caused the metadata to be lost.

As for fixing the problem so you can process the rest of the file: the lists used for scan iteration and the linked scan index are created separately, so this issue could be fixed by building the scan lists from the linked scan index instead of from the raw scan numbers. That way, any similarly corrupted scans get skipped.

I'll go ahead and work on that fix. However, the output table won't have any quant or metadata for the scans mentioned above, so you might want to search scan 9137 to see whether it is relevant to your study.
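The proposed fix can be sketched in Python under loudly stated assumptions: `Scan`, `build_linked_index`, `scans_to_process`, and `ms2_without_dependents` are hypothetical names, each scan is reduced to its MS order plus the master scan index from its trailer, and RawTools' real C# data structures will differ. Scans with a missing master scan index are left out of the index, and iteration runs over the index keys, so a per-scan lookup like the one that raised `KeyNotFoundException` for scan 9139 cannot miss.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Scan:
    # Hypothetical, simplified stand-in for a scan and its trailer fields.
    number: int
    ms_order: int
    master_scan: Optional[int]  # None when the trailer has no master scan index


def build_linked_index(scans: List[Scan]) -> Dict[int, Optional[int]]:
    """Map scan number -> master scan, keeping only scans whose master exists in the file."""
    known = {s.number for s in scans}
    index: Dict[int, Optional[int]] = {}
    for s in scans:
        if s.ms_order == 1:
            index[s.number] = None  # survey scans have no master
        elif s.master_scan in known:
            index[s.number] = s.master_scan
        # otherwise the metadata is missing or dangling, so the scan is skipped
    return index


def scans_to_process(index: Dict[int, Optional[int]]) -> List[int]:
    # Iterate over the index keys, not first..last raw scan numbers, so every
    # later lookup by scan number is guaranteed to find an entry.
    return sorted(index)


def ms2_without_dependents(scans: List[Scan], index: Dict[int, Optional[int]]) -> List[int]:
    """MS2 scans that no indexed MS3 scan points back to."""
    order = {s.number: s.ms_order for s in scans}
    ms2 = {n for n in index if order[n] == 2}
    parents = {m for n, m in index.items() if order[n] == 3}
    return sorted(ms2 - parents)
```

In toy data such as `[Scan(1, 1, None), Scan(2, 2, 1), Scan(3, 3, 2), Scan(4, 2, 1), Scan(5, 3, None)]`, scan 5 plays the role of 9139 (no master scan index) and is skipped, while scan 4 plays the role of 9137 (an MS2 with no dependent). The orphan check is only meaningful for MS3 methods, since in an MS2-only experiment every MS2 scan lacks a dependent.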

kevinkovalchik commented 6 years ago

You can try this build:

RawTools_20181029.zip

It includes the change I mentioned above and processes both those files.

Hope it helps!

Kevin

kevinkovalchik commented 6 years ago

@colemathis I didn't really come up with a better solution for looking at your data from the Nanomate source. The MS1 scans are just too far apart to calculate any metrics about peak shapes, etc. I understand this is probably because the runs are so short that you want to spend as little time on MS1 as possible, so peak shapes might not be as important. Is there any other aspect of the data that we aren't covering that would be useful to you?

JRKrieger37 commented 6 years ago

Thanks Kevin.

This is interesting. I’ve seen this issue with MS3 raw files previously. I appreciate you looking into it and offering the workaround!

JK

kevinkovalchik commented 5 years ago

No activity here for a month, so I'm going to close it. Let me know if there is still an issue!