wenbostar / PDV

PDV: an integrative proteomics data viewer
GNU General Public License v3.0
44 stars, 20 forks

PEAKS Studio result visualization #12

Open juank1892 opened 4 years ago

juank1892 commented 4 years ago

Hi PDV community,

I was trying to visualize some .pepXML results and .mzXML files exported from PEAKS Studio 10.5.

However, I am receiving an error (attached image) that prevents me from doing any visualization. My computer uses a German locale, and the decimal separator ("," in Germany) is the only possible cause that comes to mind...

Any suggestions for how to fix this? I was very curious to explore PDV's visualization/annotation of peptides carrying PTMs that are not present in Unimod.

Sincerely, JC

[screenshot]

wenbostar commented 4 years ago

We haven't tested pepXML and mzXML files from PEAKS Studio. It looks like PEAKS Studio can export mzIdentML, which is a more standard format. Could you try using mzIdentML instead of pepXML?

wenbostar commented 4 years ago

If you can share the pepXML and mzXML files with us, we can do a test on the data.

juank1892 commented 4 years ago

So I tried the mzIdentML export, but I get the same issue :(

[screenshot]

I can share the data of a QC sample, but how would you suggest I share the .pepXML, .mzid, and .mzXML files?

wenbostar commented 4 years ago

Could you share the files with us by dropbox or google drive?

juank1892 commented 4 years ago

Here is a link to the Google Drive folder:

https://drive.google.com/drive/folders/15O2b6l85fsmCGHNu4Oei4xp-r2MGzpgc?usp=sharing

Not sure if this is useful to know, but the .mzXML is still in resolution mode display.

KaiLiCn commented 4 years ago

Thanks JC. I will take a look and test it today.

Kai

KaiLiCn commented 4 years ago

I just tested the files you shared with us. It looks like there may be a problem with your mzXML file.

PDV cannot read the precursor information from the mzXML file. I checked the file and found that the precursor information of each scan looks odd: [screenshot] In our previous test data, this element contains the scan number of the precursor, like this: [screenshot]
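For readers hitting the same symptom, here is a minimal sketch of the check described above, using only Python's standard library: it lists the MS2+ scans whose `<precursorMz>` element has no `precursorScanNum` attribute. It assumes an mzXML 3.x layout (`scan` elements with `num`/`msLevel` attributes and a `precursorMz` child). The example document is trimmed and hypothetical, not the shared data, and this is not how PDV itself reads mzXML.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace, e.g. '{http://...}scan' -> 'scan'."""
    return tag.rsplit("}", 1)[-1]


def scans_missing_precursor_scan(mzxml_text: str) -> list:
    """Return the 'num' of every MSn (msLevel >= 2) scan lacking precursorScanNum."""
    root = ET.fromstring(mzxml_text)
    missing = []
    for scan in root.iter():  # iter() also reaches scans nested inside MS1 scans
        if local_name(scan.tag) != "scan" or int(scan.get("msLevel", "1")) < 2:
            continue
        precursors = [c for c in scan if local_name(c.tag) == "precursorMz"]
        if not precursors or any(p.get("precursorScanNum") is None for p in precursors):
            missing.append(scan.get("num"))
    return missing


# Trimmed, hypothetical mzXML: scan 3 has no precursorScanNum.
EXAMPLE = """<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">
  <msRun>
    <scan num="1" msLevel="1"/>
    <scan num="2" msLevel="2">
      <precursorMz precursorScanNum="1" precursorIntensity="1200" precursorCharge="2">512.27</precursorMz>
    </scan>
    <scan num="3" msLevel="2">
      <precursorMz precursorIntensity="0">640.31</precursorMz>
    </scan>
  </msRun>
</mzXML>"""

print(scans_missing_precursor_scan(EXAMPLE))  # ['3']
```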

May I ask how you converted the raw data to mzXML? Could you please revise the conversion?

juank1892 commented 4 years ago

The raw MS data is imported into PEAKS. The software then converts and exports it for further analysis (in my case, in Skyline).

However, the lack of charge information might be an instrumentation issue. This data was acquired on a Synapt G2-Si TOF (Waters) instrument. For whatever reason, the precursor charge annotation is VERY PROBLEMATIC and doesn't seem to be written to the file (I think), since whenever I try to convert it with msconvert I usually don't get a useful charge annotation. I can only get the charge recalculated using ProteoWizard's "turbocharger" filter, and only for raw data that didn't use IMS.

The pepXML has the "assumed" charge annotated, since PEAKS re-estimates the charge state using the MS1 traces.
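In pepXML, the charge that the search engine settled on is carried by the `assumed_charge` attribute of each `spectrum_query`. Below is a minimal standard-library sketch that collects it. The spectrum names and values in the example are hypothetical, and the namespace is stripped rather than hard-coded.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a namespace, e.g. '{http://...}spectrum_query' -> 'spectrum_query'."""
    return tag.rsplit("}", 1)[-1]


def assumed_charges(pepxml_text: str) -> dict:
    """Map each spectrum_query's 'spectrum' name to its integer assumed_charge."""
    root = ET.fromstring(pepxml_text)
    charges = {}
    for el in root.iter():
        if local_name(el.tag) == "spectrum_query" and el.get("assumed_charge"):
            charges[el.get("spectrum")] = int(el.get("assumed_charge"))
    return charges


# Trimmed, hypothetical pepXML with two spectrum queries.
EXAMPLE = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
  <msms_run_summary base_name="QC_01">
    <spectrum_query spectrum="QC_01.00042.00042.2" start_scan="42" end_scan="42"
                    precursor_neutral_mass="1022.52" assumed_charge="2" index="1"/>
    <spectrum_query spectrum="QC_01.00057.00057.3" start_scan="57" end_scan="57"
                    precursor_neutral_mass="1533.78" assumed_charge="3" index="2"/>
  </msms_run_summary>
</msms_pipeline_analysis>"""

print(assumed_charges(EXAMPLE))
# {'QC_01.00042.00042.2': 2, 'QC_01.00057.00057.3': 3}
```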

However, I am not entirely sure how to fill in "precursorScanNum" and "precursorIntensity", or whether that information is even written in the raw file.
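One simple heuristic, assumed here and not necessarily what PEAKS or msconvert does, is to take `precursorScanNum` as the most recent MS1 scan that precedes each MS2 scan. The sketch below assumes data-dependent acquisition with scan numbers in acquisition order. It likely does not hold for Waters MSE/IMS acquisitions, where fragment scans are not tied to a single selected precursor. `Scan` and `fill_precursor_scan_nums` are hypothetical names. Filling `precursorIntensity` would also require looking up the precursor peak in that MS1 spectrum, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scan:  # hypothetical minimal scan record
    num: int
    ms_level: int
    precursor_scan_num: Optional[int] = None


def fill_precursor_scan_nums(scans: List[Scan]) -> List[Scan]:
    """Give each MSn scan the number of the latest preceding scan one level up.

    Assumes DDA-style acquisition; scans that already have a value are kept.
    """
    last_at_level = {}
    for scan in sorted(scans, key=lambda s: s.num):  # acquisition order
        if scan.ms_level > 1 and scan.precursor_scan_num is None:
            scan.precursor_scan_num = last_at_level.get(scan.ms_level - 1)
        last_at_level[scan.ms_level] = scan.num
    return scans


scans = fill_precursor_scan_nums(
    [Scan(1, 1), Scan(2, 2), Scan(3, 2), Scan(4, 1), Scan(5, 2)]
)
print([s.precursor_scan_num for s in scans])  # [None, 1, 1, None, 4]
```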

KaiLiCn commented 4 years ago

Could you please share the raw file with me?

wenbostar commented 4 years ago

PEAKS Studio can also export MGF. We could try MGF to see if that works.

juank1892 commented 4 years ago

I just uploaded the raw data and the MGF file exported from PEAKS Studio (this format has the charge state written) to the shared folder.
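MGF stores the precursor charge on a `CHARGE=` line inside each `BEGIN IONS`/`END IONS` block. The value is written with a trailing sign and may list several candidates (Mascot style, e.g. `2+ and 3+`). Below is a minimal sketch of parsing such a value. `parse_mgf_charge` is a hypothetical helper that only handles the trailing-sign form and treats a bare number as positive.

```python
import re


def parse_mgf_charge(value: str) -> list:
    """Parse an MGF CHARGE value ('2+', '3-', '2+ and 3+') into signed integers.

    Only the trailing-sign form is handled; a bare number is taken as positive.
    """
    return [-int(d) if sign == "-" else int(d)
            for d, sign in re.findall(r"(\d+)\s*([+-]?)", value)]


print(parse_mgf_charge("2+"))         # [2]
print(parse_mgf_charge("2+ and 3+"))  # [2, 3]
print(parse_mgf_charge("3-"))         # [-3]
```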

juank1892 commented 4 years ago

By the way, the raw file used IMS separation of fragment ions, so it doesn't convert nicely with msconvert (at least with what I have tried). If you manage to get a useful .mzXML or .mgf by converting the raw file, I would be very curious to learn how.

KaiLiCn commented 4 years ago

Hi JC,

I ran a set of tests on your shared data and reached the following conclusions:

  1. I can't properly convert your raw file to .mgf or .mzXML with MSConvert either;
  2. The good news is that I modified PDV, and it now supports your mzIdentML and MGF files;
  3. However, there are a few problems. The first is that, according to your mzIdentML file, six raw files were used in your search. [screenshot] As you can see, each spectrum file is identified by an "id". [screenshot] Each PSM record also carries a "spectraData_ref" that maps it to a specific spectrum file, so PDV knows which file the PSM comes from. Normally, when users search against multiple .mgf files (for example with MS-GF+), they can load the mzIdentML file and the corresponding .mgf files into PDV, and PDV displays the annotated spectra smoothly. Because the spectrum file names in your mzIdentML end with .raw while the spectrum files loaded into PDV end with .mgf, PDV by default shows an error message to avoid a mismatch. I have now changed this to a question box to handle your situation, as follows: [screenshot]

     The second problem is that, because of this mismatch, the modified PDV supports only one .mgf file as input. However, there is a very simple workaround with the current PDV-1.7.0. First, export all raw files in .mgf format from PEAKS (these .mgf files work well in PDV). Second, find the spectrum file location information in the mzIdentML and change the extensions to .mgf, so that the names exactly match your input .mgf files. Finally, run PDV. You could also send all the .mgf files to me and I will run a test for you.
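The second step of the workaround (renaming the spectrum file extensions inside the mzIdentML) can also be scripted. Below is a minimal sketch. It assumes that PDV matches spectrum files on the file name in each `SpectraData` element's `location` attribute, which is our reading of "spectrum file location information" above. `rename_spectra_locations` is a hypothetical helper. The example document is trimmed: real `SpectraData` elements also have `FileFormat` and `SpectrumIDFormat` children. The `id` values that PSMs reference through `spectraData_ref` are left untouched.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def rename_spectra_locations(mzid_text, old_ext=".raw", new_ext=".mgf"):
    """Swap the file extension in every SpectraData/@location.

    Returns (new_xml_text, changes) where changes lists (id, old, new) tuples.
    The SpectraData ids referenced by spectraData_ref are not modified.
    """
    root = ET.fromstring(mzid_text)
    # Re-use the document's default namespace on output instead of 'ns0:' prefixes.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    ET.register_namespace("xsi", XSI_NS)
    changes = []
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] != "SpectraData":
            continue
        loc = el.get("location", "")
        if loc.lower().endswith(old_ext.lower()):
            new_loc = loc[: len(loc) - len(old_ext)] + new_ext
            el.set("location", new_loc)
            changes.append((el.get("id"), loc, new_loc))
    # Note: tostring(encoding="unicode") omits the <?xml ...?> declaration.
    return ET.tostring(root, encoding="unicode"), changes


# Trimmed, hypothetical mzIdentML with two spectrum files.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="demo" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData location="QC_01.raw" id="SD_1"/>
      <SpectraData location="QC_02.raw" id="SD_2"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

new_xml, changes = rename_spectra_locations(EXAMPLE)
print(changes)
# [('SD_1', 'QC_01.raw', 'QC_01.mgf'), ('SD_2', 'QC_02.raw', 'QC_02.mgf')]
```

Write `new_xml` to a new .mzid file rather than overwriting the original, so you can fall back to the untouched export if PDV expects something different.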

Please let me know if you have any questions.

Kai

juank1892 commented 4 years ago

Kai! Thanks a lot! You are awesome for all the time you invested in this!

Indeed, it was a database search of 6 files. Sorry for not pointing this out, and I apologize if it made you waste time unnecessarily.

So I went ahead and tested your suggestion of matching the file names, and found that PDV fully supports the .mzid and .mgf exports from PEAKS! HOWEVER, you have to pay attention to where the exports are performed, since the file names in the .mzid are written differently:

File names are written with the .mgf suffix if exported with the "For Scaffold" option:

[screenshot]

[screenshot]

File names are written with the original file suffix (i.e., ".raw" for this Waters data) if exported from the "generic" exports section (this is the one I sent you last time... oops):

[screenshot]

[screenshot]

I hope that this clarification is useful for you or future users.

One comment on the results: although the peptide annotation seems to work well, the retention time and precursor m/z assignments are off. All RT and "m/z" entries are shown as "-0.0".

[screenshot]

The RT in the .mgf files is written in seconds. Could it be a unit-conversion issue? For the m/z, however, I have no idea.
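Since PDV takes RT and m/z from the MGF files, one quick check is whether every block actually contains `PEPMASS` and `RTINSECONDS` lines. Below is a minimal sketch that also converts `RTINSECONDS` to minutes. `read_mgf_headers` is a hypothetical helper, and it does not claim to reproduce how PDV parses MGF. The example spectra are made up.

```python
def read_mgf_headers(mgf_text: str) -> list:
    """Collect title, precursor m/z (PEPMASS) and RT in minutes (RTINSECONDS / 60)
    from every BEGIN IONS ... END IONS block. Missing fields are simply absent."""
    spectra, current = [], None
    for raw in mgf_text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and "=" in line:  # peak lines contain no '='
            key, _, value = line.partition("=")
            key = key.strip().upper()
            if key == "TITLE":
                current["title"] = value.strip()
            elif key == "PEPMASS":  # may be "m/z" or "m/z intensity"
                current["mz"] = float(value.split()[0])
            elif key == "RTINSECONDS":
                current["rt_min"] = float(value) / 60.0
    return spectra


EXAMPLE = """BEGIN IONS
TITLE=QC_01.42
PEPMASS=512.27 1200.0
CHARGE=2+
RTINSECONDS=630
175.119 350.0
END IONS
BEGIN IONS
TITLE=QC_01.43
CHARGE=2+
175.119 120.0
END IONS
"""

spectra = read_mgf_headers(EXAMPLE)
missing = [s.get("title") for s in spectra if "mz" not in s or "rt_min" not in s]
print(spectra[0])  # {'title': 'QC_01.42', 'mz': 512.27, 'rt_min': 10.5}
print(missing)     # ['QC_01.43']
```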

I have uploaded the Scaffold exported .mzid file and .mgf files in case you would like to look further into it.

A last question, regarding the number of PTMs and reporter ions, since this is the actual application I would like to test the app for. With PEAKS, I routinely search for 30+ modifications, some of which have modification-specific fragmentation features. Are there any limits on the number of modifications that can be considered? Can reporter ions or specific neutral losses be included as ions to be mapped?

Thanks for your time and help!

KaiLiCn commented 4 years ago

Thanks for your clarification.

I didn't see any new files in the shared folder. But with your previous data, I can get this information: [screenshot] PDV extracts RT and m/z from the MGF files. Could you please check your new MGF files?

There is no limit on the number of modifications in PDV. For example, PDV accepts Open-pFind open-search results, which may include hundreds of modifications. Users can change the color of each modification by clicking the color setting button in the top panel. [screenshot]

PDV also supports reporter ions: [screenshot] [screenshot]
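To illustrate what reporter-ion annotation involves, here is a minimal sketch that looks for expected reporter m/z values among a spectrum's peaks within a ppm tolerance. This is not PDV's implementation. The labels, masses, and peaks in the example are hypothetical placeholders; real reporter masses should come from the labeling reagent's documentation.

```python
def match_reporter_ions(peaks, reporters, tol_ppm=10.0):
    """Find expected reporter ions in a spectrum.

    peaks: list of (mz, intensity); reporters: {label: theoretical m/z}.
    Returns {label: intensity}, keeping the most intense peak within tol_ppm.
    """
    found = {}
    for label, target in reporters.items():
        window = target * tol_ppm / 1e6  # ppm tolerance converted to an m/z window
        hits = [inten for mz, inten in peaks if abs(mz - target) <= window]
        if hits:
            found[label] = max(hits)
    return found


reporters = {"rep_A": 150.0, "rep_B": 151.0}  # hypothetical reporter m/z values
peaks = [(150.0010, 800.0), (151.0100, 300.0), (400.2, 50.0)]
print(match_reporter_ions(peaks, reporters))  # {'rep_A': 800.0}
```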

For neutral losses, PDV detects whether neutral losses are possible for a PSM, and users can then click the loss button in the tools panel. For example, for phosphorylation: [screenshot] [screenshot]
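In the phosphorylation case, the usual neutral loss from phosphoserine/phosphothreonine fragments is H3PO4 (monoisotopic mass about 97.9769 Da). Because the lost neutral carries no charge, an ion of charge z shifts by loss/z on the m/z axis. Below is a minimal sketch of that arithmetic. `neutral_loss_mz` is a hypothetical helper, not a PDV function, and the fragment values in the example are made up.

```python
H3PO4_MONO = 97.976895  # Da, monoisotopic H3PO4 (3*H + P + 4*O)


def neutral_loss_mz(mz: float, charge: int, loss_mass: float = H3PO4_MONO) -> float:
    """m/z of a fragment ion (observed at `mz`, charge `charge`) after a neutral loss.

    The neutral carries no charge, so the mass drop is divided by the ion's charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz - loss_mass / charge


# Hypothetical fragment ions of a phosphopeptide: (label, observed m/z, charge).
for ion, mz, z in [("y5", 663.2897, 1), ("b7", 455.7120, 2)]:
    print(ion, round(neutral_loss_mz(mz, z), 4))
```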

I hadn't systematically tested PEAKS results before, including their modification formats, but based on your test they look good. Please let me know if you have any other issues. We will release a new version in the future.