Support for light-heavy searches

stengellab commented 4 years ago

Hello there.

We'd like to use the xQuest importer for our searches. However, we use a light-heavy workflow for our samples, and the importer currently supports only light-light searches. Are there any plans for light-heavy support?

If not, could anyone point us in the right direction for implementing light-heavy support? We have someone familiar with Java.

Thank you very much!

mriffle commented 4 years ago

Hi, thanks for the message!

The proxl XML format does support stable isotope labels on identified peptides, and we already support light-heavy searches for other pipelines, so I believe the xQuest converter can be updated to support this as well.

If your Java programmer could look into adding this to the xQuest converter, that'd be great. Today or tomorrow I am going to make a minor update to this repository that will make it easier for your programmer to work with, and once it's done I will post instructions on this issue for how to proceed. Ideally, we would then bring those updates into the official xQuest converter here.

Cheers! Mike

mriffle commented 4 years ago

OK, I have updated this repo to make it a little easier to work with. It can be built with Gradle by running gradlew shadowJar, which places the executable jar file in build\libs. I have also greatly simplified the command line interface.

The current schema for proxl XML can be found at https://github.com/yeastrc/proxl-import-api/tree/master/xsd. The relevant sections for isotope labels are the peptide_isotope_labels element on the peptide element and the protein_isotope_labels element on the protein element. Essentially, an identified peptide needs to be labeled as 13C, 15N, 18O, or 2H, and the matched_proteins section needs to contain a protein with the same label for that peptide to match to. Proxl will automatically perform the mass conversions in the appropriate places in the website.
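To make the labeling rule concrete, here is a hypothetical Java sketch of the matching requirement only: a labeled peptide should map to proteins that carry the same label. The IsotopeLabel enum, LabeledProtein record and matchingProteins method are illustrative names, not part of the proxl import API, and the actual schema elements (peptide_isotope_labels, protein_isotope_labels) are not modeled. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// HYPOTHETICAL sketch: these types are not proxl's API; they only model the rule
// that a labeled peptide must be matched to an identically labeled protein.
public class IsotopeLabelMatching {

    // The four labels mentioned for proxl XML: 13C, 15N, 18O and 2H.
    enum IsotopeLabel { C13, N15, O18, H2 }

    // A matched protein with an optional isotope label (null means unlabeled).
    record LabeledProtein(String name, String sequence, IsotopeLabel label) {}

    // Returns proteins whose sequence contains the peptide AND whose label equals
    // the peptide's label (null == null, i.e. both unlabeled, also counts as equal).
    static List<LabeledProtein> matchingProteins(String peptideSequence,
                                                 IsotopeLabel peptideLabel,
                                                 List<LabeledProtein> proteins) {
        List<LabeledProtein> matches = new ArrayList<>();
        for (LabeledProtein protein : proteins) {
            if (Objects.equals(protein.label(), peptideLabel)
                    && protein.sequence().contains(peptideSequence)) {
                matches.add(protein);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<LabeledProtein> proteins = List.of(
                new LabeledProtein("ProtA", "MKTAYIAKQR", null),
                new LabeledProtein("ProtA", "MKTAYIAKQR", IsotopeLabel.N15));
        // A 15N-labeled peptide should only match the 15N-labeled copy of ProtA.
        System.out.println(matchingProteins("TAYIAK", IsotopeLabel.N15, proteins));
    }
}
```

Listing an unlabeled and a labeled copy of the same protein, as in main, mirrors the requirement that every label used on a peptide has a correspondingly labeled protein to match.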

Please contact me with any questions, suggestions or any other comments. If the person working on this runs up against a brick wall, just let me know. I may be able to do a lot of the heavy lifting if you send me sample data.

mriffle commented 4 years ago

I would also like to add that it would be great if any changes made to support light/heavy searches could be incorporated back into the official repository here, so others could take advantage of them. To facilitate this, please fork this project and submit your final changes as a pull request.

kaisengit commented 4 years ago

Hey Mike,

This is Kai, the Java guy. Thank you very much for the quick response. We were pleasantly surprised to get feedback so quickly!

Unfortunately, our initial request was a bit ambiguous: we are not talking about peptide labeling but about a labeled crosslinker. We use DSS in a heavy (+12 Da) and a light form. xQuest uses this mass shift to improve crosslink detection (and, later in the pipeline, it also helps with crosslink quantitation via xTract).
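For reference, the light/heavy difference can be worked out from monoisotopic atomic masses. This sketch assumes the common DSS-d0/DSS-d12 pair, in which the twelve hydrogens of the suberate spacer are replaced by deuterium; the class and method names are illustrative only.

```java
// Sketch: monoisotopic mass added by a DSS cross-link, light (d0) vs heavy (d12).
// Assumes the d12 reagent deuterates all 12 spacer hydrogens.
public class DssLinkerMasses {
    static final double C = 12.0;            // 12C
    static final double H = 1.00782503207;   // 1H
    static final double D = 2.0141017778;    // 2H (deuterium)
    static final double O = 15.99491461956;  // 16O

    // Bridge C8H12O2 minus the two H lost from the two linked amines (net C8H10O2).
    static double lightMass() { return 8 * C + 12 * H + 2 * O - 2 * H; }

    // Same bridge with C8D12O2; the two lost H still come from the peptides.
    static double heavyMass() { return 8 * C + 12 * D + 2 * O - 2 * H; }

    public static void main(String[] args) {
        System.out.printf("light %.5f, heavy %.5f, shift %.5f Da%n",
                lightMass(), heavyMass(), heavyMass() - lightMass());
    }
}
```

Run as-is, it prints light 138.06808, heavy 150.14340, shift 12.07532 Da, so the "+12" shift is about 12.075 Da in practice.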

The importer only supports light linker/light linker searches, as noted in the source here:
https://github.com/yeastrc/proxl-import-xquest/blob/13ee2eb129d961caf0f863270012f565499fb886/src/main/java/org/yeastrc/proxl/xml/xquest/main/ProcessXQuestMainFile.java#L230
https://github.com/yeastrc/proxl-import-xquest/blob/13ee2eb129d961caf0f863270012f565499fb886/src/main/java/org/yeastrc/proxl/xml/xquest/constants/ScanTypeAllowedConstants.java#L7
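As a rough illustration of where such a restriction might be relaxed, here is a hypothetical sketch of an allow-list check that accepts light_heavy results alongside light_light ones. The class name, the constant and the exact scan-type strings are assumptions; the real ScanTypeAllowedConstants and ProcessXQuestMainFile code may look quite different, and accepting the results is only half the job, since the linker masses also need handling.

```java
import java.util.Locale;
import java.util.Set;

// HYPOTHETICAL sketch: names and string values are assumptions, not the
// converter's actual constants. It only shows widening an allow-list.
public class ScanTypeFilter {

    // Assumed scan-type strings; the current importer accepts only light-light.
    static final Set<String> ALLOWED_SCAN_TYPES = Set.of("light_light", "light_heavy");

    // True if a search result of this (assumed) scan type should be imported.
    static boolean isAllowed(String scanType) {
        return scanType != null
                && ALLOWED_SCAN_TYPES.contains(scanType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String type : new String[] {"light_light", "light_heavy", "heavy_heavy"}) {
            System.out.println(type + " -> " + isAllowed(type));
        }
    }
}
```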

Our searches are all light_heavy and are therefore not supported. I hope this makes our issue clearer. Again, I would be grateful for any insights on how to approach implementing light_heavy support. A little warning, though: while I am familiar with Java and have coded a few projects with it, the last time I used it was years ago, so I am a bit rusty.

Cheers! Kai

mriffle commented 3 years ago

Hi Kai,

Aha, that makes sense. Proxl does support multiple linkers in a single run, so the converter could be updated to support a heavy linker and a light linker with different linker masses. We'd list the heavy and light versions of the linker as separate linkers in the linkers section and use the mass of the respective linker (light or heavy) as the linker mass on each PSM. So no issue there (famous last words).
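A minimal sketch of the two-linker idea, assuming the DSS-d0/d12 cross-link masses (138.06808 and 150.14340 Da) and assuming the converter can derive an observed linker mass for each PSM; how to obtain that value from xQuest output is not shown. Linker, LINKERS and linkerForMass are hypothetical names, not the converter's or proxl's actual classes.

```java
import java.util.List;
import java.util.Optional;

// HYPOTHETICAL sketch: light and heavy DSS as two separate linkers, with each
// PSM assigned the linker whose cross-link mass matches within a tolerance.
public class LightHeavyLinkers {

    record Linker(String abbreviation, double crosslinkMass) {}

    // Both versions would be listed in the linkers section.
    static final List<Linker> LINKERS = List.of(
            new Linker("dss_light", 138.06808),
            new Linker("dss_heavy", 150.14340));

    // Returns the closest linker within toleranceDa, or empty if none matches.
    static Optional<Linker> linkerForMass(double observedLinkerMass, double toleranceDa) {
        Linker best = null;
        double bestError = Double.POSITIVE_INFINITY;
        for (Linker linker : LINKERS) {
            double error = Math.abs(linker.crosslinkMass() - observedLinkerMass);
            if (error <= toleranceDa && error < bestError) {
                best = linker;
                bestError = error;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // An observed linker mass near 150.14 Da should resolve to the heavy linker.
        linkerForMass(150.1431, 0.01)
                .ifPresent(l -> System.out.println(l.abbreviation() + " " + l.crosslinkMass()));
    }
}
```

Choosing the closest match within a tolerance, rather than the first one under a threshold, keeps the assignment unambiguous if more linkers are added later.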

I think the more challenging issue is how to present those data in proxl in a way that makes sense for this kind of experiment. Although we allow multiple cross-linkers in an experiment, I don't think we currently display which cross-linker identified a particular PSM unless you view the spectrum. Essentially, proxl treats a cross-linked pair of positions simply as a cross-linked pair of positions and doesn't distinguish which linker in the experiment identified it unless you view the spectrum. I'm guessing this is suboptimal for your data.

If you are willing to share data, I think support for this kind of experiment would be a great addition to proxl. I'm a bit swamped at the moment adding another feature to proxl, but I would love to take this on soon. In the meantime, what way of interacting with your data would make the most sense? In a perfect world, what would you like to see in the web interface? Perhaps a way to break things down by which linker in the experiment identified the cross-link?
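One way such a breakdown could work, sketched with hypothetical types: group PSMs by an unordered pair of linked positions and count how many PSMs each linker contributed. Psm, pairKey and countByPairAndLinker are illustrative names, not proxl's data model.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// HYPOTHETICAL sketch: counts PSMs per cross-linked position pair, split by linker.
public class LinkerBreakdown {

    record Psm(String proteinA, int positionA, String proteinB, int positionB, String linker) {}

    // Canonical key for an unordered pair, so A-B and B-A land in the same bucket.
    static String pairKey(Psm psm) {
        String a = psm.proteinA() + ":" + psm.positionA();
        String b = psm.proteinB() + ":" + psm.positionB();
        return a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
    }

    // pair key -> (linker -> number of PSMs), both levels sorted by key.
    static Map<String, Map<String, Integer>> countByPairAndLinker(List<Psm> psms) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Psm psm : psms) {
            counts.computeIfAbsent(pairKey(psm), k -> new TreeMap<>())
                  .merge(psm.linker(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Psm> psms = List.of(
                new Psm("ProtA", 10, "ProtB", 42, "dss_light"),
                new Psm("ProtB", 42, "ProtA", 10, "dss_heavy"),
                new Psm("ProtA", 10, "ProtB", 42, "dss_light"));
        System.out.println(countByPairAndLinker(psms));
    }
}
```

With the sample PSMs, main prints {ProtA:10-ProtB:42={dss_heavy=1, dss_light=2}}: one cross-linked pair, supported by both the light and the heavy linker.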

kaisengit commented 3 years ago

Hey Mike,

That Proxl already supports multiple crosslinkers sounds very encouraging indeed!

I am honestly not sure about the best approach to representing the data. As I said, the light-heavy approach mainly improves detection of crosslinked peptides. What we currently want to do is use the Proxl export as input for Skyline to evaluate its recent crosslink support. But being able to view our data in the viewer would certainly be a big plus, and your suggestion of breaking things down by linker type already sounds good.

Let me sit down with the rest of the lab to discuss this; I will come back with some feedback. I am on holiday for the next two weeks, so it fits rather well that you're busy too. We will definitely have some data sets to share. Again, thank you very much for your help!

kaisengit commented 3 years ago

Hello again,

As indicated in my last message, there is not much additional information we'd actually need in the web interface. Our heavy/light crosslinker approach mostly serves to improve crosslinked peptide detection. That said, being able to see the crosslinker in the web interface could also benefit experiments where multiple crosslinkers are used at once.

The most important part for us would be having the xQuest importer work properly with heavy-light searches, so that it produces an XML file conforming to the Proxl input format. We would then import it into Skyline as a next step.

Regarding a data set: what exactly would you need? Just an xQuest results folder from a heavy-light search? Or also the raw MS files or the centroided mzXML files?
